Mol Divers (2011) 15:269–289
DOI 10.1007/s11030-010-9234-9
COMPREHENSIVE REVIEW
Genetic algorithm optimization in drug design QSAR: Bayesian-regularized genetic neural networks (BRGNN) and genetic algorithm-optimized support vector machines (GA-SVM)
Michael Fernandez · Julio Caballero · Leyden Fernandez · Akinori Sarai
Received: 14 May 2009 / Accepted: 25 January 2010 / Published online: 20 March 2010
© Springer Science+Business Media B.V. 2010
Abstract Many articles in in silico drug design implemented genetic algorithm (GA) for feature selection, model optimization, conformational search, or docking studies. Some of these articles described GA applications to quantitative structure–activity relationship (QSAR) modeling in combination with regression and/or classification techniques. We reviewed the implementation of GA in drug design QSAR and specifically its performance in the optimization of robust mathematical models such as Bayesian-regularized artificial neural networks (BRANNs) and support vector machines (SVMs) on different drug design problems. Modeled data sets encompassed ADMET and solubility properties, cancer target inhibitors, acetylcholinesterase inhibitors, HIV-1 protease inhibitors, ion-channel and calcium entry blockers, and antiprotozoan compounds, as well as protein class, functional, and conformational stability data. The GA-optimized predictors were often more accurate and robust than previously published models on the same data sets and explained more than 65% of data variance in validation experiments. In addition, feature selection over large pools of molecular descriptors provided insights into the structural and atomic properties ruling ligand–target interactions.
M. Fernandez (✉) · A. Sarai
Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka 820-8502, Japan
e-mail: [email protected]

J. Caballero
Centro de Bioinformatica y Simulacion Molecular, Universidad de Talca, 2 Norte 685, Casilla 721, Talca, Chile
e-mail: [email protected]

L. Fernandez
Barcelona Supercomputing Center–Centro Nacional de Supercomputación, Nexus II Building, c/ Jordi Girona 29, 08034 Barcelona, Spain
Keywords Drug design · Enzyme inhibition · Feature selection · In silico modeling · QSAR · Review · SAR · Structure–activity relationships
List of abbreviations
ADMET Absorption, distribution, metabolism,
excretion and toxicity
AD Alzheimer's disease
log S Aqueous solubility
ANNs Artificial neural networks
BRANNs Bayesian-regularized artificial neural
networks
BRGNNs Bayesian-regularized genetic neural
networks
BBB Blood–brain barrier
CoMFA Comparative molecular field analysis
CG Conjugate Gradient
GA Genetic algorithm
GA-PLS Genetic algorithm-based partial least
squares
GA-SVM Genetic algorithm-optimized support vector
machines
GNN Genetic neural networks
GSR Genetic stochastic resonance
HIA Human intestinal absorption
PPBR Human plasma protein binding rate
Log P Lipophilicity
LHRH Luteinizing hormone-releasing hormone
MMP Matrix metalloproteinase
MT Mitochondrial toxicity
MLR Multiple linear regression
MT− Negative mitochondrial toxicity
NNEs Neural network ensembles
EVA Normal coordinate eigenvalue
BIO Oral bioavailability
PLS Partial least squares
P-gp P-glycoprotein
PCC Physicochemical composition
MT+ Positive mitochondrial toxicity
PC-GA-ANN Principal component-genetic
algorithm-artificial neural network
PCs Principal components
PPR Projection pursuit regression
QSAR Quantitative structure–activity relationship
QSPR Quantitative structure–property relationship
RBF Radial Basis Function
SOMs Self-organized maps
SR Stochastic resonance
SVMs Support vector machines
Trb1 Thyroid hormone receptor b1
Tdp Torsades de pointes
VKCs Voltage-gated potassium channels
Introduction
One of the main challenges in today's drug design is the
discovery of new biologically active compounds on the basis
of previously synthesized molecules. Quantitative structure–activity relationship (QSAR) is an indirect ligand-based
approach which models the effect of structural features on
biological activity. This knowledge is then employed to
propose new compounds with enhanced activity and selec-
tivity profile for a specific therapeutic target [1]. QSAR
methods are based entirely on experimental structure–activity relationships for enzyme inhibitors or receptor ligands. In
comparison to direct receptor-based methods, which include
molecular docking and advanced molecular dynamics simu-
lations, QSAR methods do not strictly require the 3D-struc-
ture of a target enzyme or even a receptor–effector complex.
They are not computationally demanding and allow the establishment of an in silico tool from which the biological activity of
newly synthesized molecules can be predicted [1].
Three-dimensional-QSAR (3D-QSAR) methods, espe-
cially comparative molecular field analysis (CoMFA) [2]
and Comparative Molecular Similarity Indices Analysis,
(CoMSIA) [3] are nowadays used widely in drug design.
The main advantages of these methods are that they are
applicable to heterogeneous data sets, and they bring a
3D-mapped description of favorable and unfavorable interactions according to physicochemical properties. In this sense,
they provide a solid platform for retrospective hypotheses by
means of the interpretation of significant interaction regions.
However, some disadvantages of these methods are related
to the 3D information and alignment of the molecular struc-
tures, since there are uncertainties about different binding
modes of ligands, and uncertainties about the bioactive con-
formations [4].
CoMFA and CoMSIA have emerged as the 3D-QSAR
methods most embraced by the scientific community today;
however, current articles on QSAR encompass the use of many forms of molecular information and statistical
cal correlation methods. The structures can be described by
physicochemical parameters [5], topological descriptors [6],
quantum chemical descriptors [7], etc. The correlation can
be obtained by linear methods or by nonlinear predictors such as artificial neural networks (ANNs) [8] and nonlinear support vector machines (SVMs) [9]. Unlike linear methods (CoMFA, CoMSIA, etc.), ANNs and SVMs are able to describe nonlinear relationships, which should lead to a more realistic approximation of the structure–activity paradigm, since interactions between the ligand and its biological target must be nonlinear.
Two major problems arise when the functional dependence between biological activities and the computed molecular descriptor matrix is nonlinear, and when the number of calculated variables exceeds the number of compounds in the
data set. The nonlinearity problem can be tackled within a nonlinear modeling framework, while the over-dimensionality issue can be handled by implementing a feature selection routine that determines which of the descriptors have a significant influence on the activity of a set of compounds. Genetic algorithm (GA), rather than forward or backward elimination procedures, has been successfully applied for feature selection
in QSAR studies when the dimensionality of the data set is
high and/or the interrelations between variables are convo-
luted [10].
The present review focuses on the application of very
flexible and robust approaches: Bayesian-regularized genetic
neural networks (BRGNNs) and GA-optimized SVM
(GA-SVM) to QSAR modeling in drug design. Biological
activities of low-molecular-weight compounds and protein function, class, and stability data were modeled to derive reliable classifiers with potential use in virtual library screening. First, we present a general survey of GA implementation and application in drug design QSAR. Second, we describe the BRGNN and GA-SVM approaches. Finally, we discuss their applications to modeling different target–ligand data sets relevant for drug discovery, as well as protein function and stability prediction.
General survey of genetic algorithm implementations
in drug design QSAR
Genetic algorithms are stochastic optimization methods governed by rules inspired by biological evolution [11]. A GA investigates many possible solutions simultaneously, each exploring a different region of the parameter space [12]. First, a population of N individuals is created in which each individual encodes a
randomly chosen subset of the modeling space and the fit-
ness or cost of each individual in the present generation is
determined. Secondly, parents selected on the basis of their
scaled fitness scores yield a fraction of children of the next
generation by crossover (crossover children) and the rest by
mutation (mutation children). In this way, the new offspring
contains characteristics from its parents. Usually, the routine is run until a satisfactory rather than the globally optimal solution is achieved. Advantages such as the ability to quickly scan a vast solution set, the fact that bad proposals do not affect the final solution, and the lack of any need to know the rules of the problem make GA very attractive for model optimization in drug discovery, in which every problem is highly particular because of the lack of previous knowledge of the functional relationship, and generalization is very difficult.
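The generational loop described above can be sketched in a few lines of Python. This is a toy illustration, not an implementation from the reviewed studies: selection is simplified to truncation of the ranked population, and the population size, rates, and OneMax-style fitness are arbitrary choices.

```python
import random

def run_ga(fitness, n_genes, pop_size=20, generations=50,
           crossover_frac=0.8, mutation_rate=0.05, seed=0):
    """Toy binary GA: the fittest half become parents, children come
    from crossover or mutation, and elitism keeps the best intact."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism
        while len(children) < pop_size:
            if rng.random() < crossover_frac:  # crossover child
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, n_genes)
                children.append(a[:cut] + b[cut:])
            else:                              # mutation child
                parent = rng.choice(parents)
                children.append([g ^ (rng.random() < mutation_rate)
                                 for g in parent])
        pop = children
    return max(pop, key=fitness)

# "OneMax" stand-in for a model-fitness function: more 1-bits = fitter
best = run_ga(fitness=sum, n_genes=12)
```

In a QSAR setting, the fitness call would instead train and score a regression model on the descriptor subset encoded by each chromosome.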
Chromosome representation
Solving the shortcomings of QSAR analysis, such as selection of optimum feature subsets, optimization of model parameters, and data set manipulation, has been the main goal of GA-based QSAR. The optimization space can include variables and model parameters. However, since variable selection is the most common task, populations have mainly been encoded by binary or integer chromosomes. Binary representation is very popular due to its easy and straightforward implementation: the chromosome is a binary vector of the same length as the main data matrix. The values 1 and 0 represent the inclusion or exclusion of a feature in the individual chromosome, respectively. Models with different dimensionality can evolve throughout the search process at the same time. In this case, the algorithm is highly automatic, since no extra parameters must be set, and the optimum solution is achieved when a predefined stopping condition is reached.
On the other hand, integer representation is encoded by a string of integers representing the positions of the features in the whole data matrix. Usually, the number of features the chromosome encodes is controlled according to criteria derived from previous knowledge of the modeled problem. Despite the extra supervision this requires, the algorithm gains efficiency because inefficient large-dimension models are avoided by controlling the number of variables during the search process. This aspect is especially important when complex predictors, given their high tendency to overparametrization/overfitting and their computational cost, are trained [10]. Model size can also be controlled in binary GA, but this simple routine is usually implemented in a very unsupervised way.
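The two encodings can be illustrated with a toy descriptor pool; the pool size and indices below are made up for the example.

```python
# Two equivalent chromosome encodings of one feature subset drawn from
# a pool of 8 descriptors (indices 0..7).
n_descriptors = 8

binary_chrom = [0, 1, 0, 0, 1, 1, 0, 0]   # length = pool size; 1 = included
integer_chrom = [1, 4, 5]                 # fixed length = model size

def binary_to_indices(chrom):
    """Convert a binary chromosome to its integer representation."""
    return [i for i, bit in enumerate(chrom) if bit]

def indices_to_binary(indices, n):
    """Convert an integer chromosome back to a binary one."""
    chosen = set(indices)
    return [1 if i in chosen else 0 for i in range(n)]
```

The binary form lets model size vary freely during the search, while the integer form fixes it, which matches the trade-off described above.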
In many GA implementations in QSAR studies, individuals in the populations are predictors, and training, validation, and/or crossvalidation errors are the individual fitness or cost functions. Different functions have been reported to rank the individuals in a population depending on the mathematical model implemented inside the GA framework. Authors have proposed a variety of fitness functions which are proportional to the residual error of the training set [10,13–25], the validation set [26], or crossvalidation [27–30], or a combination of them [31–33]. Overfitting has been decreased by complementing the cost function with terms accounting for the trade-off between the number of variables and the number of training cases [34] and/or by keeping model complexity as simple as possible during the search [10].
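A cost function of the kind just described might, for example, penalize the variables-to-cases ratio. The sketch below is only illustrative; the functional form and the penalty weight are assumptions, not the specific terms used in [34].

```python
def penalized_cost(mse, n_selected, n_train, penalty=0.1):
    """Illustrative GA cost: training MSE plus a parsimony term that
    grows with the ratio of selected variables to training cases.
    Lower cost = fitter individual."""
    return mse + penalty * (n_selected / n_train)

# with equal fit, a 5-descriptor model beats a 10-descriptor one
lean = penalized_cost(0.20, n_selected=5, n_train=50)
bloated = penalized_cost(0.20, n_selected=10, n_train=50)
```

Such a term biases the search toward small models even when a larger model fits the training data equally well.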
Population generation and ranking of individuals
The first step is to create a gene pool (population of models) of N individuals. Chromosome values are randomly initiated,
and the fitness of each individual in this generation is deter-
mined by the fitness function of the model and scaled by
the scaling function. Fitness scaling converts the raw fitness
scores that are returned by the fitness function to values in a
range that is suitable for the selection function. The selection
function uses the scaled fitness values to select the parents
of the next generation, assigning a higher probability of selection to individuals with higher scaled values. Controlling the range of the scaled values is
very important because it affects the performance of the GA.
Scaled values varying too widely cause individuals with the highest scaled values to reproduce too rapidly: they take over the population gene pool too quickly and prevent the GA from exploring other areas of the solution space. Conversely, scaled values varying too narrowly give all individuals too similar a chance of reproduction, and the optimization progresses very slowly. Among the most widely used fitness scaling functions are rank-based functions. The position of an individual in the sorted score list is its rank. Rank-based functions scale the raw scores based on the rank of each individual instead of its score. This fitness scaling removes
the effect of the spread of the raw scores [11,12].
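As an illustration, rank-based scaling can be implemented so that the scaled value is a function of rank alone. The 1/√rank scheme below is one common choice (e.g., a default in some GA toolboxes), not necessarily the one used in the reviewed studies.

```python
def rank_scaled_fitness(raw_costs):
    """Scale raw cost scores by rank: the fittest (lowest-cost)
    individual gets 1/sqrt(1), the next 1/sqrt(2), and so on. Only
    the ordering matters, so the spread of raw scores has no effect."""
    order = sorted(range(len(raw_costs)), key=lambda i: raw_costs[i])
    scaled = [0.0] * len(raw_costs)
    for rank, i in enumerate(order, start=1):
        scaled[i] = 1.0 / rank ** 0.5
    return scaled

# two populations with very different spreads but the same ordering
# receive identical scaled values
tight = rank_scaled_fitness([0.30, 0.10, 0.20])
wide = rank_scaled_fitness([900.0, 1.0, 5.5])
```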
Evolution and stopping criteria
During evolution, a fraction of the children of the next generation
is produced by crossover (crossover children) and the rest by
mutation (mutation children) from the parents. Sexual and
asexual reproductions take place so that the new offspring
contains characteristics from both or one of its parents. In
sexual reproduction, a selection function selects probabilis-
tically two individuals on the basis of their ranking to serve
as parents. An individual can be selected more than once as a
parent, in which case it contributes its genes to more than one
child. Stochastic uniform selection lays out a line in which each parent corresponds to a section of the line of length proportional to its scaled value [11,12]. Similarly, roulette selection chooses parents by simulating a roulette wheel, in which the area of the section of the wheel corresponding to an individual is proportional to that individual's expectation. The
algorithm uses a random number to select one of the sections
with a probability equal to its area [11,12]. On the other hand, tournament selection chooses each parent by selecting a set of players (individuals) at random and then choosing the best individual of that set to be a parent [32]. Then, crossover performs a random selection of a fraction of each parent's descriptor set, and a child is constructed by combining these fragments of genetic code. Finally, the rest of the individuals in the new generation are obtained by asexual reproduction, in which randomly selected parents are subjected to random mutation of their genes. Reproduction often includes elitism, which protects the fittest individual in any given generation from crossover or mutation [27]. Finally, stopping
criteria determine what causes the algorithm to terminate.
Most common parameters used to control algorithm flow are
the maximum number of iterations the GA will perform and
the maximum time the algorithm runs before stopping. Some implementations stop a GA if the best fitness score is less than or equal to a threshold value; others evaluate the performance over a preset number of generations or time interval, and the algorithm stops if there is no improvement in the best fitness value.
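Two of the selection schemes above can be sketched as follows; the scaled fitness values are invented for the example.

```python
import random

def roulette_select(scaled, rng):
    """Roulette wheel: each individual owns a wheel section whose area
    is proportional to its scaled fitness; a random spin picks one."""
    spin = rng.uniform(0.0, sum(scaled))
    acc = 0.0
    for i, s in enumerate(scaled):
        acc += s
        if spin <= acc:
            return i
    return len(scaled) - 1          # guard against floating-point edge

def tournament_select(scaled, rng, size=3):
    """Tournament: draw `size` players at random, keep the fittest."""
    players = rng.sample(range(len(scaled)), size)
    return max(players, key=lambda i: scaled[i])

rng = random.Random(42)
scaled = [0.1, 0.5, 0.2, 0.9]       # made-up scaled fitness values
picks = [roulette_select(scaled, rng) for _ in range(2000)]
# the fittest individual (index 3) should be selected most often
```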
Some applications
GA has been successfully applied to drug design QSAR
to optimize linear and nonlinear predictors. Cho and
Hermsmeier [13] introduced a simple encoding scheme for
chemical features and allocation of compounds in a data set.
They applied GA to simultaneously optimize descriptors and
composition of training and test sets. The method generates
multiple models on subsets of compounds representing clus-
ters with different chemotypes and a molecular similarity
method determined the best model for a given compound in
the test set. The performance on the Selwood data set [35]
was comparable to other published methods.
Hemmateenejad and co-workers [31–33] reported seminal studies on GA-based QSAR in drug design. They modeled the calcium channel antagonist activity of a set of nifedipine analogues by GA-optimized multiple linear regression (MLR) and partial least squares (PLS) regression [31]. Adequate models with low standard errors and high correlation coefficients were derived from topology, hydrophobicity, and surface area, but PLS had better prediction ability than MLR.
The authors applied a principal component–genetic algorithm–artificial neural network (PC-GA-ANN) procedure to model the activity of another series of nifedipine analogues [32]. Each molecule was encoded by 10 sets of descriptors, and principal component analysis (PCA) was used to compress the descriptor groups into principal components (PCs). GA selected the best set of PCs to train a feed-forward ANN. The PC-GA-ANN routine outperformed ANNs trained with top-ranked PCs (PC-ANN) by yielding better predictive ability.
Hemmateenejad et al. [33] reported the application of PC regression to model the structure–carcinogenicity relationship of drugs. PC correlation ranking and a GA were compared for selecting the best set of PCs for a large data set containing 735 carcinogenic activities and 1,355 descriptors. A crossvalidation procedure showed that introduction of PCs by conventional eigenvalue ranking was outperformed by correlation ranking and GA, which achieved similar quality of about 80% accuracy. Thyroid hormone receptor b1 (Trb1)
antagonists are of special interest because of their potential
role in safe therapies for nonthyroid disorders while avoid-
ing the cardiac side effects. Optimum molecular descriptors
selected by GA served as inputs for a projection pursuit
regression (PPR) study, yielding accurate models [36]. GA has also been reported to optimize routines of descriptor generation. Normal coordinate eigenvalue (EVA) structural descriptors, based on calculated fundamental molecular vibrational frequencies, are sensitive to 3D structure, and additionally structural superposition is not required [28]. The original technique involves a standardization method wherein uniform Gaussians of fixed standard deviation (σ) are used to smear out frequencies projected onto a linear scale. GA was used to search for optimal localized σ values by optimizing crossvalidated PLS regression scores. Although GA-based EVA did not improve performance for a benchmark steroid data set, crossvalidation statistics were 0.25 units higher than the simple EVA approach in the case of a more heterogeneous data set of five structural classes.
A GA-optimized ANN, named GNW, that simultaneously
optimizes feature selection and node weights, was reported
by Xue and Bajorath [37] for supervised feature ranking.
Interconnected weights were binary encoded as a 16-bit
string chromosome. A primary feature ranking index, defined as the sum of self-depleted weights and the corresponding weight adjustments, correctly identified relevant features for artificial data sets with known feature rankings. GNW outperformed an SVM method on three artificial data sets and a matrix metalloproteinase-1 inhibitor data set [37].
Two-dimensional (2D) representation was chosen to clas-
sify about 500 molecules in seven biological activity classes
using a method based on principal component analysis com-
bined with GA [38]. Scoring functions, which accounted for
number of compounds in pure classes (i.e., compounds with
the same biological activity), singletons, and mixed classes,
identified effective descriptor sets. The results indicated that
combinations of few critical descriptors related to aromatic
character, hydrogen bond acceptors, estimated polar van der
Waals surface area, and a single structural key were preferred to classify compounds according to their biological activities.
Kamphausen et al. [39] reported a simplified GA based on small training sets that runs for a small number of generations. Folding energies of RNA molecules and spin-glass models of multiletter-alphabet biopolymers such as peptides
were optimized. Notably, the de novo construction of peptidic thrombin inhibitors, computationally guided by this approach, required the experimental fitness determination of only 600 different compounds from a virtual library of more than 10¹⁷ molecules [39].
Caco-2 cell monolayers are widely used systems for
predicting human intestinal absorption, and quantitative structure–property relationship (QSPR) models of Caco-2 permeability have been widely reported. Yamashita et al. [34] used a GA-based partial least squares (GA-PLS)
method to predict Caco-2 permeability data using topolog-
ical descriptors. The final PLS model described more than
80% of crossvalidation variance.
In alternative applications, a GA routine based on the the-
ory of stochastic resonance (SR) was reported in which vari-
ables that are related to the bioactivity of a molecule series
were considered as signal and the other non-related features
as noise [40]. The signal was amplified by SR in a nonlinear
system with GA-optimized parameters. The algorithm was
successfully evaluated with the well-known Selwood data set [35]. The relevant variables were enhanced, and their power
spectra were significantly changed and similar to that of the
bioactivity after genetic SR (GSR). The descriptor matrix
continuously became more informative, and the collinear-
ity was suppressed. Then, feature selection was easier and
more efficient, and, consequently, QSAR models obtained for the data set had better performance than previously reported
approaches [40]. Teixido et al. [41] presented another non-
conventional GA to search for peptides that can cross the
blood–brain barrier (BBB). A genetic meta-algorithm opti-
mized the GA parameters and the approach was validated
by virtual screening of a peptide library of more than 1000
molecules. Chromosomes were populated with physicochemical properties of peptides instead of amino acid sequences, and the fitness function was derived from statis-
tical analysis of the experimental data available on peptide-
BBB permeability. The authors stated that a GA tuned for a specific problem can steer the design and drug discovery process and set the stage for evolutionary combinatorial chemistry.
Coupling of ANNs and GA in drug QSAR studies was
introduced by So and Karplus [27] by proposing GA-based
ANNs called genetic neural networks (GNNs). After cal-
culating molecular descriptors using different commercially
available software, predictive models were generated by cou-
pling GA feature selection and neural networks function
approximation. The optimum neural networks outperformed PLS and GA-based MLR models. The authors extended
GNN to 3D-QSAR modeling by exploring similarity matrix
space [42,43]. An early review on this approach [44] reports
its evaluation in several problems such as the Selwood data
set, the benzodiazepine affinity for benzodiazepine/GABAA
receptors, progesterone receptor-binding steroids, and human intestinal absorption. Patankar and Jurs have also reported
several QSAR models by hybrid GNN frameworks out-
performing other predictors for the inhibition of acyl-CoA:
cholesterol O-acyltransferase [45], sodium ion–proton antiporter [46], cyclooxygenase-2 [47], carbonic anhydrase [48],
human type 1 5alpha-reductase [49], and glycine/NMDA
receptor [50]. Another variant of the same hybrid approach
was recently reported by Di Fenza et al. [26] as the first attempt to combine GA and ANNs for the modeling of Caco-2 cell apparent permeability. The optimum model had an adequate crossvalidation accuracy of 57%, and the selected descriptors were related to physicochemical characteristics such as hydrophilicity, hydrogen bonding propensity, hydrophobicity, and molecular size, which are involved in the cellular membrane permeation phenomenon. Ab initio theory was used to calculate several quantum chemical descriptors, including electrostatic potentials and local charges at each atom, HOMO and LUMO energies, etc., which were used to model the solubility of thiazolidine-4-carboxylic acid derivatives by means of GA-PLS, yielding relative errors of prediction lower than 4%.
Bayesian-regularized genetic neural networks
In the context of hybrid GA-ANN modeling of biological
interactions, we introduced BRGNNs as a robust nonlinear modeling technique that combines GA and Bayesian regularization for neural network input selection and supervised network training, respectively (Fig. 1). This approach attempts to solve the main weaknesses of neural network modeling: the selection of optimum input variables and the adjustment of network weights and biases to optimum values for yielding regularized neural network predictors [50–52]. By combining the concepts of BRANNs and GAs, BRGNNs were implemented in such a way that BRANN inputs are selected inside a GA framework. The BRGNN approach is a version of the So and Karplus method [27], incorporating Bayesian regularization, that has been successfully introduced by our group in drug design QSAR. BRGNN was programmed within the Matlab environment [53] using the GA [54] and Neural Networks [55] Toolboxes.
Bayesian regularized artificial neural networks
Back-propagation ANNs are data-driven models in the sense
that their adjustable parameters are selected in such a way as
to minimize some network performance function F:
F = MSE = (1/N) Σ_{i=1}^{N} (y_i − t_i)²  (1)
Fig. 1 Flowchart of the BRGNN framework in QSAR studies: a pool of molecular descriptors feeds GA model optimization with crossvalidation; models with R above a threshold value are retained, and the best model (best Q²) is selected; optionally, random splits assembling test sets allow ensemble averaging of the predictions
In the above equation, MSE is the mean of the sum of squares of the network errors, N is the number of compounds, y_i is the predicted biological activity of compound i, and t_i is the experimental biological activity of compound i.
Often, predictors can memorize the training examples but fail to generalize to new situations. The
Bayesian framework for ANNs is based on a probabilistic
interpretation of network training to improve generalization
capability of the classical networks. In contrast to conven-
tional network training, where an optimal set of weights
is chosen by minimizing an error function, the Bayesian
approach involves a probability distribution of network
weights. In BRANNs, Bayesian approach yields a posterior
distribution of network parameters, conditional on the train-
ing data, and predictions are expressed in terms of expecta-
tions with respect to this posterior distribution [56,57].
Assuming a set of pairs D = {x_i, t_i}, where i = 1, ..., N is a label running over the pairs, the data set can be modeled as deviating from this mapping under some additive noise process (v_i):

t_i = y_i + v_i  (2)

If v is modeled as zero-mean Gaussian noise with standard deviation σ_v, then the probability of the data given the parameters w is:

P(D|w, β, M) = (1/Z_D(β)) exp(−β · MSE)  (3)

where M is the particular neural network model used, β = 1/(2σ_v²), and the normalization constant is given by Z_D(β) = (π/β)^(N/2). P(D|w, β, M) is called the likelihood. The maximum likelihood parameters w_ML (the w that minimizes MSE) depend sensitively on the details of the noise in the data [56,57].
To complete the interpolation model, a prior probability distribution must be defined which embodies our prior knowledge of the sort of mappings that are reasonable. Typically, this is quite a broad distribution, reflecting the fact that we only have a vague belief in a range of possible parameter values. Once we have observed the data, Bayes' theorem can be used to update our beliefs, and we obtain the posterior probability density. As a result, the posterior distribution is concentrated on a smaller range of values than the prior distribution. Since a neural network with large weights will usually give rise to a mapping with large curvature, we favor small values for the network weights. At this point, a prior is defined that expresses the sort of smoothness the interpolant is expected to have. The model has a prior of the form:

P(w|α, M) = (1/Z_W(α)) exp(−α · MSW)  (4)

where α represents the inverse variance of the distribution and the normalization constant is given by Z_W(α) = (π/α)^(N/2). MSW is the mean of the sum of the squares of the network weights and is commonly referred to as a regularizing function [56,57].
Considering the first level of inference, if α and β are known, then the posterior probability of the parameters w is:

P(w|D, α, β, M) = P(D|w, β, M) P(w|α, M) / P(D|α, β, M)  (5)

where P(w|D, α, β, M) is the posterior probability, that is, the plausibility of a weight distribution considering the information of the data set in the model used, P(w|α, M) is the
prior density, which represents our knowledge of the weights before any data are collected, P(D|w, β, M) is the likelihood function, which is the probability of the data occurring given the weights, and P(D|α, β, M) is a normalization factor, which guarantees that the total probability is 1.
Considering that the noise in the training set data is Gaussian and that the prior distribution for the weights is Gaussian, the posterior probability fulfills the relation:

P(w|D, α, β, M) = (1/Z_F) exp(−F)  (6)

where Z_F depends on the objective function parameters. Therefore, under this framework, minimization of F is identical to finding the (locally) most probable parameters.
In short, Bayesian regularization involves modifying the performance function F defined in Eq. 1, possibly improving generalization by adding a term that regularizes the weights by penalizing overly large magnitudes:

F = β · MSE + α · MSW  (7)

The relative size of the objective function parameters α and β dictates the emphasis on a smoother network response. MacKay's Bayesian framework automatically adapts the regularization parameters to maximize the evidence of the training data [56,57]. BRANNs were first and broadly applied to model biological activities by Burden and Winkler [51,52].
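Numerically, the regularized objective of Eqs. 1 and 7 can be sketched as below. The α and β values, predictions, and weight vectors are arbitrary illustrations; MacKay's framework would adapt α and β during training rather than fix them.

```python
import numpy as np

def regularized_objective(y_pred, t_true, weights, alpha, beta):
    """F = beta*MSE + alpha*MSW (Eq. 7): data misfit (Eq. 1) plus a
    penalty on the mean of the squared network weights."""
    mse = np.mean((np.asarray(y_pred) - np.asarray(t_true)) ** 2)
    msw = np.mean(np.asarray(weights) ** 2)
    return beta * mse + alpha * msw

y_pred = [1.1, 2.0, 2.9]
t_true = [1.0, 2.0, 3.0]
f_small = regularized_objective(y_pred, t_true, [0.1, -0.2, 0.05],
                                alpha=0.1, beta=1.0)
f_large = regularized_objective(y_pred, t_true, [3.0, -4.0, 5.0],
                                alpha=0.1, beta=1.0)
# with the same fit, the smoother (small-weight) network scores lower
```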
Genetic algorithm implementation in BRANN feature
selection
A string of integers encodes the numbers of the rows in
the all-descriptors matrix that will be tested as BRANN
inputs (Fig. 2). Each individual encodes the same number
of descriptors; the descriptors are randomly chosen from a
common data matrix in such a way that (1) no two individuals
can have exactly the same set of descriptors and (2)
all descriptors in a given individual must be different. The
fitness of each individual in a generation is determined by
the training mean square error (MSE) of the model, using a
top scaling function that scores a top fraction of the individuals
in a population equally; these individuals have the same
probability of being reproduced, while the rest are assigned the
value 0. As depicted in Fig. 2, children are created sexually
by single-point crossover between father chromosomes and
asexually by mutating one gene in the chromosome of a single
father. Similar to So and Karplus [27], we also included
elitism, so the genetic content of the best-fitted individual
moves on to the next generation intact. The reproductive
cycle is continued until 90% of the generations show
the same target fitness score (Fig. 3).
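The reproduction scheme described above can be sketched as follows. This is a simplified illustration with hypothetical chromosomes and a placeholder fitness callable, not the published implementation; the top-scaled parent fraction and the stopping criterion are reduced to their simplest form.

```python
import random

def evolve(population, fitness, n_keep=2, mut_rate=0.1, n_descriptors=100):
    """One GA generation over integer chromosomes (descriptor row indices).

    population : list of lists of distinct descriptor indices
    fitness    : maps a chromosome to its training MSE (lower is better)
    """
    ranked = sorted(population, key=fitness)
    next_gen = [ranked[0][:]]                      # elitism: best individual passes intact
    parents = ranked[:max(n_keep, 2)]              # top-scaled fraction reproduces equally
    while len(next_gen) < len(population):
        mother, father = random.sample(parents, 2)
        point = random.randrange(1, len(mother))   # single-point crossover
        child = mother[:point] + father[point:]
        if random.random() < mut_rate:             # asexual mutation of one gene
            child[random.randrange(len(child))] = random.randrange(n_descriptors)
        # enforce distinct genes per individual and distinct individuals
        if len(set(child)) == len(child) and child not in next_gen:
            next_gen.append(child)
    return next_gen
```

In the actual BRGNN routine the loop repeats until the target fitness stabilizes across generations, and each chromosome is evaluated by training a Bayesian-regularized network on the selected descriptor subset.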
Contrary to other GA-based approaches, the objective of
the algorithm is not to obtain a single optimum model but a
reduced population of well-fitted models, with MSE lower
than a threshold value, which the Bayesian regularization
guarantees to possess good generalization capabilities
(Fig. 3). Because we used the MSE of training-set fitting,
rather than crossvalidation or test-set MSE values, as the
cost function, the optimum model cannot be derived directly
from the best-fitted model yielded by the genetic search.
However, crossvalidation experiments over the subpopulation
of well-fitted models can identify the best generalizable
network with the highest predictive power. This process also
avoids chance correlations. This approach has proven highly
efficient in comparison with crossvalidation-based GA
approaches, since only optimum models, according to the
Bayesian regularization, are crossvalidated at the end of the
routine, rather than all the models generated throughout the
search.
Genetic algorithm-optimized support vector machines
(GA-SVM)
Support vector machine (SVM) is a machine learning
method that has been used for many kinds of pattern recognition
problems [58]. In contrast to the BRANN framework, which
is not in widespread use, SVM has become a very popular
pattern recognition technique. Since there are excellent
introductions to SVMs [58,59], only the main idea of SVMs
applied to pattern classification problems is stated here. First,
the input vectors are mapped into a feature space (possibly
of higher dimension). Second, a hyperplane that can
separate two classes is constructed within this feature space.
Only relatively low-dimensional vectors in the input space
and matrix products in the feature space are involved in
the mapping function. SVM was designed to minimize structural
risk, whereas previous techniques were usually based on
minimization of empirical risk. Hence, SVM is usually less
vulnerable to the overfitting problem, and it can deal with a
large number of features.
The mapping into the feature space is performed by a
kernel function. There are several parameters in the SVM,
including the kernel function and the regularization parameter.
The kernel function and its specific parameters, together with
the regularization parameter, cannot be set from the
optimization problem but have to be tuned by the user. They can
be optimized by the use of Vapnik–Chervonenkis bounds,
crossvalidation, an independent optimization set, or Bayesian
learning. In the articles from our group, the Radial Basis
Function (RBF) was used as the kernel function.
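For reference, a standard formulation of the RBF kernel (the textbook definition, not code from the reviewed articles) maps a pair of input vectors to a similarity governed by the width parameter σ:

```python
import math

def rbf_kernel(x, y, sigma):
    """K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Identical vectors map to 1; the similarity decays toward 0 with distance,
# and sigma controls how fast that decay occurs.
```

A small σ makes the SVM fit local structure closely (risking overfitting), while a large σ smooths the decision surface; this is exactly why σ must be tuned alongside the regularization parameter C.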
For nonlinear SVM models, we also used GA-based
optimization of the kernel regularization parameter C and the width
σ of the RBF kernel, as suggested by Fröhlich et al. [60]. We
Fig. 2 Flow diagram of the strategy for the genetic algorithm implemented in the BRGNNs
simply concatenated a representation of the parameter to our
existing chromosome. That means we are trying to select an
optimal feature subset and an optimal C at the same time. This
is reasonable, because the choice of the parameter is influenced
by the feature subset taken into account and vice versa.
Usually, it is not necessary to consider any arbitrary value of C,
but only certain discrete values of the form n × 10^k, where
n = 1, . . . , 9 and k = −3, . . . , 4. Therefore, these values
can be calculated by randomly generating n and k values as
integers in (1, . . . , 9) and (−3, . . . , 4), respectively.
In a similar way, we used GA to optimize the width of the
RBF kernel, but in this case, n and k values were integers
in (1, . . . , 9) and (−2, . . . , 1). Then, our chromosome
was concatenated with another gene with discrete values in
the interval (0.001–90,000) for encoding the C parameter,
and similarly the width of the RBF kernel was encoded in a
gene containing discrete values ranging in the interval
(0.01–90). In other articles, feature and hyperparameter genes were
Fig. 3 Reproduction procedure in the BRGNN implementation
concatenated in the chromosomes and encoded as bit strings;
however, evolution was driven using similar crossover, mutation,
and selection operators according to fitness functions
accounting for crossvalidation accuracies [61–63].
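The discrete n × 10^k encoding of the hyperparameter genes can be sketched as follows (an illustrative helper, not the authors' Matlab code); the stated integer ranges reproduce the intervals 0.001–90,000 for C and 0.01–90 for the RBF width:

```python
import random

def random_hyperparameter(n_range, k_range):
    """Draw a discrete value n * 10**k, as used for the C and width genes."""
    n = random.randint(*n_range)   # mantissa, e.g. 1..9
    k = random.randint(*k_range)   # decade exponent, e.g. -3..4
    return n * 10 ** k

# C     drawn from {1..9} x 10^{-3..4} -> spans 0.001 ... 90,000
# sigma drawn from {1..9} x 10^{-2..1} -> spans 0.01  ... 90
C = random_hyperparameter((1, 9), (-3, 4))
sigma = random_hyperparameter((1, 9), (-2, 1))
```

During the GA run these two genes are mutated and crossed over exactly like the descriptor-index genes, so feature selection and hyperparameter tuning proceed in the same evolutionary loop.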
Data subsets are created; subsets are generated in the
crossvalidation process for training the SVM, and another
subset is then predicted. This process is repeated until all
subsets have been predicted. A venetian-blind method was
used for creating the data subsets: first, the data set is
sorted according to the dependent variable, and second,
the cases are added consecutively to each subset, in such
a way that they become representative samples of the whole
data set. The GA routine minimized the regression MSE and
the misclassification percent of the crossvalidation experiment.
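The venetian-blind subset creation described above can be sketched as follows (an illustrative helper; returning index lists rather than data copies is an assumption of this sketch):

```python
def venetian_blind_subsets(y, n_subsets):
    """Sort cases by the dependent variable, then deal them consecutively
    into n_subsets so that each subset spans the whole activity range."""
    order = sorted(range(len(y)), key=lambda i: y[i])   # ranks by activity
    subsets = [[] for _ in range(n_subsets)]
    for rank, idx in enumerate(order):
        subsets[rank % n_subsets].append(idx)           # deal like venetian blinds
    return subsets
```

Because consecutive activity ranks land in different subsets, every fold contains low-, mid-, and high-activity compounds, which is what makes each left-out subset representative of the whole data set.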
The GA-SVM implemented in our articles is a version
of the GA by Caballero and Fernandez [10] incorporating
SVM hyperparameter optimization; it was programmed
within the Matlab environment [53] using the libSVM library for
Matlab by Chang and Lin [64].
A few other authors [61–63] represented features of
chromosomes as bit strings, but SVM parameters were optimized
by the Conjugate Gradient (CG) method during model fitness
evaluation. The crossover and mutation rates were set to adequate
values according to preliminary experiments, and evolution
was stopped when the number of generations reached
a preset maximum value, or when the fitness value remained
constant or nearly constant for a maximum number of
generations [61–63].
Models validation
Traditionally, meaningful assessment of statistical fit of a
QSAR model consists of predicting some removed propor-
tion of the data set. The whole data are randomly split into a
number of disjointed crossvalidation subsets. One from each
of these subsets is left out in turn, and the remaining com-
plement of data is used to make a partial model. The samples in the left-out data are then used to perform predictions. At
the end of this process, there are predictions for all data in
the training set, made up from the predictions originating
from the resulting partial models. All partial models are then
assessedagainst thesameperformance criteria, anddecisions
are made on the basis of the consistency of the assessment
results. The most often used crossvalidation method is
leave-one-out crossvalidation, in which all crossvalidation
subsets consist of only one data point each.
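The leave-one-out procedure described above reduces to the following generic loop (a sketch with placeholder fit/predict callables, not any specific QSAR model):

```python
def leave_one_out(cases, fit, predict):
    """Leave-one-out crossvalidation: each case is predicted by a partial
    model trained on all remaining cases.

    cases             : list of (x, y) pairs
    fit(train)        : builds a model from a list of (x, y) pairs
    predict(model, x) : prediction for the left-out case
    """
    predictions = []
    for i, (x, _) in enumerate(cases):
        train = cases[:i] + cases[i + 1:]        # complement of the left-out case
        predictions.append(predict(fit(train), x))
    return predictions
```

The returned predictions, one per case, are what the consistency assessment and the Q2 statistic are computed from.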
In addition to assessment of statistical fit by crossvalidation,
randomization of the modeled property (also known
as Y-randomization) has also been used to evaluate model
robustness [21,24,27,65,66]. Undesirable chance correlations can
be achieved as a result of exhaustive GA searches. So and
Karplus [27] proposed the evaluation of crossvalidation
performance on several scrambled data sets. The position
of the dependent variable (modeled property) for every
case along the data set is randomized several times, and Q2
is calculated. The absence of chance correlation is proved
when no Q2 > 0.5 appears during the test [27].
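The Y-randomization test can be sketched as follows; the q2_of callable is a placeholder for a full crossvalidated model-building run on the scrambled response, which is the expensive part in practice:

```python
import random

def y_randomization_test(y, q2_of, n_trials=10, threshold=0.5, seed=0):
    """Scramble the modeled property n_trials times and recompute Q2.

    q2_of(y_scrambled) -> crossvalidated Q2 obtained with the scrambled response
    Returns True when no scrambled Q2 exceeds the threshold, i.e. when
    no chance correlation is detected.
    """
    rng = random.Random(seed)
    scrambled_q2 = []
    for _ in range(n_trials):
        y_perm = y[:]
        rng.shuffle(y_perm)               # randomize the dependent-variable positions
        scrambled_q2.append(q2_of(y_perm))
    return max(scrambled_q2) < threshold
```

A model that still reaches Q2 > 0.5 on scrambled activities has merely memorized the descriptor pool, which is exactly the failure mode exhaustive GA searches can produce.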
The accuracy of crossvalidation results is widely
accepted in the literature on the basis of the Q2 value. In this
sense, a high value of this statistic (Q2 > 0.5)
is considered as proof of the high predictive ability of the
model. However, a high value of Q2 appears to be a necessary
but not sufficient condition for the model to have
high predictive power, and the predictive ability of a QSAR
model can only be estimated using a sufficiently large collection
of compounds that was not used for building the model
[65,66]. In this sense, the data set can be divided into training
and validation (or test) partitions. For a given partitioning,
a model is constructed only from the samples of the training
set. At this point, an important step is the generation of these
partitions. Quite a few methods have been used, such as random
selection, activity-ranked binning, and sphere exclusion
algorithms [65,66]. Various forms of neural networks have
also been employed in the selection of training sets, including
Kohonen neural networks [19].
Undoubtedly, external validation is a way to establish the
reliability of a QSAR model. However, the majority of studies
that are validated by external predictions are based on a
single validation set; this may cause the predictors to perform
well on a particular external set, but there is no guarantee that
the same results will be achieved on another. For example,
it can happen that several outliers are, by pure coincidence,
out of the test set, in which case the validation error will be
small even though the training error was high. The ensemble
solution has been proposed for originating multiple validation
sets [67]. An ensemble is a collection of predictors that,
as a whole, provides a prediction which is a combination of
the individual ones. If there is disagreement among those
predictors, then very reliable models can be obtained, since
a further decrease in generalization error can be achieved.
Another trait to take into account for the ensemble application
is the average error of the ensemble members; by decreasing the
error of each individual, the ensemble attains a smaller
generalization error [67].
In BRGNN-related studies, the predictive power was
measured taking into account the R2 and root MSE values of the
averaged test sets of BRGNN ensembles having an optimum
number of members [15,18,19,21,24,68,69]. For generating
the predictors to be averaged, the whole data set was
partitioned into several training and test sets. The assembled
predictors aggregate their outputs to produce a single prediction.
In this way, instead of predicting a single randomly selected
external set, the result of averaging several sets was
predicted. Each case was predicted several times while forming
training and test sets, and an average of both values was reported.
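Ensemble aggregation by output averaging reduces to the following (a minimal sketch in which member models are plain callables; the actual studies average BRGNN members trained on different data partitions):

```python
def ensemble_predict(models, x):
    """Aggregate member outputs into a single averaged prediction."""
    outputs = [model(x) for model in models]
    return sum(outputs) / len(outputs)
```

When member errors are partly uncorrelated, their average cancels part of each individual error, which is the mechanism behind the smaller generalization error cited above.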
Data sets: sources and general prior preparation
Biological activity measurements were taken as affinity
constants (Ki) or ligand concentrations for 50% (IC50) or
90% (IC90) inhibition of the targets (Table 1). For modeling,
IC50 and IC90 were converted into logarithmic activities
(pIC50 and pIC90), which are measurements of drug effectiveness,
i.e., the functional strength of the ligand toward the target.
For classification problems, data were labeled according
to some convenient threshold.
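This logarithmic conversion is simply the negative decadic logarithm of the molar concentration (a standard definition, not code from the reviewed articles):

```python
import math

def pic50(ic50_molar):
    """Convert an IC50 (mol/L) to the logarithmic activity pIC50 = -log10(IC50)."""
    return -math.log10(ic50_molar)

# A 10 nM inhibitor (1e-8 M) gives pIC50 = 8; higher pIC50 means a more
# potent ligand, which makes the scale convenient for regression modeling.
```

The same formula applies to IC90 (giving pIC90) and to Ki values (giving pKi).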
In our articles, prior to molecular descriptor calculations,
3D structures of the studied compounds (Fig. 4) were
geometrically optimized using the semi-empirical quantum-chemical
methods implemented in the MOPAC 6.0 computer
software by the Frank J. Seiler Research Laboratory [70].
The articles in Table 1 included QSAR modeling of cancer
therapy targets [19,20,23,25,71–73], an HIV target [22],
Table 1 Data set details and statistics of the optimum models reported by BRGNN modeling

Dataset category | Target name or biological activity/function | Descriptor type | Data size | Optimum variables | Validation accuracy (%) | Ref.
Cancer | Farnesyl protein transferase | 3D | 78 | 7 | 70 | [25]
Cancer | Matrix metalloproteinase | 2D | 30a | 6 | 70a | [23]
Cancer | Matrix metalloproteinase | 2D | 63–68b | 7 | 80b | [72]
Cancer | Cyclin-dependent kinase | 2D | 98 | 6 | 65 | [19]
Cancer | LHRH (non-peptide) | 2D | 128 | 8 | 75 | [20]
Cancer | LHRH (erythromycin A analogs) | Quantum chemical | 38 | 4 | 70 | [71]
HIV | HIV-1 protease | 2D | 55 | 4 | 70 | [22]
Cardiac dysfunction | Potassium channel | 2D | 29 | 3 | 91 | [16]
Cardiac dysfunction | Calcium channel | 2D | 60 | 5 | 65 | [17]
Alzheimer's disease | Acetylcholinesterase inhibition (tacrine analogs) | 3D | 136 | 7 | 74 | [21]
Alzheimer's disease | Acetylcholinesterase inhibition (huprine analogs) | 3D | 41 | 6 | 84 | [24]
Antifungal | Candida albicans | 3D | 96 | 16 | 87 | [10]
Antiprotozoan | Cruzain | 2D | 46 | 5 | 75 | [18]
Protein conformational stability | Human lysozyme | 2D | 123 | 10 | 68 | [68]
Protein conformational stability | Gene V protein | 2D | 123 | 10 | 66 | [69]
Protein conformational stability | Chymotrypsin inhibitor 2 | 3D | 95 | 10 | 72 | [15]

a Average values of five models for MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 matrix metalloproteinases
b Average values of three models for MMP-1, MMP-9, and MMP-13 matrix metalloproteinases
Alzheimer's disease targets [21,24], ion channel blockers
[16,17], antifungals [10], an antiprotozoan target [18], ion channel
proteins [29], the ghrelin receptor [30], and protein
conformational stability [15,68,69]. Dragon computer software
[74] was used for generating the majority of the feature
vectors for low-weight compounds. Four types of molecular
descriptors (according to the Dragon software classification)
were used: zero-dimensional (0D), one-dimensional (1D),
two-dimensional (2D), and three-dimensional (3D). When 2D
topological representation of molecules was used, the spatial
lag was varied from 1 to 8. Four atomic properties (atomic
masses, atomic van der Waals volumes, atomic Sanderson
electronegativities, and atomic polarizabilities) weighted
both 2D and 3D molecular graphs. In some biological systems,
it was suitable to use quantum-chemical descriptors,
which were calculated from the output files of the semi-empirical
geometry optimizations.
In the studies of pharmacokinetic and pharmacodynamic
properties, including absorption, distribution, metabolism,
excretion, and toxicity (ADMET), using GA-optimized
SVMs, several properties were modeled: identification
of P-glycoprotein substrates and nonsubstrates (P-gp) [61],
prediction of human intestinal absorption (HIA) [61],
prediction of compounds inducing torsades de pointes
(Tdp) [61], prediction of BBB penetration [61], human
plasma protein binding rate (PPBR) [62], oral bioavailability
(BIO) [62], and induced mitochondrial toxicity (MT)
[63]. All the structures of the compounds were generated
and then optimized using the Cerius2 program package
(Cerius2, version 4.10) [75]. The authors manually inspected
the 3D structure of each compound to ensure that each
molecule was properly represented, and molecular descriptors
were computed using the online application PCLIENT
[76].
Feature spaces for peptides and proteins in [68] and [69]
were computed using the in-house software PROTMETRICS
[77]. Different sets of protein feature vectors were computed
on the sequences [68,69] and crystal structures [15], weighted
by 48 amino acid/residue properties from the AAindex database
[78].
In general, descriptors that were constant or almost constant
were eliminated; pairs of variables with a squared
correlation coefficient greater than 0.9 were classified as
intercorrelated, and only one of these was included for building
the model. The resulting data matrices were still of high
dimension. Feature subspaces in such matrices were explored
in search of lower-dimensional combinations of vectors
that yield optimum nonlinear models through the BRGNN or
GA-SVM techniques. Afterward, in some applications, optimum
feature vectors were used for unsupervised training of
competitive neurons to build self-organized maps (SOMs)
[79] for the qualitative analysis of optimum chemical
subspace distributions at different activity levels.
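The descriptor pre-filtering described above (dropping near-constant columns and one member of each pair with r² > 0.9) can be sketched as follows; keeping the first member of each intercorrelated pair is an assumption of this sketch:

```python
def prefilter_descriptors(columns, r2_cut=0.9, eps=1e-12):
    """Drop (near-)constant descriptors and, for each intercorrelated
    pair (squared correlation > r2_cut), keep only the first one seen.

    columns : dict of descriptor name -> list of values (one per compound)
    """
    def variance(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / len(v)

    def r2(u, v):
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
        return cov * cov / (variance(u) * variance(v) * len(u) ** 2)

    kept = {}
    for name, values in columns.items():
        if variance(values) < eps:                        # constant or almost constant
            continue
        if any(r2(values, v) > r2_cut for v in kept.values()):
            continue                                      # intercorrelated with a kept one
        kept[name] = values
    return kept
```

Even after this filter, the matrices remain high-dimensional, which is why the GA-driven subspace search described above is still needed.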
Application of BRGNN and GA-SVM to ligand–target data sets
ADMET modeling
GA-optimized SVMs have been applied at the early stage of
drug discovery to predict pharmacokinetic and pharmacodynamic
properties, including ADMET [61–63]. An interesting
SVM method that combined GA for feature selection with the
CG method for parameter optimization (GA-CG-SVM) was
reported to predict PPBR and BIO [62]. A general
implementation of this framework is described later. For each
individual, feature chromosomes were represented as bit
strings, but SVM parameters were optimized by the CG method
during model fitness evaluation. The crossover and mutation
rates were set to 0.8 and 0.05, respectively. Evolution
was stopped when the number of generations reached 500 or
when the fitness value remained constant or nearly constant for
the last 50 generations. This approach yielded an optimum
29-variable model for the PPBR of 692 compounds with
prediction accuracies of 86 and 81% for five-fold crossvalidation
and the independent test set (161 compounds), respectively.
At the same time, an optimum 25-variable model
for the BIO data set, including 690 compounds in the training
set and 76 compounds in an independent validation set,
had prediction accuracies of 80 and 86% for training set
five-fold crossvalidation and the independent test set, respectively
[62]. The descriptors selected by the GA-CG method covered
a large range of molecular properties, which implies that
the PPBR and BIO of a drug might be affected by many
complicated factors. The authors claimed that the PPBR and
BIO predictors outperformed previous models in the literature
[62].
Drug-induced MT has been one of the key reasons for
drugs failing to enter, or being withdrawn from, the market
[80]. That is why MT has become an important test in
ADMET studies. The hybrid GA-CG-SVM approach was
also applied to predict MT using a collected data set of
288 compounds, including 171 MT+ and 117 MT− [63].
The data set was randomly divided into a training set (253
compounds) and a test set (35 compounds). Bit string representation
of the feature chromosome was used. Populations were
evolved with crossover and mutation rates of 0.5
and 0.1, respectively. The algorithm was stopped when the
generation number reached 200 or the fitness value did
not improve during the last 10 generations [63]. Accuracies
for five-fold crossvalidation and the test set were about
85 and 77%, respectively. A total of 27 optimum molecular
descriptors were selected, which were roughly grouped
into five categories: molecular weight-related descriptors,
van der Waals volume-related descriptors, electronegativities,
molecular structural information, and shape- and other
physicochemical properties-related descriptors. This descriptor
Table 2 Data set details and statistics of the optimum models reported by GA-SVM modeling

Dataset category | Target name or biological activity/function | Descriptor type | Data size | Optimum variables | Validation accuracy (%) | Ref.
ADMET | Human plasma protein binding rate (PPBR) | 0D, 1D, 2D and 3Da | 853 | 29 | 81 | [63]
ADMET | Oral bioavailability (BIO) | 0D, 1D, 2D and 3D | 766 | 25 | 86 |
ADMET | Mitochondrial toxicity (MT) | 0D, 1D, 2D and 3D | 288 | 27 | 77 | [64]
ADMET | P-glycoprotein substrates and nonsubstrates (P-gp) | 0D, 1D, 2D and 3D | 201 | 8 | 85 | [62]
ADMET | Human intestinal absorption (HIA) | 0D, 1D, 2D and 3D | 196 | 25 | 87 |
ADMET | Induction of torsades de pointes (Tdp) | 0D, 1D, 2D and 3D | 361 | 17 | 86 |
ADMET | Blood–brain barrier (BBB) penetration | 0D, 1D, 2D and 3D | 3,941 | 169 | 91 |
ADMET | Blood–brain barrier (BBB) penetration | 0D, 1D, 2D and 3D | 593 | 24 | 97 |
Cancer | Apoptosis | 0D, 1D, 2D and 3D | 43 | 7 | 92 | [72]
Aqueous solubility | LogS | Structural, atom type, electrotopological | 1,342 | 9 | 90 | [95]
Aqueous solubility | Log P | Structural, atom type, electrotopological | 10,782 | 14 | 82 |
Protein function/class | Folding class | Sequence features and order | 204,277498 | 700 | 90 | [102]
Protein function/class | Subcellular location | Physicochemical composition | 504 | 33 | 56 | [103]
Protein function/class | Subcellular location | Physicochemical composition | 703 | 28 | 72 |
Protein function/class | Protein–protein complexes | Physicochemical atomic properties | 172,345 | 30 | 90 | [104]
Protein function/class | Voltage-gated K+ channelb | 2D | 100 | 3 | 85 | [29]
Protein function/class | Ghrelin receptor | 2D | 23 | 2 | 93 | [30]

a Descriptor classification according to the Dragon software [74]
b Average over three physiological variable models
diversity pointed out the high complexity of the MT mechanism
[63].
The same methodology was successfully applied to other
ADMET-related properties [61]. Identification of P-gp
substrates and nonsubstrates yielded an eight-input model
explaining 85% of crossvalidation variance. Prediction of HIA
yielded a 25-input model explaining 87% of crossvalidation
variance. Prediction of compounds inducing Tdp yielded a
17-input model explaining 86% of crossvalidation variance.
Prediction of BBB penetration yielded two models, a 169-input
and a 24-input model, explaining more than 91 and 94% of
crossvalidation variance, respectively [61] (Table 2). The
authors cited above claimed that the optimum models significantly
improve overall prediction accuracy and have fewer input
features in comparison to previously reported models [61].
Anticancer targets
Cancer is characterized by uncontrolled proliferative growth
and the spread of aberrant cells from their site of origin. Most
anticancer agents exert their therapeutic action by damaging
DNA, blocking DNA synthesis, altering tubulin
polymerization–depolymerization, or disrupting the hormonal
stimulation of cell growth [81]. Recent findings on the underlying
genetic changes related to the cancerous state have aroused
interest in novel mechanistic targets.
Computer-aided development of cancer therapeutics has
taken on new dimensions since modern biological techniques
opened the way to understanding the mechanism and structure
of key cellular processes at the protein level. In the context
of cancer therapy targets, BRGNNs have been employed to
predict inhibition of farnesyl protein transferase [25], matrix
metalloproteinase (MMP) [23,70], and cyclin-dependent kinase
[19], as well as antagonist activity for the luteinizing hormone
releasing hormone (LHRH) receptor [20,69]. Results from
BRGNN modeling of four cancer-target data sets appear in
Table 1. The numbers of selected features varied according to
the size and variability of each data set. The selected features
correspond to the molecular descriptors that best
described the affinity of the ligands toward the targets. Models
were validated by crossvalidation and/or test set prediction.
Validation accuracies were higher than 65% for all data
sets.
Two-dimensional molecular descriptors were used for
BRGNN modeling of the activity toward cancer targets
of several chemotypes in Fig. 4, such as 1H-pyrazolo[3,4-
d]pyrimidine derivatives (1 and 2) as cyclin-dependent kinase
inhibitors; heterocyclic compounds as LHRH agonists; and
thieno[2,3-b]pyridine-4-ones (3), thieno[2,3-d]pyrimidine-
2,4-diones (4), imidazo[1,2-a]pyrimidin-5-ones (5),
benzimidazole derivatives (6 and 7), N-hydroxy-2-[(phenylsulfonyl)amino]acetamide
derivatives (8 and 9), and
N-hydroxy-α-phenylsulfonylacetamide derivatives (10 and
11) as inhibitors of the MMP family.
On the other hand, thiol (12) and non-thiol (13) inhibitors
of farnesyl protein transferase in Fig. 4 were modeled by 3D
descriptors, which encoded distributions of atomic properties
over the three-dimensional molecular space [25]. Knowledge of
the binding mode was available for this target; thus, ligand
molecules were conveniently aligned to the crystal structure of
an inhibitor in the binding site. 3D encoding of molecules is
more realistic than the 2D approximation, but conformational
variability could introduce some undesirable noise into the data.
Consequently, 2D descriptors tend to achieve better
performance when the system lacks binding mode information
and/or when the target is promiscuous and the ligands bind
in different conformations.
It is worth noting that BRGNNs trained with quantum-chemical
descriptors from 11,12-cyclic carbamate derivatives
of 6-O-methylerythromycin A (14) in Fig. 4 predicted
LHRH antagonist activity with 70% accuracy [69]. Quantum-chemical
descriptors only encoded information relative to the
electronic states of the molecules rather than the distribution of
chemical groups on the structure. The structural homogeneity
of the macrolides in this data set suggests a well-defined
electronic pattern that was successfully recognized by the
networks after supervised training.
Unwanted, defective, or damaged cells are rapidly and
selectively eliminated from the body by the innate mechanism
called apoptosis, or programmed cell death. Resistant
tumor cells evade the action of anticancer agents by increasing
their apoptotic threshold [82,83]. This has triggered
interest in novel chemical compounds capable of inducing
apoptosis in chemo/immunoresistant tumor cells, and apoptosis
has therefore received huge attention in recent years
[82,83]. The induction of apoptosis by a total of 43 4-aryl-
4H-chromenes (15) in Fig. 4 was predicted by chemometric
methods using molecular descriptors calculated from the
molecular structure [71]. GA and stepwise multiple linear
regression were applied to feature selection for SVM, ANN,
and MLR training. Nevertheless, GA was implemented
inside the linear framework, and the selected descriptors
were then used for SVM and ANN training. The optimum
7-variable SVM predictor superseded ANN and MLR as well as
previously reported models, showing correlation coefficients
of 0.950 and 0.924 for the training and test set, respectively,
with crossvalidation accuracy of about 70% [71].
Acetylcholinesterase inhibition
Alzheimer's disease (AD) is a neurodegenerative disorder
characterized by a progressive impairment of cognitive
function, which seems to be associated with deposition of
amyloid protein and neuronal loss, as well as with altered
neurotransmission in the brain. Neurodegeneration in AD
patients is mainly attributed to the loss of the basal forebrain
cholinergic system, which is thought to play a central role in
producing the cognitive impairments [84]. Therefore,
enhancement of cholinergic transmission has been regarded
as one of the most promising approaches for treating AD
patients.
BRGNN models of acetylcholinesterase inhibition by
huprine- and tacrine-like inhibitors have been reported. For
analogs of tacrine (16) [21] and huprine (17) [24] in Fig. 4,
GA explored a wide pool of 3D descriptors. The predictive
capacity of the selected model was evaluated by averaging
multiple validation sets generated as members of neural network
ensembles (NNEs). The tacrine model showed an adequate
test accuracy of about 71% [21] (Table 1). Likewise, the huprine
analogs data set was also evaluated by NNE averaging, showing
an optimum high accuracy of 85% when 40 networks were
assembled [24]. The higher accuracy yielded for the huprine
analogs in comparison to the tacrine analogs could be
related to the higher structural variability of the tacrine data set.
This fact contributed to the 30% prediction uncertainty
in the affinity of tacrine analogs. In this connection, tacrine-like
inhibitors had been found experimentally to bind
acetylcholinesterase in different binding modes at the active site
and also at peripheral sites [85,86].
HIV-1 protease inhibition
A number of targets for potential chemotherapeutic intervention
against HIV-1 are provided by the retrovirus life
cycle. The protease-mediated transformation from the immature,
non-dangerous virion to the mature, infective virus is
a crucial stage in the HIV-1 life cycle. HIV-1 protease has
thus become a major target for anti-AIDS drug design, and its
inhibition has been shown to extend the length and improve
the quality of life of AIDS patients [87]. A large number
of inhibitors have been designed, synthesized, and assayed,
and several HIV-1 protease inhibitors are now utilized in the
treatment of AIDS [87–90].
Cyclic urea derivatives (18) in Fig. 4 are among the most
successful candidates for AIDS targeting, and BRGNN was
successfully applied to model the activities of a set of such
compounds toward HIV-1 protease [22]. 2D encoding was
used to avoid conformational noise in the feature chemical
space, and the optimum BRGNN model accurately predicted
IC50 values with 70% accuracy in the validation test for 55 cyclic
urea derivatives (Table 1). Although the feature space was
only 2D-dependent, the problem was accurately solved
by the nonlinear approach. Inhibitory activity variations due
to differential chemical substitutions on the cyclic urea scaffold
were learned by the networks, and the activity of new
compounds was adequately predicted.
Potassium-channel and calcium entry blocker activities
K+ channels constitute a remarkably diverse family of
membrane-spanning proteins that have a wide range of functions
in electrically excitable and unexcitable cells. One important
class opens in response to a calcium concentration increase
within the cytosol. Pharmacological and electrophysiological
evidence and, more recently, structural evidence from cloning
studies have established that there exist several kinds of
Ca2+-activated K+ channels [91,92].
Several compounds have been shown to block the IKCa-mediated
Ca2+-activated K+ permeability in red blood cells
[93]. A model of the selective inhibition of the intermediate-conductance
Ca2+-activated K+ channel by some clotrimazole
analogs (19, 20) in Fig. 4 was developed with BRGNNs
[16]. Substitutions around the triarylmethane scaffold yielded
a differential inhibition of the K+ channel by triarylmethane
analogs that was encoded in 2D descriptors. The BRGNN
approach yielded a remarkably accurate model describing
more than 90% of data variance in validation experiments.
Interactions with the ion channel were encoded in topological
charge variables, and the homogeneity of the data set
assures a very high prediction accuracy. The SOM map of
blockers depicted a very good behavior of the optimum features
for unsupervised differentiation of inhibitors at different activity
levels [16].
Similarly, a BRGNN model of calcium entry blockers
with myocardial activity (negative inotropic activity) was
reported [17]. Taking into account the lack of information
about the active conformations and mechanism of action of
diltiazem analogs (21–23) in Fig. 4 as cardiac malfunction
drugs, structural information was encoded in 2D topological
autocorrelation vectors. Remarkably, the optimum BRGNN
model exhibited an adequate accuracy of about 65% [17]. The
complexity of the cellular cardiac response, a multifactor
event in which several interactions such as membrane trespassing
and receptor interactions take place, accounts for
this discrete but adequate performance.
Antifungal activity
None of the existing systemic antifungals satisfies the medical
need completely; there are weaknesses in spectrum,
potency, safety, and pharmacokinetic properties [10]. Few
substances have been discovered that exert an inhibitory
effect on the fungi pathogenic to humans, and most of these
are relatively toxic. The BRGNN methodology was applied to a
data set of antifungal heterocyclic ring derivatives in Fig. 4
(2,5,6-trisubstituted benzoxazoles; 2,5-disubstituted
benzimidazoles; 2-substituted benzothiazoles; and 2-substituted
oxazolo(4,5-b)pyridines) (24 and 25) [10].
A comparative analysis using MLR and BRGNNs was
carried out to correlate the inhibitory activity against Candida
albicans (log(1/C)) with 3D descriptors encoding the
chemical structures of the heterocyclic compounds [10].
Beyond the improvement in training set fitting, BRGNN
outperformed multiple linear regression, describing 87% of the
test set variance. The antifungal nonlinear models showed
that the distributions of van der Waals atomic volumes and
atomic masses have a large influence on the antifungal
activities of the compounds studied. The BRGNN model also
included the influence of atomic polarizability, which could be
associated with the capacity of the antifungal compounds to
be deformed when interacting with biological macromolecules
[10].
Antiprotozoan activity
Trypanosoma cruzi, a parasitic protozoan, is the causative
agent of Chagas disease or American trypanosomiasis,
one of the most threatening endemic diseases in Central and South
America. The primary cysteine protease of Trypanosoma
cruzi, cruzain, is expressed throughout the life cycle and is
essential for the survival of the parasite within host cells
[94]. Thus, cruzain inhibition has become an attractive route
for the development of potential therapeutics for the treatment of
Chagas disease.
The Ki values of a set of 46 ketone-based cruzain inhibitors
(26 and 27) in Fig. 4 were successfully modeled by means
of data-diverse ensembles of BRGNNs using 2D molecular
descriptors, with an accuracy of about 75% [18]. The
BRGNNs outperformed a GA-optimized PLS model, suggesting
that the functional dependence between affinity and the
topological structure of the inhibitors has a strong nonlinear
component. The unsupervised training of SOM maps with
optimum feature vectors depicted high and low inhibitory
activity levels that matched well with the activity profiles
of the data set.
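The SOM analyses mentioned above can be sketched with a minimal self-organizing map. This is an illustrative toy implementation, not the authors' setup: the grid size, learning schedule, and two-cluster toy data (standing in for high- and low-activity descriptor vectors) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat, grid = 5, (4, 4)                    # descriptor length, map size (toy)
weights = rng.random(grid + (n_feat,))      # one prototype vector per map unit

def best_matching_unit(x):
    """Grid position of the unit whose prototype is closest to sample x."""
    dists = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dists), grid)

def train(data, epochs=50, lr0=0.5, sigma0=2.0):
    coords = np.indices(grid).transpose(1, 2, 0)  # (row, col) of each unit
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)               # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5   # shrinking neighborhood
        for x in data:
            bmu = np.array(best_matching_unit(x))
            # Gaussian neighborhood: units near the BMU move toward x.
            d2 = ((coords - bmu) ** 2).sum(axis=2)
            h = np.exp(-d2 / (2 * sigma ** 2))[..., None]
            weights[:] += lr * h * (x - weights)

# Two toy descriptor clusters standing in for high/low activity compounds.
data = np.vstack([rng.normal(0, 0.1, (20, n_feat)),
                  rng.normal(1, 0.1, (20, n_feat))])
train(data)
print(best_matching_unit(data[0]), best_matching_unit(data[-1]))
```

After training, samples from the two clusters map to different units, which is the unsupervised activity-level differentiation the SOM maps provide.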
Aqueous solubility
The aqueous solubility (log S) and lipophilicity (log P) are
very important properties to be evaluated in the drug design
process. Zhang et al. [95] reported SVM classifiers considering a
three-class scheme for these two properties. They applied GA
for feature selection and the CG method for parameter optimization.
Two data sets with 1,342 and 10,782 compounds were
used to generate the log S and log P models. The chromosome
was represented as a bit string, and simple mutation and crossover
operators were used to create the individuals of the new
generations. Five-fold cross-validation accuracy was used as
the fitness function to evaluate the quality of the individuals
allowed to reproduce or survive to the next generation.
A roulette wheel algorithm selected the chromosomes for
crossover to produce offspring, and the swapping positions
were created randomly, with crossover and mutation rates of
0.5 and 0.1, respectively [95]. The overall prediction accuracies
for log S were 87 and 90% for the training set and test set,
respectively. Similarly, the overall prediction accuracies for
log P were 81.0 and 82.0% for the training set and test set, respectively.
The prediction accuracies of the two-class models of
log S and log P were higher than those of the three-class models, and GA
feature selection had a significant impact on the quality of the
classification [95].
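The GA loop described above (bit-string chromosomes, roulette-wheel selection, and crossover and mutation rates of 0.5 and 0.1) can be sketched as follows. The toy fitness function merely stands in for the five-fold cross-validation accuracy of a classifier built on the selected descriptors; the pool size and the "good" feature subset are invented for illustration.

```python
import random

N_FEATURES = 20          # size of the candidate descriptor pool (toy)
POP_SIZE = 30
CROSSOVER_RATE = 0.5     # rates reported in [95]
MUTATION_RATE = 0.1

# Hypothetical stand-in for cross-validation accuracy: reward one
# particular "good" descriptor subset and penalize extra features.
GOOD = {1, 4, 7, 11}
def fitness(bits):
    chosen = {i for i, b in enumerate(bits) if b}
    return len(chosen & GOOD) - 0.1 * len(chosen - GOOD)

def roulette(pop, fits):
    """Sample two parents with probability proportional to fitness."""
    lo = min(fits)
    weights = [f - lo + 1e-9 for f in fits]   # shift to strictly positive
    return random.choices(pop, weights=weights, k=2)

def crossover(a, b):
    if random.random() < CROSSOVER_RATE:
        cut = random.randrange(1, N_FEATURES)  # random swapping position
        return a[:cut] + b[cut:]
    return a[:]

def mutate(bits):
    return [1 - b if random.random() < MUTATION_RATE else b for b in bits]

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
       for _ in range(POP_SIZE)]
for gen in range(50):
    fits = [fitness(ind) for ind in pop]
    nxt = []
    while len(nxt) < POP_SIZE:
        p1, p2 = roulette(pop, fits)
        nxt.append(mutate(crossover(p1, p2)))
    pop = nxt

best = max(pop, key=fitness)
print(sorted(i for i, b in enumerate(best) if b))
```

In the published work the fitness evaluation would train and cross-validate an SVM on each candidate feature subset, which dominates the runtime of the whole search.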
Protein function/class-structure relationships
Functional variations induced by mutations are the main
causes of several genetic pathologies and syndromes. Given
the availability of functional variation data on mutations of
several proteins and of other protein functional/structural data,
it is possible to use supervised learning to model protein
function/property relationships [29,30,70,71,96-105]. GA-SVM
regression and binary classification were carried out
to predict functional properties of ghrelin receptor mutants
[30] and voltage-gated K+ channel proteins [29]. Structural
information was encoded in 2D descriptors calculated from
the protein sequences. The regression and classification tasks
were properly attained, with accuracies of about 93 and 85%,
respectively (Table 2). The optimum model of the constitutive
activity of the ghrelin receptor was remarkably accurate
while depending on only two descriptors.
A novel 3D pseudo-folding graph representation of protein
sequences inside a magic dodecahedron was used to
classify voltage-gated potassium channels (VKCs) according
to the signs of three electrophysiological variables: activation
threshold voltage, half-activation voltage, and half-inactivation
voltage [29]. We found relevant contributions of
the pseudo-core and pseudo-surface of the 3D pseudo-folded
proteins to the discrimination between VKCs according
to the three electrophysiological variables. Moreover, the
accuracies of the voltage-gated K+ channel models built by
GA-SVM were higher than those of the other nine GA-wrapper
linear and nonlinear classifiers [29].
Since many disease-causing mutations exert their effects
by altering protein folding, the prediction of protein structures
and of stability changes upon mutation is a fundamental
aim in molecular biology. The BRGNN technique has also been
applied to model the conformational stability of mutants of
human lysozyme [68], gene V protein [69], and chymotrypsin
inhibitor 2 [15]. The unfolding Gibbs free energy changes
(ΔΔG) of human lysozyme and gene V protein mutants
were successfully modeled using amino acid sequence autocorrelation
vectors calculated by measuring the autocorrelations
of 48 amino acid/residue properties [68,69] selected
from the AAindex database [78]. On the other hand, the ΔΔG of
chymotrypsin inhibitor 2 mutants was predicted using
protein radial distribution scores calculated over the 3D structure
using the same 48 amino acid/residue properties. Ensembles
of BRGNNs yielded optimum nonlinear models for the
conformational stabilities of the human lysozyme, gene V protein,
and chymotrypsin inhibitor 2 mutants, which described about
68, 66, and 72% of the ensemble test set variances, respectively (Table 1).
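A minimal sketch of the sequence autocorrelation descriptors of the kind used in [68,69], assuming a Moreau-Broto-style averaged formula and a single Kyte-Doolittle hydrophobicity scale in place of the 48 AAindex property scales used in the published models:

```python
# Kyte-Doolittle hydrophobicity values for a few residues; the published
# models drew 48 such property scales from the AAindex database instead.
HYDRO = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
         'E': -3.5, 'G': -0.4, 'L': 3.8, 'K': -3.9, 'S': -0.8}

def autocorrelation(seq, prop, max_lag=4):
    """Averaged Moreau-Broto autocorrelation of a residue property:
    AC(d) = (1 / (n - d)) * sum_i p(i) * p(i + d), for lags d = 1..max_lag."""
    vals = [prop[aa] for aa in seq]
    n = len(vals)
    descriptors = []
    for d in range(1, max_lag + 1):
        ac = sum(vals[i] * vals[i + d] for i in range(n - d)) / (n - d)
        descriptors.append(ac)
    return descriptors

print(autocorrelation("ARNDCEGLKS", HYDRO))
```

Each lag d correlates a property value with the value d residues downstream, so the vector captures how a physicochemical property is distributed along the sequence independently of sequence length.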
The neural network models provided information about
the most relevant properties ruling the conformational stability
of the studied proteins. The authors determined how each input
descriptor was correlated to the output predicted by the
network [15,68,69]. Entropy changes and the power to be at
the N-terminal of an α-helix had the strongest contributions
to the stability pattern of human lysozyme. In the case of
the gene V protein mutants, the sequence autocorrelations of
thermodynamic transfer hydrophobicity and the power to be
at the middle of an α-helix had the highest impact on the
ΔΔG. Meanwhile, the spherical distribution of side-chain
entropy changes over the 3D structure of the chymotrypsin inhibitor
2 mutants exhibited the highest relevance in comparison with
the other descriptors.
Prediction of the structural class of a protein, which characterizes
its overall folding type or that of its domain, has typically been based on a
group of features that possesses only one kind of discriminative
information. Other types of discriminative information
associated with the primary sequence have thus been missed,
reducing the prediction accuracy [102]. Li et al. [102] reported a
novel method for the prediction of protein structural class by
coupling GA and SVMs. Proteins were represented by six
feature groups composed of 10 structural and physicochemical
features of proteins and peptides, yielding a total of 1,447
features. GA was applied to select an optimum feature subset
and to optimize the SVM parameters. The authors used a hybrid
binary-decimal representation of the chromosomes, and the fitness
function was the accuracy of five-fold cross-validation.
Features in the chromosome were represented as 1,447 binary
genes and the parameters as two decimal genes. Jack-knife
tests on the working data sets yielded outstanding prediction
accuracies of classification higher than 97%, with an overall
accuracy of 99.5% [102] (Table 2).
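The hybrid binary-decimal chromosome of [102] can be sketched as follows. The choice of C and gamma as the two decimal genes and their ranges are assumptions for illustration; the paper's exact parameterization may differ.

```python
import random

N_FEAT = 1447  # one binary gene per candidate feature, as in [102]

def random_chromosome():
    """1,447 binary feature genes followed by two decimal SVM-parameter
    genes (hypothetical ranges for C and the RBF gamma)."""
    bits = [random.randint(0, 1) for _ in range(N_FEAT)]
    c_gene = random.uniform(0.1, 100.0)      # decimal gene: SVM C
    gamma_gene = random.uniform(1e-4, 1.0)   # decimal gene: RBF gamma
    return bits + [c_gene, gamma_gene]

def decode(chrom):
    """Split a chromosome back into (selected feature indices, C, gamma)."""
    bits, c, gamma = chrom[:N_FEAT], chrom[N_FEAT], chrom[N_FEAT + 1]
    selected = [i for i, b in enumerate(bits) if b]
    return selected, c, gamma

random.seed(1)
chrom = random_chromosome()
selected, c, gamma = decode(chrom)
print(len(selected), round(c, 2), round(gamma, 4))
```

The appeal of this encoding is that a single GA individual simultaneously answers "which features?" and "which classifier settings?", so feature selection and parameter tuning are optimized jointly rather than in separate passes.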
SVM learning methods have also shown effectiveness
for the prediction of protein subcellular and subnuclear localizations,
which demands cooperation between informative
features and classifier design. For this purpose, Huang et al.
[103] reported an accurate system for predicting protein
subnuclear localization, named ProLoc, based on an evolutionary
SVM (ESVM) classifier with automatic feature selection
from a large set of physicochemical composition (PCC)
descriptors. An inheritable GA combined with SVM automatically
selected the best number of PCC features using
two data sets, containing 504 proteins localized in six subnuclear
compartments and 370 proteins localized in nine
subnuclear compartments. The features and SVM parameters
were concatenated and encoded in binary chromosomes, which
evolved according to mutation and crossover operators. The
training accuracy of ten-fold cross-validation was used as the
fitness function. ProLoc, with 33 and 28 PCC features, reported
leave-one-out accuracies over 56 and 72% for each data set,
respectively [103]. Both predictors outperformed an SVM model
using k-peptide composition features and an optimized evidence-theoretic
k-nearest neighbor classifier utilizing pseudo
amino acid composition.
The nature of different protein-protein complexes was
analyzed by a computational framework that handles the
preparation, processing, and analysis of protein-protein complexes
with machine learning algorithms [104]. Among
different machine learning algorithms, SVM was applied
in combination with various feature selection techniques
including GA. Physicochemical characteristics of protein-protein
complex interfaces were represented in four different
ways, using two different atomic contact vectors, DrugScore
pair potential vectors, and SFC score descriptor vectors. Two
different data sets were used: one with contacts enforced
by the crystallographic packing environment (crystal contacts)
and biologically functional homodimer complexes, and
another with permanent complexes and transient protein-protein
complexes [104]. The authors implemented a simple
GA with a population size of 30, a crossover rate of 75%,
and a mutation rate of 5%. Two-point crossover and single-bit
mutation were applied, and the population evolved until convergence
was reached, defined as either no further change over 10 generations
or 100% prediction quality. Although SVM did not yield
the highest accuracy, the optimum models obtained by GA
selection reached more than 90% accuracy for both the packing-enforced/functional
and the permanent/transient complexes.
GA also identified the discriminating ability of the three most
relevant features, given in descending order as follows: the
contacts of hydrophobic and/or aromatic atoms located in the
protein-protein interfaces, the purely hydrophobic/hydrophobic
atom contacts, and the polar/hydrophobic atom contacts
[104].
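The genetic operators and stopping rule reported in [104] can be sketched as follows; the population handling and the SVM-based fitness evaluation are omitted, and only the operators and the convergence test are shown.

```python
import random

def two_point_crossover(a, b):
    """Swap the segment between two random cut points
    (applied at a 75% rate per pair in [104])."""
    i, j = sorted(random.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:]

def single_bit_mutation(bits, rate=0.05):
    """Flip one randomly chosen bit with the given probability (5% in [104])."""
    bits = bits[:]
    if random.random() < rate:
        k = random.randrange(len(bits))
        bits[k] = 1 - bits[k]
    return bits

def converged(best_history, window=10):
    """Stopping rule from [104]: 100% prediction quality reached, or the
    best fitness unchanged over the last 10 generations."""
    if best_history and best_history[-1] >= 1.0:
        return True
    return (len(best_history) >= window
            and len(set(best_history[-window:])) == 1)
```

A typical driver would append each generation's best fitness to `best_history` and stop the evolutionary loop as soon as `converged(best_history)` returns True.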
Kernytsky et al. [105] reported a framework that first sets
global sequence features and, second, widely expands the
feature space by generically encoding the coexistence of
residue-based features in proteins. A global protein feature
scheme was generated for function and structure prediction
studies. They proposed a combination of individual features
spanning the feature space from global feature
inputs to features that can capture local evidence such
as the individual residues of a catalytic triad. GA-optimized
ANN and SVM were used to explore the vast feature space
created. Inside the GA, the initial population of solutions was
built as multiple combinations of all the global features,
which also contained the maximal intersection of all the feature
classes, with 360 input features [105]. New offspring
were created by inserting or deleting nodes in the existing
individuals. Nodes were defined as feature classes, or as any
operator on the features that combined two global feature
classes. The mutation probability was set to 0.4 per node per
generation, and the crossover probability was set to 0.2
per solution per generation. After new offspring solutions
were generated via crossover and/or mutation (insertion/deletion)
of the parent solutions, the worst solutions were discarded
to restore the population's original size, ensuring that
the best-performing solutions were not selected out of the next
generation by chance; such schemes tend to converge
faster at the cost of losing diversity more quickly among the
solutions. This contrasts with the typical selection scheme
(roulette wheel selection), in which more-fit solutions have
a higher chance than less-fit solutions of passing to the next
generation but have no guaranteed survival. The area under the
receiver operating characteristic curve was monitored as the
fitness/cost function.
The population size was set to 100 solutions, with 50 potential
offspring created in each generation, and the GA ran for 1,000
generations. GA was critical to effectively manage a feature
space far too large for exhaustive enumeration, and it
allowed the detection of combinations of features that were neither
too general, giving poor performance, nor too specific, leading
to overtraining. This GA variant was successfully applied to
the prediction of protein enzymatic activity [105].
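The elitist survivor scheme described above, in which pooled parents and offspring are truncated back to the original population size, can be sketched as follows; the numeric individuals and identity fitness are placeholders for feature-combination solutions scored by AUC.

```python
def elitist_survival(parents, offspring, fitness, pop_size):
    """Pool parents and offspring, then discard the worst solutions so the
    population returns to its original size. Unlike roulette-wheel
    selection, the best solutions can never be lost by chance."""
    pool = parents + offspring
    pool.sort(key=fitness, reverse=True)   # best fitness (e.g. AUC) first
    return pool[:pop_size]                 # truncate to the original size

# Toy usage: individuals are numbers, fitness is the identity function.
survivors = elitist_survival([3, 1, 2], [5, 0], lambda x: x, 3)
print(survivors)  # the three fittest of the pooled five: [5, 3, 2]
```

This guaranteed survival of the elite is exactly what makes the scheme converge faster while shrinking population diversity more quickly than proportional selection.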
Conclusions
The reviewed articles comprise GA-optimized predictors
implemented to quantitatively or qualitatively describe structure-activity
relationships in data relevant for drug discovery.
BRGNN and GA-SVM are presented and discussed as powerful
data modeling tools arising from the combination of GA
with efficient nonlinear mapping techniques such as BRANN
and SVM. Convoluted relationships can be successfully
modeled, and the relevant explanatory variables identified among
large pools of descriptors. Interestingly, accurate predictors
were achieved from 2D topological representations of ligands
and targets. The approach outperformed other linear and nonlinear
mapping techniques combining different feature selection
methods. BRGNNs showed satisfactory performance,
converging quickly toward the optimal position and avoiding
overfitting to a large extent. Similarly, GA optimization of
SVMs yielded robust and well-generalizing models. However,
considering the complexity of the network architecture and
the weight optimization routines, BRGNN was more suitable for
function approximation of convoluted but low-dimensional
data, whereas GA-SVM performed better in
classification tasks on high-dimensional data. These methodologies
are regarded as useful tools for drug design.
Acknowledgements Julio Caballero acknowledges with thanks the
support received through Programa Bicentenario de Ciencia y
Tecnología, ACT/24.
References
1. Gasteiger J (2006) Chemoinformatics: a new field with a long tradition. Anal Bioanal Chem 384:57-64. doi:10.1007/s00216-005-0065-y
2. Cramer RD, Patterson DE, Bunce JD (1988) Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959-5967. doi:10.1021/ja00226a005
3. Klebe G, Abraham U, Mietzner T (1994) Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. J Med Chem 37:4130-4146. doi:10.1021/jm00050a010
4. Folkers G, Merz A, Rognan D (1993) CoMFA: scope and limitations. In: Kubinyi H (ed) 3D-QSAR in drug design. Theory, methods and applications. ESCOM Science Publishers BV, Leiden, pp 583-618
5. Hansch C, Kurup A, Garg R, Gao H (2001) Chem-bioinformatics and QSAR: a review of QSAR lacking positive hydrophobic terms. Chem Rev 101:619-672. doi:10.1021/cr0000067
6. Sabljic A (1990) Topological indices and environmental chemistry. In: Karcher W, Devillers J (eds) Practical applications of quantitative structure-activity relationships (QSAR) in environmental chemistry and toxicology. Kluwer, Dordrecht, pp 61-82
7. Karelson M, Lobanov VS, Katritzky AR (1996) Quantum-chemical descriptors in QSAR/QSPR studies. Chem Rev 96:1027-1043. doi:10.1021/cr950202r
8. Livingstone DJ, Manallack DT, Tetko IV (1997) Data modelling with neural networks: advantages and limitations. J Comput Aid Mol Des 11:135-142. doi:10.1023/A:1008074223811
9. Burbidge R, Trotter M, Buxton B, Holden S (2001) Drug design by machine learning: support vector machines for pharmaceutical data analysis. Comput Chem 26:5-14. doi:10.1016/S0097-8485(01)00094-8
10. Caballero J, Fernández M (2006) Linear and non-linear modeling of antifungal activity of some heterocyc