



A Review on Computational Methods in Developing Quantitative Structure-Activity Relationship (QSAR) 

Navdeep Singh Sethi Department of Pharmacy (Pharmaceutical Chemistry), Doaba Group of Colleges, Kharar (Mohali)-140103, Punjab, India.

ABSTRACT: Virtual filtering and screening of combinatorial libraries have recently gained attention as methods complementing high-throughput screening and combinatorial chemistry. These chemoinformatic techniques rely heavily on quantitative structure-activity relationship (QSAR) analysis, a field with established methodology and a successful history. In this review, we discuss the computational methods for building QSAR models. The review starts with a general introduction to the theories of QSAR and identifies the general scheme of a QSAR model. It then focuses on the methodologies for constructing the three main components of a QSAR model, namely the methods for describing the molecular structure of compounds, for selecting informative descriptors and for predicting activity. The review presents both well-established methods and techniques recently introduced into the QSAR domain.

KEYWORDS: QSAR; Free Wilson analysis; Hansch analysis; molecular descriptors (2D descriptors and 3D descriptors); feature selection; machine learning.

Introduction

If we can understand how a molecular structure brings about a particular effect in a biological system, we have a key to unlocking the relationship and using that information to our advantage. Formal development of relationships on this premise has proved to be the foundation of predictive models. If we take a series of chemicals and attempt to form a quantitative relationship between the biological effect (i.e. the activity) and the chemistry (i.e. the structure) of each of the chemicals, then we are able to form a quantitative structure-activity relationship or QSAR.

A less complex, qualitative understanding of the role of structure in governing effects, i.e. that a fragment or sub-structure could result in a certain activity, is often simply termed a structure-activity relationship or SAR. Together, SARs and QSARs can be referred to as (Q)SARs and fall within a range of techniques known as in silico approaches. A (Q)SAR comprises three parts: the (activity) data to be modeled and hence predicted, the data with which to model, and a method to formulate the model. The purposes of in silico studies include the following:

(a) To predict biological activity and physicochemical properties by rational means.

(b) To comprehend and rationalize the mechanism of action within a series of chemicals.

Underlying these aims, the reasons for wishing to develop these models include:

(a) Savings in the cost of product development (e.g. in pharmaceutical, pesticide and personal care product areas).

(b) Predictions could reduce the requirement for lengthy and expensive animal tests.

(c) Reduction, and in some cases replacement, of animal tests, thus reducing animal use and the pain and discomfort caused to animals.

(d) Promotion of green and greener chemistry, increasing efficiency and eliminating waste by not pursuing leads unlikely to be successful1-3.

Quantitative structure-activity relationships (QSARs) are based on the assumption that the structure of a molecule (i.e. its geometric, steric and electronic properties) must contain the features responsible for its physical, chemical and biological properties, and on the ability to represent the chemical by one or more numerical descriptors. QSARs correlate, within congeneric series of compounds, affinities of ligands to their binding sites, inhibition constants, rate constants and other biological activities either with certain structural features (Free Wilson analysis) or with atomic, group or molecular properties such as lipophilicity, polarizability, electronic and steric properties (Hansch analysis)4,5. Since then, QSAR equations have been used to describe thousands of biological activities within different series of drugs and drug candidates. Enzyme inhibition data in particular have been successfully correlated with physico-chemical properties of the ligands. In certain cases, where X-ray structures of proteins became available, the results of QSAR regression models could be interpreted with the additional information from the three-dimensional (3D) structures6,7.

International Journal of Drug Design and Discovery, Volume 3 • Issue 3 • July – September 2012, 815-836

* For correspondence: Navdeep Singh Sethi, Tel: +91-9463696295, Fax: 0160-2285155, Email: [email protected]

QSAR studies can reduce the costly failures of drug candidates in clinical trials by filtering combinatorial libraries. Virtual filtering can eliminate compounds with predicted toxic or poor pharmacokinetic properties early in the pipeline8,9. It also allows for narrowing the library to drug-like or lead-like compounds and eliminating frequent hitters, i.e. compounds that show unspecific activity in several assays and rarely result in leads10-11. Including such considerations at an early stage results in multidimensional optimization, with high activity as an essential but not the only goal. Considering activity optimization, building target-specific structure-activity models based on initial hits can guide high-throughput screening (HTS) by rapidly screening the library for the most promising candidates. Such focused screening can reduce the number of experiments and allow for the use of more complex, low-throughput assays12. Feedback loops of high-throughput and virtual screening, resulting in a sequential screening approach, therefore allow for more rational progress towards high-quality lead compounds13. Later in the drug discovery pipeline, accurate QSAR models constructed on the basis of the lead series can assist in optimizing the lead14. The importance and difficulty of the above-described tasks facing QSAR models have inspired many chemoinformatics researchers to borrow from recent developments in various fields including pattern recognition, molecular modeling, machine learning and artificial intelligence. This has resulted in a large family of conceptually different methods being used for creating QSARs. The purpose of this review is to guide the reader through this diversity of techniques and algorithms for developing a successful QSAR model.

Quantitative Structure-Activity Relationship (QSAR) Theories

All QSAR analyses are based on the assumption of a linear additive contribution of the different structural properties or features of a compound to its biological activity, provided that there are no nonlinear dependences of transport or binding on certain physicochemical properties. This simple assumption is supported by dedicated investigations, for example the scoring function of the de novo drug design program LUDI (Eqn 1); in addition, the results of many Free Wilson and Hansch analyses support this concept15,16.

∆Gbinding = ∆G0 + ∆Ghb + ∆Gionic + ∆Glipo + ∆Grot …..(1)

Overall loss of translational and rotational entropy, ∆G0 = +5.4 kJ mol-1

Ideal neutral hydrogen bond, ∆Ghb = -4.7 kJ mol-1

Ideal ionic interaction, ∆Gionic = -8.3 kJ mol-1

Lipophilic contact, ∆Glipo = -0.17 kJ mol-1 Å-2

Entropy loss per rotatable bond of the ligand, ∆Grot = +1.4 kJ mol-1

Equation 1 correlates the free energy of binding, ∆Gbinding, with a constant term ∆G0 that describes the loss of overall translational and rotational degrees of freedom, and with ∆Ghb, ∆Gionic and ∆Glipo, which are structure-derived energy terms for neutral and charged hydrogen-bond interactions and hydrophobic interactions between the ligand and the protein; ∆Grot describes the loss of internal rotational degrees of freedom of the ligand. Because of the extrathermodynamic relationship between free energy ∆G and equilibrium constant K (Eqn 2) or rate constants k (kon = association rate constant, koff = dissociation rate constant of ligand-receptor complex formation), the logarithms of such values can be correlated with binding affinities.

∆G = –2.303 RT log K = –2.303 RT log (kon/koff) …..(2)

Logarithms of molar concentrations C that produce a certain biological effect can be correlated with molecular features or with physicochemical properties that are likewise related to free energies and equilibrium constants; normally the logarithms of inverse concentrations (log 1/C) are used so that the more active analogs obtain larger values.
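Equations 1 and 2 can be combined in a short numerical sketch. The coefficient values are those quoted above; the ligand feature counts are purely hypothetical, chosen only to illustrate the additive scheme.

```python
# Sketch of the LUDI scoring function (Eqn 1) and the free-energy to
# equilibrium-constant relation (Eqn 2). Ligand counts are hypothetical.

R = 8.314e-3   # gas constant, kJ mol^-1 K^-1
T = 298.0      # temperature, K

def ludi_delta_g(n_hbonds, n_ionic, lipo_area, n_rot):
    """Eqn 1: sum of additive free-energy contributions (values from the text)."""
    dG0    = +5.4                # overall loss of translational/rotational entropy
    dG_hb  = -4.7 * n_hbonds     # ideal neutral hydrogen bonds
    dG_ion = -8.3 * n_ionic      # ideal ionic interactions
    dG_lip = -0.17 * lipo_area   # lipophilic contact area in A^2
    dG_rot = +1.4 * n_rot        # rotatable bonds of the ligand
    return dG0 + dG_hb + dG_ion + dG_lip + dG_rot

# Hypothetical ligand: 2 H-bonds, 1 ionic contact, 60 A^2 lipophilic
# contact area, 3 rotatable bonds.
dG = ludi_delta_g(2, 1, 60.0, 3)       # kJ mol^-1

# Eqn 2 rearranged: log K = -dG / (2.303 R T)
log_K = -dG / (2.303 * R * T)
```

For this hypothetical ligand the additive terms sum to about -18.3 kJ mol-1, corresponding to log K ≈ 3.2.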

Free Wilson Analysis

In 1964, Free and Wilson derived a mathematical model that describes the presence and absence of certain structural features, i.e. those groups that are chemically modified, by values of 1 and 0, and correlates the resulting structural matrix with biological activity values (Eqn 3).

log 1/C = ∑ ai + µ …..(3)

The values ai in equation 3 are the biological activity group contributions of the substituents X1, X2, …, Xi in the different positions p of compound 1 (Figure 1), and µ is the biological activity value of the reference compound, most often the unsubstituted parent structure of the series4,6.

 

A common skeleton bears substituents Xi in different positions p; the presence or absence of these substituents is coded by the values 1 and 0, respectively.

Fig. 1. Schematic presentation of a molecule for Free Wilson analysis.


Equation 4 describes the antiadrenergic activities of 22 different m-, p- and m,p-disubstituted analogs of N,N-dimethyl-α-bromophenethylamine 2 (Figure 2), where C is the concentration that causes a 50% reduction of the adrenergic effect of a certain epinephrine dose4.

log 1/C = – 0.301 (±0.50) [m-F] + 0.207 (±0.29) [m-Cl] + 0.434 (±0.27) [m-Br] + 0.579 (±0.50) [m-I] + 0.454 (±0.27) [m-Me] +0.340 (±0.30) [p-F] + 0.768 (±0.30) [p-Cl] + 1.020 (±0.30) [p-Br] +1.429 (±0.50) [p-I] + 1.256 (±0.33) [p-Me] + 7.821 (±0.27) (n = 22; r = 0.969; s = 0.194; F = 16.99) …..(4)

 

Fig. 2. N,N-dimethyl-α-bromophenethylamines (X, Y = H, F, Cl, Br, I, Me).

where n = number of compounds; r = correlation coefficient, a measure of the relative quality of a model; s = standard deviation, a measure of the absolute quality of a model; F = Fisher value, a measure of statistical significance; C = molar concentration that causes a certain biological effect.

Equation 4 illustrates the main advantage of Free Wilson analysis: only the biological activity values and the chemical structures of the compounds need to be known to derive a QSAR model. On the other hand, Free Wilson analysis has several shortcomings:

(a) At least two different positions of substituents must be chemically modified;

(b) Predictions can only be made for new combinations of substituents already included in the analysis;

(c) Single-point determinations, i.e. the single occurrence of a certain structural feature in the whole data set, obscure the statistical results;

(d) Many degrees of freedom are wasted to describe every substituent.

Nevertheless, Free Wilson analysis is often used to see at a glance which physicochemical properties might be important for the biological activity. For this data set, it can easily be concluded from equation 4 that:

• Biological activities increase with increasing lipophilicity (F to Cl, Br and I);

• Biological activities increase with electron donor properties (methyl has larger group contributions than the equi-lipophilic Cl);

• meta-substituents have lower group contributions than para-substituents.
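The mechanics of equation 3 can be sketched in a few lines: build the 1/0 structural matrix, append a constant column for µ, and solve by least squares. The substituent labels and activity values below are hypothetical (and exactly additive), not the real data behind equation 4.

```python
# Minimal Free Wilson analysis (Eqn 3): regress log 1/C on a 1/0
# indicator matrix of substituent presence. Data are hypothetical.
import numpy as np

features = ["m-Cl", "m-Me", "p-Cl", "p-Me"]
# One row per compound; 1 = substituent present at that position.
X = np.array([
    [0, 0, 0, 0],   # unsubstituted parent compound
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
], dtype=float)
log_inv_C = np.array([7.8, 8.0, 8.3, 8.6, 9.0, 8.8, 9.5])

# The constant column represents mu, the activity of the parent structure.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, log_inv_C, rcond=None)
contributions = dict(zip(features + ["mu"], coef))
# Here the group contributions recover 0.2, 0.5, 0.8, 1.2 and mu = 7.8.
```

Shortcoming (b) above is visible in this formulation: the regression can only combine columns that already exist in the matrix, so a substituent never seen in the training set has no coefficient.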

Hansch Analysis

Also in 1964, the linear free-energy-related Hansch model (sometimes called the extrathermodynamic approach) was published4,7.

log 1/C = a (log P)2 + b log P + c σ + … + k …..(5)

P = n-octanol/water partition coefficient; σ = Hammett electronic parameter; a, b, c = regression coefficients; k = constant term.

Equation 5 was developed from the concept that the transport of a drug from the site of application to its site of action depends in a nonlinear manner on the lipophilicity of the drug, and that the binding affinity to its biological counterpart, such as an enzyme or a receptor, depends on the lipophilicity, the electronic properties and other free-energy-related properties. Equation 5 combines the description of both processes in one mathematical model. In addition to introducing a parabolic term for the nonlinear lipophilicity dependence and combining different physicochemical properties in one equation, Hansch and Fujita defined the lipophilicity parameter π of a substituent X (Eqn 6) in the same manner as Hammett had defined the electronic parameter σ (Eqn 7) about 30 years earlier. The partition coefficient P in equation 6 is an equilibrium constant, similar to the dissociation or reaction constant K in equation 7. The absence of a reaction constant ρ in equation 6 is explained by the fact that all π values refer to the n-octanol/water system.

πX = log PRX - log PRH …..(6)

ρσ = log KRX - log KRH …..(7)

With the help of these definitions it was possible to use tabulated values instead of measured values. For the data set described by equation 4, equations 8 (Figure 3) and 9 (Es,meta = steric parameter for meta-substituents) could be derived. All parameters that are relevant in a QSAR study are presented and discussed in Figure 3.

log 1/C = 1.259 (±0.19) π – 1.460 (±0.34) σ+ + 0.208 (±0.17) Es,meta + 7.619 (±0.24)

(n = 22; r = 0.959; s = 0.173; F = 69.24; Q2 = 0.869; SPRESS = 0.222) …..(9)


Fig. 3. Equation 8 describes a quantitative relationship between the antiadrenergic activities of compound 2.

Equations 8 and 9 demonstrate the superiority of Hansch analysis over Free Wilson analysis. Only a few properties are needed to correlate the biological activities, and the model can be directly interpreted in physicochemical terms. The results of the Free Wilson analysis are confirmed in all details, but predictions can now be made for compounds with other substituents, for example for X = ethyl or CF3. On the other hand, predictions that are too far outside the range of investigated parameters, such as for tert-Bu, -OH or -SO2NH2, will most probably fail because of the narrow chemical relationship among the investigated substituents and the very different nature of these chemical groups, in size or in their hydrogen-bond donor and acceptor properties. For such predictions, much more heterogeneous substituents have to be included in the derivation of the QSAR model. The fact that different models can be derived for the same data set frequently poses a dilemma in Hansch analysis: one can never be sure that a certain QSAR model is the correct one for the data set. On the other hand, different models correspond to different working hypotheses, and proposals for the synthesis of new analogs can be made in subsequent steps that allow discrimination between these models.

Free Wilson group contributions for every substituent can be derived from equations 8 and 9, which clearly indicates the close theoretical relationship between Free Wilson analysis and linear Hansch analysis. Correspondingly, both approaches can be combined in one model, the so-called 'mixed approach' (Eqn 10).

log 1/C = a (log P)2 + b log P + cσ +…. + ∑ ai + k …..(10)

Equation 10 combines the advantages of Hansch and Free Wilson analysis and widens the applicability of both methods. Physicochemical parameters describe parts of the molecules with broad structural variation, whereas indicator variables ai (Free Wilson-type variables) encode the effects of structural variations that cannot be described otherwise4.
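A Hansch model of the form of equation 5 reduces to an ordinary least-squares problem once the parabolic log P term is treated as just another column of the design matrix. The sketch below uses invented log P, σ and activity values purely to illustrate the fitting step and the parabolic lipophilicity optimum.

```python
# Fit log 1/C = a*(log P)^2 + b*log P + c*sigma + k (Eqn 5) by least
# squares. All data values below are hypothetical.
import numpy as np

logP  = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
sigma = np.array([0.0, 0.23, -0.17, 0.23, 0.0, -0.17, 0.23])
log_inv_C = np.array([5.1, 5.9, 6.3, 6.9, 7.0, 6.8, 6.5])

# Design matrix: squared log P, log P, sigma, constant term k.
A = np.column_stack([logP ** 2, logP, sigma, np.ones_like(logP)])
a, b, c, k = np.linalg.lstsq(A, log_inv_C, rcond=None)[0]

# A negative 'a' reproduces the nonlinear (parabolic) lipophilicity
# dependence; the optimum lipophilicity lies at log P = -b / (2a).
logP_opt = -b / (2 * a)
```

Adding Free Wilson indicator columns to the same design matrix turns this directly into the mixed approach of equation 10.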

General Scheme of a QSAR Study

The chemoinformatics methods used in building QSAR models can be divided into three groups: extracting descriptors from the molecular structure, choosing those that are informative in the context of the analyzed activity, and finally using the values of the selected descriptors as independent variables to define a mapping that correlates them with the activity in question. A typical QSAR system realizes these phases as depicted in Figure 4.


The molecular structure is encoded using numerical descriptors. The set of descriptors is pruned to select the most informative ones. The activity is derived as a function of the selected descriptors.

Fig. 4. Main stages of a QSAR study.

Generation of Molecular Descriptors from Structure

Small-molecule compounds are defined by their structure, encoded as a set of atoms and the covalent bonds between them. However, the structure cannot be used directly for creating a structure-activity mapping, for reasons stemming from both chemistry and computer science. First, the chemical structure does not usually contain, in an explicit form, the information that relates to activity; this information has to be extracted from the structure. Various rationally designed molecular descriptors accentuate different chemical properties implicit in the structure of the molecule, and only such properties may correlate more directly with the activity. These properties range from physicochemical and quantum-chemical to geometrical and topological features.

The second, more technical reason, which guides the use and development of molecular descriptors, stems from the feature-space paradigm prevailing in statistical data analysis. Most methods employed to predict the activity require as input numerical feature vectors of uniform length for all molecules. Chemical structures of compounds are diverse in size and nature and as such do not fit this model directly. To circumvent this obstacle, molecular descriptors convert the structure into well-defined sets of numerical values.

Selection of Relevant Molecular Descriptors

Many applications are capable of generating hundreds or thousands of different molecular descriptors. Typically, only some of them are significantly correlated with the activity. Furthermore, many of the descriptors are intercorrelated. This has negative effects on several aspects of QSAR analysis. Some statistical methods require that the number of compounds be significantly greater than the number of descriptors; using large descriptor sets would therefore require large datasets. Other methods, while capable of handling datasets with large descriptor-to-compound ratios, nonetheless suffer from loss of accuracy, especially for compounds unseen during the preparation of the model. A large number of descriptors also affects the interpretability of the final model. To tackle these problems, a wide range of methods for automated narrowing of the descriptor set to the most informative ones is used in QSAR analysis.
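As a toy illustration of such automated narrowing, the sketch below applies two simple filters to synthetic data: keep descriptors correlated with the activity, then drop one member of each highly intercorrelated pair. The thresholds (0.3 and 0.9) are arbitrary illustrative choices, not recommended defaults.

```python
# Simple correlation-based descriptor pruning on synthetic data:
# descriptor 1 is deliberately a near-copy of descriptor 0, and the
# activity depends mainly on descriptors 0 and 3.
import numpy as np

rng = np.random.default_rng(0)
n_compounds, n_desc = 40, 8
D = rng.normal(size=(n_compounds, n_desc))                # descriptor matrix
D[:, 1] = D[:, 0] + 0.05 * rng.normal(size=n_compounds)   # intercorrelated pair
activity = 2.0 * D[:, 0] - 1.0 * D[:, 3] + 0.1 * rng.normal(size=n_compounds)

# Step 1: keep descriptors whose |correlation with activity| exceeds 0.3.
corr = np.array([abs(np.corrcoef(D[:, j], activity)[0, 1]) for j in range(n_desc)])
kept = [j for j in range(n_desc) if corr[j] > 0.3]

# Step 2: among those, drop the later member of any pair with
# inter-descriptor |correlation| above 0.9.
final = []
for j in kept:
    if all(abs(np.corrcoef(D[:, j], D[:, i])[0, 1]) < 0.9 for i in final):
        final.append(j)
```

Descriptor 0 survives both filters, while its near-duplicate descriptor 1 passes the activity filter but is removed as redundant.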

Mapping the Descriptors to Activity

Once the relevant molecular descriptors are computed and selected, the final task of creating a function between their values and the analyzed activity can be carried out. The value quantifying the activity is expressed as a function of the values of the descriptors. The most accurate mapping function from some wide family of functions is usually fitted based on the information available in the training set, i.e. the compounds for which the activity is known. A wide range of mapping-function families can be used, including linear and non-linear ones, and many methods for carrying out the training to obtain the optimal function can be employed.

Molecular Descriptors

Molecular descriptors map the structure of the compound into a set of numerical or binary values representing various molecular properties that are deemed important for explaining activity. Two broad families of descriptors can be distinguished, based on their dependence on information about the 3D orientation and conformation of the molecule.

2D QSAR Descriptors

The broad family of descriptors used in the 2D QSAR approach shares the common property of being independent of the 3D orientation of the compound. These descriptors range from simple counts of the entities constituting the molecule, through its topological and geometrical properties, to computed electrostatic and quantum-chemical descriptors and advanced fragment-counting methods.

Constitutional Descriptors

Constitutional descriptors capture properties of the molecule that are related to the elements constituting its structure. These descriptors are fast and easy to compute. Examples include molecular weight, the total number of atoms in the molecule and the numbers of atoms of different identities. A number of bond-related properties are also used, including the total numbers of single, double, triple or aromatic bonds, as well as the number of aromatic rings.
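Because constitutional descriptors need nothing beyond the atom and bond lists, they are easy to sketch. The example below computes a few of them for ethanol from a hand-written bond list; the atomic masses are approximate.

```python
# Constitutional descriptors for ethanol (CH3-CH2-OH) computed from a
# toy atom list and bond list; bond tuples are (atom_i, atom_j, order).
from collections import Counter

atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
bonds = [(0, 1, 1), (1, 2, 1), (0, 3, 1), (0, 4, 1), (0, 5, 1),
         (1, 6, 1), (1, 7, 1), (2, 8, 1)]

MASS = {"C": 12.011, "O": 15.999, "H": 1.008}   # approximate atomic masses

descriptors = {
    "molecular_weight": sum(MASS[a] for a in atoms),
    "n_atoms": len(atoms),
    **{f"n_{el}": n for el, n in Counter(atoms).items()},   # atoms per element
    "n_single_bonds": sum(1 for *_, order in bonds if order == 1),
    "n_double_bonds": sum(1 for *_, order in bonds if order == 2),
}
```

For ethanol this yields 9 atoms, 8 single bonds and a molecular weight of about 46.07.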

Electrostatic and Quantum-Chemical Descriptors

Electrostatic descriptors capture information on the electronic nature of the molecule. These include descriptors containing information on atomic net and partial charges17. Descriptors for the highest negative and positive charges are also informative, as is molecular polarizability18. Partial negatively or positively charged solvent-accessible atomic surface areas have also been used as informative electrostatic descriptors for modeling intermolecular hydrogen bonding19. The energies of the highest occupied and lowest unoccupied molecular orbitals form useful quantum-chemical descriptors, as do derivative quantities such as absolute hardness20,21.

Topological Descriptors

The topological descriptors treat the structure of the compound as a graph, with atoms as vertices and covalent bonds as edges. Based on this approach, many indices quantifying molecular connectivity have been defined, starting with the Wiener index22, which counts the total number of bonds in the shortest paths between all pairs of non-hydrogen atoms. Other topological descriptors include the Randic indices χ23, defined as sums of geometric averages of the edge degrees of atoms within paths of given lengths, Balaban's J index24 and the Schultz index25.

Information about valence electrons can be included in topological descriptors, e.g. the Kier and Hall indices χv26 or the Galvez topological charge indices27. The former use geometric averages of valence connectivities along paths; the latter measure topological valences of atoms and the net charge transfer between pairs of atoms separated by a given number of bonds. Descriptors combining connectivity information with other properties are also available, e.g. the BCUT descriptors28-30, which take the form of eigenvalues of the atom connectivity matrix with atom charge, polarizability or H-bond potential values on the diagonal and additional terms off the diagonal. Similarly, the topological sub-structural molecular design approach (TOSS-MODE/TOPS-MODE)31,32 relies on spectral moments of the bond adjacency matrix amended with information on, for example, bond polarizability. The atom-type electrotopological state (E-state) indices33,34 use electronic and topological organization to define the intrinsic state of an atom and the perturbations of this state induced by other atoms. This information is gathered individually for a wide range of atom types to form a set of indices.
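The Wiener index illustrates how little machinery a topological descriptor needs: treat the heavy atoms as a graph and sum shortest-path bond counts over all atom pairs. A breadth-first-search sketch for the carbon skeleton of n-butane:

```python
# Wiener index: sum of shortest-path bond counts over all pairs of
# non-hydrogen atoms, computed by BFS on the heavy-atom graph.
from collections import deque

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # n-butane carbon chain C0-C1-C2-C3

def bfs_dists(src, adj):
    """Shortest bond counts from src to every reachable atom."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def wiener_index(adj):
    nodes = list(adj)
    total = 0
    for i, u in enumerate(nodes):
        d = bfs_dists(u, adj)
        for v in nodes[i + 1:]:   # each unordered pair counted once
            total += d[v]
    return total

# n-butane: d(0,1)+d(0,2)+d(0,3)+d(1,2)+d(1,3)+d(2,3) = 1+2+3+1+2+1 = 10
```

The same pairwise-distance matrix also underlies several of the other connectivity indices mentioned above.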

Geometrical Descriptors

Geometrical descriptors rely on the spatial arrangement of the atoms constituting the molecule. These descriptors include information on the molecular surface obtained from atomic van der Waals areas and their overlaps35. Molecular volume may be obtained from atomic van der Waals volumes36. Principal moments of inertia and gravitational indices37 also capture information on the spatial arrangement of the atoms in the molecule. Shadow areas, obtained by projection of the molecule onto the planes of its two principal axes, are also used38. Another geometrical descriptor is the total solvent-accessible surface area39.

Fragment-Based Descriptors and Molecular Fingerprints

The family of descriptors relying on substructural motifs is often used, especially for rapid screening of very large databases. The BCI fingerprints40 are derived as bits describing the presence or absence in the molecule of certain fragments, including atoms with their nearest neighborhoods, atom pairs and sequences, or ring-based fragments. A similar approach is present in the basic set of 166 MDL keys; however, other variants of the MDL keys are also available, including extended or compact key sets. The latter result from dedicated pruning strategies41 or elimination methods, e.g. the fast random elimination of descriptors/substructure keys (FRED/SKEYS)42. The more recently introduced Hologram QSAR (HQSAR) approach is based on counting the occurrences of certain sub-structural paths of functional groups. For each group, a cyclic redundancy code is calculated, which serves as a hashing function for partitioning the sub-structural motifs into the bins of a hash table. The numbers of elements in the bins form a hologram43,44.


The Daylight fingerprints are a natural extension of the fragment-based descriptors, eliminating the reliance on a pre-defined list of sub-structure motifs. The fingerprint of each molecule is a string of bits. However, a structural motif in the molecule does not correspond to a single bit but leads, through a hashing function, to a pattern of bits that is added to the fingerprint with a logical "or" operation. The bits of different patterns may overlap, owing to the large number of possible patterns and the finite length of the bit string. Thus, the fact that a bit or several bits are set in the fingerprint cannot be interpreted as proof of a pattern's presence. However, if one of the bits corresponding to a given pattern is not set, this guarantees that the pattern is not present in the molecule. This allows for rapid filtering out of molecules that do not possess certain structural motifs. The patterns are generated individually for each molecule and describe atoms with their neighborhoods and paths of up to 7 bonds. Approaches other than hashed fingerprints have also been proposed to circumvent the problem of a pre-defined sub-structure library, e.g. an algorithm for optimal discovery of frequent structural fragments relevant to a given activity45.
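The hashing-and-OR logic and the one-sided screening guarantee described above can be sketched as follows; the path enumeration and the choice of SHA-256 over a 64-bit string are simplifications for illustration, not the actual Daylight algorithm.

```python
# Toy hashed fingerprint: each structural pattern (here just a tuple of
# atom symbols along a path) is hashed onto a few bit positions, which
# are OR-ed into a fixed-length bit set.
import hashlib

N_BITS = 64

def path_bits(path, n_set=2):
    """Map one pattern deterministically to n_set bit positions."""
    h = hashlib.sha256(str(path).encode()).digest()
    return {h[k] % N_BITS for k in range(n_set)}

def fingerprint(paths):
    bits = set()
    for p in paths:
        bits |= path_bits(p)      # logical "or" of the pattern's bits
    return bits

mol = fingerprint([("C", "C"), ("C", "O"), ("C", "C", "O")])
query = path_bits(("C", "O"))

# Screening logic: if any query bit is NOT set, the pattern is certainly
# absent; if all are set, presence is only possible, not guaranteed
# (bits of different patterns may collide).
may_contain = query <= mol
```

Here the query pattern was folded into the fingerprint, so the subset test succeeds; a failed subset test would definitively rule the pattern out.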

3D QSAR Descriptors

The 3D QSAR methodology is much more computationally complex than the 2D QSAR approach. In general, it involves several steps to obtain numerical descriptors of the compound structure. First, the conformation of the compound has to be determined, either from experimental data or by molecular mechanics, and then refined by minimizing the energy46,47. Next, the conformers in the dataset have to be uniformly aligned in space. Finally, the space with the immersed conformer is probed computationally for various descriptors. Some methods independent of the compound alignment have also been developed.

Alignment-Dependent 3D QSAR Descriptors

The group of methods that require molecule alignment prior to the calculation of descriptors depends strongly on information about the receptor for the modeled ligand. In cases where such data are available, the alignment can be guided by studying the receptor-ligand complexes. Otherwise, purely computational methods for superimposing the structures in space have to be used48,49. These methods rely e.g. on atom-atom or substructure-substructure mapping.

Comparative Molecular Field Analysis

The Comparative Molecular Field Analysis (CoMFA)50 uses electrostatic (Coulombic) and steric (van der Waals) energy fields defined by the inspected compound. The aligned molecule is placed in a 3D grid. At each point of the grid lattice a probe atom with unit charge is placed and the potentials (Coulomb and Lennard-Jones) of the energy fields are computed. They then serve as descriptors in further analysis, typically using partial least squares regression. This analysis allows for identifying structure regions positively and negatively related to the activity in question.

Comparative Molecular Similarity Indices Analysis

The Comparative Molecular Similarity Indices Analysis (CoMSIA)51 is similar to CoMFA in that a probe atom is moved throughout a regular grid lattice in which the molecules are immersed. The similarity between the probe atom and the analyzed molecule is calculated. Compared to CoMFA, CoMSIA uses a different potential function, namely a Gaussian-type function. Steric, electrostatic and hydrophobic properties are calculated; hence the probe atom has unit hydrophobicity as an additional property. The use of the Gaussian-type potential function instead of the Lennard-Jones and Coulombic functions allows for accurate information in grid points located within the molecule. In CoMFA, unacceptably large values are obtained at these points due to the nature of the potential functions and the arbitrary cut-offs that have to be applied.

Alignment-Independent 3D QSAR Descriptors

Another group of 3D descriptors are those invariant to molecule rotation and translation in space. Thus, no superposition of compounds is required.

Comparative Molecular Moment Analysis

The Comparative Molecular Moment Analysis (CoMMA)52 uses second-order moments of the mass and charge distributions. The moments relate to the center of mass and the center of dipole. The CoMMA descriptors include the principal moments of inertia and the magnitudes of the dipole moment and the principal quadrupole moment. Furthermore, descriptors relating charge to mass distributions are defined, i.e. the magnitudes of the projections of the dipole upon the principal moments of inertia and the displacement between the center of mass and the center of dipole.

Weighted Holistic Invariant Molecular Descriptors

The Weighted Holistic Invariant Molecular (WHIM)53,54 and Molecular Surface WHIM55 descriptors provide invariant information by employing principal component analysis (PCA) on the centered co-ordinates of the atoms constituting the molecule. This transforms the molecule into the space that captures the most variance. In this space, several statistics are calculated and serve as directional descriptors, including variance, proportions, symmetry and kurtosis. By combining the directional descriptors, non-directional descriptors are also defined. The contribution of each atom can be weighted by a chemical property, leading to different principal components capturing the variance within the given property. The atoms can be weighted by mass, van der Waals volume, atomic electronegativity, atomic polarizability, the electrotopological index of Kier and Hall and the molecular electrostatic potential.

VolSurf

The VolSurf56,57 approach is based on probing the grid around the molecule with specific probes, e.g. for hydrophobic interactions or hydrogen-bond acceptor or donor groups. The resulting lattice boxes are used to compute descriptors relying on the volumes or surfaces of 3D contours defined by a given value of the probe-molecule interaction energy. By using various probes and cut-off values for the energy, different molecular properties can be quantified. These include e.g. molecular volume and surface and hydrophobic and hydrophilic regions. Derivative quantities, e.g. molecular globularity or factors relating the surface of hydrophobic and hydrophilic regions to the surface of the whole molecule, can also be computed. In addition, various geometry-based descriptors are available, including energy-minima distances and amphiphilic moments.

Grid-Independent Descriptors

The Grid-Independent Descriptors (GRIND)58 have been devised to overcome the problems with interpretability common in alignment-independent descriptors. Similarly to VolSurf, the method utilizes probing of the grid with specific probes. The regions showing the most favorable interaction energies are selected, provided that the distances between the regions are large. Next, the probe-based energies are encoded in a way independent of the molecule’s arrangement. To this end, the distances between the nodes in the grid are discretized into a set of bins. For each distance bin, the nodes with the highest product of energies are stored and the value of the product serves as the numerical descriptor. In addition, the stored information on the position of the nodes can be used to track down the exact regions of the molecule related to the given property. To extend the molecular information captured by the descriptors, the product of node energies may include not only energies relating to the same probe but also those from two different probe types.

2D Versus 3D QSAR Approach

It is generally assumed that 3D approaches are superior to 2D in drug design. Yet, studies show that such an assumption may not always hold. For example, the results of conventional CoMFA may often be non-reproducible due to the dependence of the output’s quality on the orientation of the rigidly aligned molecules on the user’s terminal59,60. Such alignment problems are typical of 3D approaches and, even though some solutions have been proposed, the unambiguous 3D alignment of structurally diverse molecules still remains a difficult task. Moreover, the distinction between 2D and 3D QSAR approaches is not a crisp one, especially when alignment-independent descriptors are considered. This can be observed when comparing the BCUT with the WHIM descriptors. Both employ a similar algebraic method, i.e. solving an eigenproblem for a matrix describing the compound - the connectivity matrix in the case of BCUT descriptors and the covariance matrix of the 3D co-ordinates in the case of WHIM.

There is also a deeper connection between 3D QSAR and one of the 2D methods, the topological approach. It stems from the fact that the geometry of a compound in many cases depends on its topology. An elegant example was provided by Estrada et al., who demonstrated that the dihedral angles of biphenyl as a function of the substituents attached to it can be predicted by topological indices61. Along the same line, chirality, a supposedly typical 3D property, has been predicted using chiral topological indices62, constructed by introducing an adequate weight into the topological matrix for the chiral carbons.

Automatic Selection of Relevant Molecular Descriptors

Automatic methods for selecting the best descriptors, or features, to be used in the construction of the QSAR model fall into two categories63. In the wrapper approach, the quality of descriptor subsets is obtained by constructing and evaluating a series of QSAR models. In filtering, no model is built and features are evaluated using other criteria.

Filtering Methods

These techniques are applied independently of the mapping method used. They are executed prior to the mapping to reduce the number of descriptors according to some objective criteria, e.g. inter-descriptor correlation.

Correlation-Based Methods

Pearson’s correlation coefficients may serve as a preliminary filter for discarding intercorrelated descriptors. This can be done by, e.g., creating clusters of descriptors having correlation coefficients higher than a certain threshold and retaining only one, randomly chosen member of each cluster64. Another procedure involves estimating the correlation between a pair of descriptors and, if it exceeds a threshold, randomly discarding one of the descriptors65. The choice of the order in which pairs are evaluated may lead to significantly different results. One popular method is to first rank the descriptors using some criterion and then iteratively browse the set starting from pairs containing the highest-ranking features.

One such ranking may be the correlation ranking, based on the correlation coefficient between the activity and the descriptors. However, correlation ranking is usually used in conjunction with principal component analysis66,67. Methods using measures of correlation between activity and descriptors other than Pearson’s have also been used, notably the pair-correlation method68-70.

Methods Based on Information Theory

The information content of a descriptor is defined in terms of the entropy of the descriptor treated as a random variable. Based on this notion, various measures relating the information shared between two descriptors, or between a descriptor and the activity, can be defined. An example of such a measure used in descriptor selection for QSAR is the mutual information. The mutual information, sometimes referred to as information gain, quantifies the reduction of uncertainty in the activity variable given knowledge of the descriptor values. It is used in QSAR to rank the descriptors71,72.

The application of information-theoretic criteria is straightforward when both the descriptors and activity values are categorical. In the case of continuous numerical variables, some discretization scheme has to be applied to approximate the variables. Thus, such criteria are usually used with binary descriptors.
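For categorical descriptors and activities, the mutual-information ranking reduces to counting co-occurrence frequencies; a minimal sketch:

```python
import math
from collections import Counter

def mutual_information(descriptor, activity):
    """I(D;A) in bits for two categorical variables of equal length:
    the reduction of uncertainty in the activity given the descriptor."""
    n = len(descriptor)
    pd = Counter(descriptor)
    pa = Counter(activity)
    joint = Counter(zip(descriptor, activity))
    mi = 0.0
    for (d, a), count in joint.items():
        p_da = count / n
        mi += p_da * math.log2(p_da / (pd[d] / n * pa[a] / n))
    return mi
```

Descriptors would be ranked by this value and the top-scoring ones retained.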

Statistical Criteria

Fisher’s ratio, i.e. the ratio of the between-class variance to the within-class variance, can be used to rank the descriptors73. Next, the correlation between pairs of features is used to reduce the set of descriptors. Another method of assessing the quality of a descriptor is based on the Kolmogorov-Smirnov statistic74. As applied to descriptor selection in QSAR75, it is a fast method that does not rely on knowledge of the underlying distribution and does not require the conversion of descriptor variables into categorical values. For two classes of activity to be predicted, the method measures the maximal absolute distance between the cumulative distribution functions of the descriptor for the individual activity classes.
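For a continuous descriptor and two activity classes, the Kolmogorov-Smirnov score is the maximal gap between the two empirical cumulative distribution functions; a small pure-Python sketch:

```python
def ks_statistic(values_active, values_inactive):
    """Maximal absolute distance between the empirical CDFs of a descriptor
    for the two activity classes; no distributional assumptions needed."""
    points = sorted(set(values_active) | set(values_inactive))

    def ecdf(sample, v):
        return sum(1 for s in sample if s <= v) / len(sample)

    return max(abs(ecdf(values_active, v) - ecdf(values_inactive, v))
               for v in points)
```

A descriptor with a score near 1 separates the classes almost perfectly, while a score near 0 carries little class information.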

Wrapper Methods

These techniques operate in conjunction with a mapping algorithm76. The choice of the best subset of descriptors is guided by the error of the mapping algorithm for a given subset, measured e.g. with cross-validation. A schematic illustration of wrapper methods is given in Figure 5.

Iteratively, various configurations of selected and discarded descriptors are evaluated by creating a descriptors-to-activity mapping and assessing its prediction accuracy. The final descriptors are those yielding the highest accuracy for a given family of mapping functions.

Fig. 5. Generic scheme for wrapper descriptor selection methods.

International Journal of Drug Design and Discovery Volume 3 • Issue 3 • July – September 2012 

Genetic Algorithm

The Genetic Algorithm (GA) is an efficient method for function minimization. In the descriptor selection context, the prediction error of the model built upon a set of features is optimized77,78. The genetic algorithm mimics natural evolution by modeling a dynamic population of solutions. The members of the population, referred to as chromosomes, encode the selected features. The encoding usually takes the form of bit strings, with bits corresponding to selected features set and the others cleared. Each chromosome leads to a model built using the encoded features. Using the training data, the error of the model is quantified and serves as a fitness function. During the course of evolution, the chromosomes are subjected to crossover and mutation. By allowing survival and reproduction of the fittest chromosomes, the algorithm effectively minimizes the error function in subsequent generations.

The success of a GA depends on several factors. The parameters steering the crossover, mutation and survival of chromosomes should be carefully chosen to allow the population to explore the solution space and to prevent early convergence to a homogeneous population occupying a local minimum. The choice of the initial population is also important in genetic feature selection. To address this issue, e.g. a method based on Shannon’s entropy combined with graph analysis can be used79. Genetic Algorithms have been used in feature selection for QSAR with a range of mapping methods, e.g. Artificial Neural Networks80,81, the k-Nearest Neighbor method82 and Random Forest65.
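A minimal sketch of the loop described above. The fitness here is the training error of a least-squares model plus a small parsimony penalty per selected feature (both illustrative simplifications; a real study would use cross-validated error):

```python
import random
import numpy as np

def fitness(X, y, mask, penalty=0.1):
    """Error of the model built on the encoded features; a small penalty per
    selected feature makes training error favor compact subsets (assumption)."""
    cols = [i for i, b in enumerate(mask) if b]
    if not cols:
        return float("inf")
    coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    sse = float(np.sum((X[:, cols] @ coef - y) ** 2))
    return sse + penalty * len(cols)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    nfeat = X.shape[1]
    population = [[rng.randint(0, 1) for _ in range(nfeat)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(X, y, m))
        survivors = population[: pop_size // 2]      # survival of the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, nfeat)            # one-point crossover
            children.append([bit ^ (rng.random() < p_mut)   # bit-flip mutation
                             for bit in a[:cut] + b[cut:]])
        population = survivors + children
    return min(population, key=lambda m: fitness(X, y, m))
```

On toy data where the activity depends on only two of the descriptors, the fittest chromosome should have those two bits set.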

Simulated Annealing

Simulated Annealing (SA) is another stochastic method for function optimization employed in QSAR65,83,84. As in the evolutionary approach, the function minimized represents the error of the model built using the subset of descriptors. The SA algorithm operates iteratively by building a new subset of descriptors through altering the current-best one, e.g. by exchanging some percentage of the features. Next, SA evaluates the prediction error of the new subset and decides whether to adopt the new solution as the current optimal one. This decision depends on whether the new solution leads to a lower error than the current one. If so, the new solution is used. However, in the other case the solution is not automatically discarded. With a given probability based on the Boltzmann distribution, the worse solution can replace the current, better one.

Replacing the solution with a worse one allows the SA method to escape from local minima of the error function, i.e. solutions that cannot be improved without traversing through less-fitted feature subsets. The power of the SA method stems from altering the temperature term in the Boltzmann distribution. At an early stage, when the solution is not yet highly optimized and most prone to encounter local minima, the temperature is high. During the course of the algorithm, the temperature is lowered and acceptance of a worse solution becomes less likely. Thus, even if the obtained minimum is not global, it is nonetheless usually of high quality.
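The acceptance rule and cooling behaviour can be sketched as follows; the linear cooling schedule and single-bit-flip move are illustrative choices, not prescribed by the method:

```python
import math
import random

def accept(new_err, cur_err, temperature, rng):
    """Always accept improvements; accept a worse subset with probability
    exp(-(new - cur) / T), the Boltzmann-distribution criterion."""
    if new_err <= cur_err:
        return True
    return rng.random() < math.exp(-(new_err - cur_err) / temperature)

def simulated_annealing(nfeat, error, steps=2000, t0=1.0, seed=0):
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(nfeat)]
    best = current[:]
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-6     # temperature lowered over time
        candidate = current[:]
        candidate[rng.randrange(nfeat)] ^= 1   # alter the current subset
        if accept(error(candidate), error(current), t, rng):
            current = candidate
        if error(current) < error(best):
            best = current[:]
    return best
```

In a real QSAR setting, `error` would train a model on the encoded subset and return its cross-validated prediction error.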

Sequential Feature Forward Selection

While the genetic algorithm and simulated annealing rely on a guided random process of exploring the space of feature subsets, Forward Feature Selection85 operates in a deterministic manner. It implements a greedy search through the feature subsets. As a first step, the single feature that leads to the best prediction is selected. Next, each remaining feature is individually added to the current subset and the errors of the resulting models are quantified. The feature that best reduces the error is incorporated into the subset. Thus, in each step a single best feature is added, resulting in a sequence of nested subsets of features. The procedure stops when a specified number of features has been selected. More elaborate stopping conditions have also been proposed, e.g. based on incorporating an artificial random feature86. When this feature is selected as the one that best improves the quality of the model, the procedure is stopped. The drawback of forward selection is that if several features are collectively good predictors but each alone is a poor predictor, none of them may be chosen. Sequential feature forward selection has been used in several QSAR studies64, 79, 87, 88.
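The greedy loop can be sketched generically, given any function that scores a candidate feature subset by the error of the resulting model:

```python
def forward_selection(features, error_of, n_select):
    """Greedy forward selection: in each step, add the single feature whose
    inclusion yields the lowest model error, producing nested subsets."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < n_select:
        best = min(remaining, key=lambda f: error_of(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Here `error_of` would typically train a model on the candidate subset and return its cross-validated error.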

Sequential Backward Feature Elimination

Backward Feature Elimination85 is another example of a greedy sequential method that yields nested subsets of features. In contrast to forward selection, the full set of features is used as the starting point. Next, in each step, all subsets resulting from the removal of a single feature are analyzed for the prediction error, and the feature whose removal leads to the model with the lowest error is removed from the current subset. The procedure stops when a given number of features has been dropped.

Backward elimination is slower than forward selection, yet often leads to better results. Recently, a significantly faster variant of backward elimination, the Recursive Feature Elimination89 method, has been proposed for Support Vector Machines (SVM). In this method, the feature to be removed is chosen based on a single execution of the learning method using all features remaining in the given iteration. The SVM allows for ranking the features according to their contribution to the result. Thus, the least contributing feature can be dropped to form a new, narrowed subset of features. There is no need to train SVMs for each subset as in the original backward elimination method. Variants of the backward feature elimination method have been used in numerous QSAR studies90-94.
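A sketch of the recursive-elimination idea, using the magnitude of a linear least-squares model's coefficients as the contribution score (standing in for the SVM weight vector used in SVM-RFE):

```python
import numpy as np

def recursive_feature_elimination(X, y, n_keep):
    """One model fit per iteration; the feature with the smallest absolute
    coefficient (least contribution) is dropped until n_keep remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        coef, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        remaining.pop(int(np.argmin(np.abs(coef))))
    return remaining
```

Only one fit per dropped feature is needed, instead of one fit per candidate subset as in plain backward elimination.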

Hybrid Methods

In addition to the purely filter- or wrapper-based descriptor selection procedures, QSAR studies utilize fusions of the two approaches. A rapid objective method is used as a preliminary filter to narrow the feature set; next, one of the more accurate but slower subjective methods is employed. As an example of such a combination of techniques, a correlation-based test significantly reducing the number of features, followed by a genetic algorithm or simulated annealing, can be used65. A similar procedure which uses greedy sequential feature forward selection is also in use64. Feature selection can also be implicit in some mapping methods. For example, the Decision Tree (see section 6.2.4) utilizes only a subset of features in the decision process if a single or only a few descriptors are tested at each node and the overall number of features exceeds the number of those used in the nodes. Similarly, ensembles of Decision Stumps (see section 6.2.6) also operate on a reduced number of descriptors if the number of members in the ensemble is smaller than the number of features.

Mapping the Molecular Structure to Activity

Given the selected descriptors, the final step in building the QSAR model is to derive the mapping between the activity and the values of the features. Simple, yet useful, methods model the activity as a linear function of the descriptors. Other, non-linear methods extend this approach to more complex relations.

Another important division of the mapping methods is based on the nature of the activity variable. In the case of predicting a continuous value, a regression problem is encountered. When only some categories or classes of the activity need to be predicted, e.g. partitioning compounds into active and inactive, a classification problem occurs. In regression, the dependent variable is modeled as a function of the descriptors. In the classification framework, the resulting model is defined by a decision boundary separating the classes in the descriptor space. The approaches to QSAR mapping are depicted in Figure 6.

Linear Models

Linear models have been the basis of QSAR analysis since its beginning. They predict the activity as a linear function of the molecular descriptors. In general, linear models are easily interpretable and sufficiently accurate for small datasets of similar compounds, especially when the descriptors are carefully selected for a given activity.

Multiple Linear Regression

Multiple Linear Regression (MLR) models the activity to be predicted as a linear function of all descriptors. Based on the examples from the training set, the coefficients of the function are estimated. These free parameters are chosen to minimize the squares of the errors between the predicted and the actual activity. The main restriction of MLR analysis is the case of a large descriptors-to-compounds ratio, or multicollinear descriptors in general. This makes the problem ill-conditioned and the results unstable. Multiple linear regression has been among the most widely used mapping methods in QSAR in recent decades. It has been employed in conjunction with genetic descriptor selection for modeling GABAA receptor binding, antimalarial activity, HIV-1 protease inhibition and glycogen phosphorylase inhibition95, exhibiting lower cross-validation error than partial least squares, both using 4D QSAR fingerprints. MLR has been applied to models in predictive toxicology96,97, Caco-2 permeability98 and aqueous solubility99. In prediction of physicochemical properties88,93 and of COX-2 inhibition100, MLR proved significantly worse than the non-linear support vector machine, yet comparable or only slightly inferior to a neural network. However, in studies of logP101 it proved worse than other models, including the multi-layer perceptron and the Decision Tree.

(a) Linear regression with activity as a function of two descriptors d1 and d2 ; (b) Binary classification with linear decision boundary between classes of active (+) and inactive (-) compounds; (c) Non-linear regression; (d) Non-linear binary classification.

Fig. 6. Approaches to QSAR mapping.
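The least-squares fit behind MLR can be illustrated on hypothetical toy data, two descriptors plus an intercept column:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))              # 20 compounds x 2 descriptors
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + 2.0   # noise-free linear activity (toy)

# Append a column of ones so the intercept is estimated as a coefficient;
# lstsq minimizes the sum of squared errors between predicted and actual activity.
X1 = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
```

With multicollinear columns, or more descriptors than compounds, this system becomes ill-conditioned, which is the instability of MLR noted in the text.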


Partial Least Squares

Partial Least Squares (PLS)102-104 linear regression is a method suitable for overcoming the problems in MLR related to multicollinear or over-abundant descriptors. The technique assumes that, despite the large number of descriptors, the modeled process is governed by a relatively small number of latent independent variables. PLS tries to indirectly obtain knowledge of the latent variables by decomposing the input matrix of descriptors into two components, the scores and the loadings. The scores are orthogonal and, while capturing the descriptor information, also allow for good prediction of the activity. The estimation of the score vectors is done iteratively. The first one is derived using the first eigenvector of the combined activity-descriptor variance-covariance matrix. Next, the descriptor matrix is deflated by subtracting the information explained by the first score vector. The resulting matrix is used in the derivation of the second score vector, which, followed by consecutive deflation, closes the iteration loop. In each iteration step, the coefficient relating the score vector to the activity is also determined.
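The iteration described above (score extraction, deflation, activity coefficient) can be sketched for a single activity variable. This is a bare-bones NIPALS-style PLS1 illustration, not a production implementation:

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Return fitted activities: each loop extracts a score vector t, records
    the coefficient q relating it to the activity, then deflates X and y."""
    X, y = X.astype(float).copy(), y.astype(float).copy()
    y_hat = np.zeros_like(y)
    for _ in range(n_components):
        w = X.T @ y                   # weight vector from covariance with y
        norm_w = np.linalg.norm(w)
        if norm_w < 1e-12:            # activity fully explained; stop early
            break
        w /= norm_w
        t = X @ w                     # score vector (latent variable)
        tt = t @ t
        p = X.T @ t / tt              # descriptor loadings
        q = (y @ t) / tt              # coefficient relating score to activity
        X -= np.outer(t, p)           # deflation of the descriptor matrix
        y -= q * t
        y_hat += q * t
    return y_hat
```

With as many components as the rank of the descriptor matrix, the fit coincides with ordinary least squares; fewer components give the regularized behaviour that makes PLS robust to multicollinearity.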

PLS has been used successfully with 3D QSAR and HQSAR, e.g. in a study of nicotinic acetylcholine receptor binding modeling105 and of estrogen receptor binding106. It has also been used in a study involving several datasets, including blood-brain barrier permeability, toxicity, P-glycoprotein transport, multidrug resistance reversal and logD, showing results better than ensembles of decision trees and SVM in most cases. PLS regression has been tested in prediction of COX-2 inhibition82, but with lower accuracy than a neural network and a decision tree. However, in a study of solubility prediction107, PLS outperformed a neural network. Studies also report PLS-based models for melting point and logP prediction108. Finally, PLS models for BBB permeability109,110, mutagenicity, toxicity, tumor growth inhibition and anti-HIV activity111 and aqueous solubility108,109 have been created.

Linear Discriminant Analysis

Linear Discriminant Analysis (LDA)112 is a classification method that creates a linear transformation of the original feature space into a space which maximizes the inter-class separability and minimizes the within-class variance. The procedure operates by solving a generalized eigenvalue problem based on the between-class and within-class covariance matrices. Thus, the number of features has to be significantly smaller than the number of observations to avoid ill-conditioning of the eigenvalue problem. Principal component analysis may be executed prior to applying LDA to reduce the dimension of the input data and overcome this problem.
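For the two-class case, the discriminant direction can be computed directly from the class means and the pooled within-class covariance; a numpy sketch (the general multi-class case needs the full generalized eigenvalue problem):

```python
import numpy as np

def lda_direction(X0, X1):
    """Fisher discriminant direction w = Sw^-1 (m1 - m0): maximizes the
    between-class separation relative to the within-class variance."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # pooled scatter
    return np.linalg.solve(Sw, m1 - m0)
```

Projections of the two classes onto this direction are maximally separated; the ill-conditioning mentioned above appears when the pooled scatter matrix is near-singular, e.g. with more descriptors than compounds.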

LDA has been used to create QSAR models, e.g. for prediction of model validity for new compounds113, where it fared better than PLS but worse than a non-linear neural network. However, in BBB permeability prediction, LDA exhibited lower accuracy than a PLS-based method. In predicting antibacterial activity114,115 it performed worse than a neural network. LDA was also used to predict drug-likeness116, showing results slightly better than the linear programming machine, a method similar to the linear SVM. However, it yielded results worse than the non-linear SVM and bagging ensembles. In ecotoxicity prediction117, LDA performed better than other linear methods and k-NN but was inferior to the decision tree.

Non-Linear Models

Non-linear models extend the structure-activity relationships to non-linear functions of the input descriptors. Such models may be more accurate, especially for large and diverse datasets. However, they are usually harder to interpret. Complex non-linear models may also fall prey to overfitting118, i.e. low generalization to compounds unseen during training.

Bayes Classifier

The Bayes Classifier stems from the Bayes rule, which relates the posterior probability of a class to the prior probability of the class and the likelihood of the observed variables given the class. Under the Bayes rule, the class maximizing the posterior probability is chosen as the prediction result. However, in real problems the likelihoods are not known and have to be estimated. Given a finite number of training examples, such estimation is not trivial. One way to approach this problem is to assume independence of the likelihoods of the class with respect to the different descriptors. This leads to the Naïve Bayes Classifier (NBC). For typical datasets, the estimation of likelihoods with respect to single variables is feasible. The drawback of this method is that the independence assumption usually does not hold.
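For binary descriptors, the NBC likelihoods reduce to per-class bit frequencies; a sketch with Laplace smoothing (the smoothing is a common practical addition, not part of the basic rule):

```python
import math
from collections import Counter

def train_nbc(X, y):
    """Estimate log priors and per-descriptor log likelihoods for each class,
    assuming descriptors are independent given the class (the naive step)."""
    class_counts = Counter(y)
    nfeat = len(X[0])
    ones = {c: [1] * nfeat for c in class_counts}   # Laplace-smoothed counts
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            ones[yi][j] += v
    model = {}
    for c, n_c in class_counts.items():
        p1 = [ones[c][j] / (n_c + 2) for j in range(nfeat)]
        model[c] = (math.log(n_c / len(y)),
                    [math.log(p) for p in p1],
                    [math.log(1 - p) for p in p1])
    return model

def predict_nbc(model, x):
    """Choose the class with the maximal posterior probability:
    log prior plus the sum of per-descriptor log likelihoods."""
    def log_posterior(c):
        prior, log_p1, log_p0 = model[c]
        return prior + sum(log_p1[j] if v else log_p0[j]
                           for j, v in enumerate(x))
    return max(model, key=log_posterior)
```

Working in log space avoids numerical underflow when many descriptors are multiplied together.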

An extensive study using the Naïve Bayes Classifier in comparison with other methods was conducted119 using numerous endpoints, including COX-2, CDK-2, BBB, dopamine, logD, P-glycoprotein, toxicity and multidrug resistance reversal. In most cases NBC was inferior to the other methods; however, it outperformed PLS for BBB and CDK-2, k-NN for P-glycoprotein and COX-2, and decision trees for BBB and P-glycoprotein. NBC has also been shown useful in modeling the inhibition of HIV-1 protease120.

The k-Nearest Neighbor Method

The k-Nearest Neighbor (k-NN)121 method is a simple decision scheme that requires practically no training and is asymptotically optimal, i.e. with increasing training data it converges to the optimal prediction error. For a given compound in the descriptor space, the method analyzes its k nearest neighboring compounds from the training set and predicts the activity class that is most highly represented among these neighbors. The k-NN scheme is sensitive to the choice of metric and to the number of training compounds available. The number of neighbors analyzed can also be optimized to yield the best results. The k-nearest neighbors scheme has been used e.g. for predicting COX-2 inhibition82, where it showed accuracy higher than PLS and similar to a neural network. In a study on P-glycoprotein transport activity91, k-NN performed comparably to the decision tree but worse than the neural network and SVM. In ecotoxicity QSAR117, k-NN was better than some linear methods but inferior to discriminant analysis and decision trees.
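The scheme needs only a metric and a vote; a sketch using the Euclidean metric (squared distances suffice for ranking neighbors):

```python
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Find the k training compounds nearest to the query in descriptor
    space and return the activity class most represented among them."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, query)), yi)
        for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

The sensitivity noted in the text enters here through the choice of distance function and of k.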

Artificial Neural Networks

Artificial Neural Networks (ANN)122 are biologically inspired prediction methods based on the architecture of a network of neurons. A wide range of specific models based on this paradigm has been analyzed in the literature, with perceptron-based and radial-basis-function-based ones prevailing. These two methods both fall into the category of feed-forward networks, in which during prediction the information flows only in one direction: from the input descriptors, through a set of layers, to the output of the network.

Multi-layer perceptron

The Multi-Layer Perceptron (MLP) model consists of a layered network of interconnected perceptrons, i.e. simple models of a neuron123. Each perceptron is capable of forming a linear combination of its input values and, by means of a certain transfer function, producing a binary or continuous output. A noteworthy fact is that each input of the perceptron has an adaptive weight specifying the importance of that input. In training a single perceptron, the inputs are formed by the molecular descriptors while the output should predict the activity of the compound. To achieve this goal, the perceptron is trained by adjusting the weights to produce a linear combination of the descriptors that optimally predicts the activity. The adjustment process relies on the feedback from comparing the predicted with the expected output. That is, the error in the prediction is propagated to the weights of the descriptors, altering them in the direction that counters the error. While a single perceptron is a linear model, a network consisting of layers of perceptrons, with the output of one layer connected to the inputs of the neurons in the consecutive layer, allows for non-linear prediction124. Multi-layer networks contain a single input layer, which consists simply of the values of the molecular descriptors, one or more hidden layers, which process the descriptors into internal representations, and an output layer utilizing the internal representation to produce the final prediction. This architecture is depicted in Figure 7.

Each neuron computes an inner product of its weight vector with its input, extended with a bias term. A binary step transfer function is applied in each neuron following the calculation of the inner product.

Fig. 7. Multi-Layer Perceptron neural network with two hidden layers, single output neuron and four descriptors as input.


In multi-layer networks, training, i.e. the adjustment of the weights becomes non-trivial. Apart from the output layer, the feedback information is no longer directly available to adjust the weights of the neuron inputs in the hidden layers. One popular method to overcome this problem is the propagation of error method. The weights of inputs in the output layer neurons are adjusted based on the error as in single perceptron. Then, the information of the error propagates from the output layer neurons to the neurons in the preceding layer proportionally to the weight of the link between given hidden neuron output and the input of the output layer neuron. It is then used to adjust the weights of the inputs of the neurons in the hidden layer. The contribution to the overall error propagates backwards through all hidden layers until the weights of the first hidden layer are adjusted. In general, any given function can be approximated using neural network that is sufficiently large both in terms of number of layers and number of neurons in a layer125. However, a big network can overfit if given a finite training set. Thus, the choice of number of layers and neurons is essential in constructing the networks. Usually, networks consisting of only one hidden layer are used. Another aspect important in the construction of the neural network is the choice of the exact form of the training rule, i.e. the function relating the update of weights with the error of prediction. The most popular is the delta rule; the weight change is proportional to the difference of predicted and expected output and to the input value. The proportionality constant determines the learning rate, i.e. the magnitude of steps used in adjusting the weights. Too large learning rates may prevent the convergence of training to the minimal error, while too small rates increase the computational overhead. Variables decreasing with time learning rate may also be used. 
The next choice in constructing the MLP is the selection of the neuron transfer function, i.e. the function relating the product of the input and weight vectors to the output. Typically, a sigmoid function is used. Finally, the initial values of the weights of the links between neurons have to be set. The general consensus is that small-magnitude random numbers should be used.
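As an illustration, the delta rule for a single sigmoid neuron can be sketched as below. This is a minimal toy example, not taken from any of the studies cited here: the AND-type "activity" pattern, the learning rate and the epoch count are all illustrative assumptions.

```python
import math

def sigmoid(x):
    # Sigmoid transfer function relating the weighted input sum to the output.
    return 1.0 / (1.0 + math.exp(-x))

def delta_rule_step(weights, inputs, target, lr):
    # Delta rule: the weight change is proportional to the prediction error
    # (expected minus predicted output) and to the input value.
    output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    error = target - output
    return [w + lr * error * x for w, x in zip(weights, inputs)]

# Toy training set: two descriptors plus a constant bias input of 1;
# the compound is "active" (1) only when both descriptors are present.
data = [([1, 1, 1], 1), ([0, 1, 1], 0), ([1, 0, 1], 0), ([0, 0, 1], 0)]
w = [0.05, -0.05, 0.0]   # small-magnitude initial weights
for epoch in range(1000):
    for x, t in data:
        w = delta_rule_step(w, x, t, lr=0.5)
```

With a much larger learning rate the updates overshoot and training may fail to converge; with a much smaller one, many more epochs are needed.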

The MLP neural networks have shown their usefulness in a wide range of QSAR applications where linear models often fail. In a human intestinal absorption study81, MLP outperformed the MLR model. However, single MLP networks have been shown to be inferior to ensembles of such networks in the prediction of antifilarial activity, GABAA receptor binding and inhibition of dihydrofolate reductase126. In an antibacterial activity study114, MLP performed better than LDA. This type of network has also been applied to the prediction of logP101, faring better than linear regression and comparably to decision trees. Models of the HIV reverse transcriptase inhibitors and E. coli dihydrofolate reductase inhibitors127 constructed with MLP were better than those relying on MLR or PLS. MLP has also been employed to predict carcinogenic activity80, aquatic toxicity128, antimalarial activity and binding affinities to the platelet-derived growth factor receptor129.

Radial-Basis Function Neural Networks

The Radial-Basis Function (RBF)130 neural networks consist of an input layer, a single hidden layer and an output layer. Contrary to MLP-ANNs, the neurons in the hidden layer do not compute their output based on the product of the weights and the input values. Each hidden layer neuron is defined by its center, i.e. a point in the feature space of descriptors. The output of the neuron is calculated as a function of the distance between the input compound in the descriptor space and the point constituting the neuron. Typically, the Gaussian function is used, although in principle some other function of the distance may be applied. The output neuron is of perceptron type, having as output a transfer function of the product of the output values of the RBF neurons and their weights.

Several parameters are adjusted during the training of the RBF network. A number of RBF neurons has to be created, and their centers and distance scalings, i.e. widths, defined. In the case of the Gaussian function, these parameters correspond to the mean and standard deviation, respectively. The simplest approach is to create as many neurons as there are compounds in the training set and to set the centers to the coordinates of the given examples. Alternatively, the training set can be clustered into a number of groups, e.g. by the k-means method, and the group centers and widths used. The orthogonal least squares131 method can also be employed. This procedure selects a subset of training examples to be used as centers by sequentially adding the examples which best contribute to explaining the variance of the activity to be predicted. Once all RBF neurons have been trained, the weights of the connections between them and the output neuron have to be established. This can be done by an analytical procedure. The RBF-ANNs have been used in the prediction of COX-2 inhibition100, yielding error lower than MLR but higher than SVM.
In the prediction of physicochemical properties, such as the O-H bond dissociation energy in substituted phenols88 and capillary electrophoresis absolute mobilities93, RBF-ANNs showed accuracy worse than non-linear SVM but sometimes better than linear SVM.
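The simplest construction mentioned above, one Gaussian RBF neuron per training compound with the output-layer weights obtained analytically, can be sketched as follows. The one-descriptor dataset and the width value are hypothetical, chosen only to keep the example self-contained.

```python
import math

def rbf(x, center, width):
    # Gaussian radial basis: the response decays with the distance
    # between the input point and the neuron's center.
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))

def solve(A, b):
    # Tiny Gauss-Jordan solver used to fit the output-layer weights.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# One RBF neuron per training example, centered on that example.
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 4.0, 9.0]   # toy "activity" = descriptor squared
width = 1.0
G = [[rbf(x, c, width) for c in train_x] for x in train_x]
weights = solve(G, train_y)      # analytical fit of the output layer

def predict(x):
    # Output neuron with an identity transfer function, for simplicity.
    return sum(w * rbf(x, c, width) for w, c in zip(weights, train_x))
```

Because there is one neuron per training compound, the network interpolates the training activities exactly; clustering the training set (e.g. by k-means) trades this for fewer neurons and better generalization.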

Decision Trees

Decision Trees (DT)132,133 differ from most statistics-based classification and regression algorithms by their connection to logic-based and expert systems. In fact, each classification tree can be translated into a set of predictive rules in Boolean logic. The DT classification model consists of a tree-like structure of nodes and links. Nodes are linked hierarchically, with several child nodes branching from a common parent node. A node with
no child nodes is called a leaf. Typically, in each node a test using a single descriptor is made. Based on the result of the test, the algorithm is directed to one of the child nodes branching from the parent. In the child node, another test is performed and further traversal of the tree towards the leaves is carried out. The final decision is based on the activity class associated with the leaf. Thus, the whole decision process is based on a series of simple tests, with each result guiding the path from the root of the tree to a leaf, as depicted in Figure 8.

The traversal of the tree along the path from the root to the leaf node defining the final decision leads through a set of simple tests using the values of molecular descriptors.

Fig. 8. Classification using decision tree based on three molecular descriptors.

The training of the model consists of the incremental addition of nodes. The process starts with choosing the test for the root node. A test that optimally separates the compounds into the appropriate activity classes is chosen. If such a test perfectly discriminates the classes, no further nodes are necessary. However, in most cases a single test is not sufficient: a group of compounds corresponding to one outcome of the test contains examples from different activity classes. Thus, an iterative procedure is utilized, starting with the root node as the current node.

At each step, the current node is examined for meeting the criteria of being a leaf. A node may become a leaf if the compounds directed to it by the traversal from the root fall into a single activity category, or at least one category forms a clear majority. Otherwise, if the compounds are distributed between several classes, the optimal discriminating criterion is selected. The results of the discrimination form child nodes linked to the current node. Since the decision in the current node introduces new discriminatory information, the subsets of compounds in the child nodes correspond more homogeneously to single classes. Thus, after several node-splitting operations, the leaves can be created. Upon creation of child nodes, they are added to the list of nodes waiting to be assessed, and the first node from this list is chosen as the next current node for evaluation. One should note that the tests carried out in nodes at the same level of the tree may differ between branches, following the different class distributions at the nodes.

There are several considerations in the development of decision trees. First, a method for choosing the test to be performed in the node is necessary. As the test usually involves the values of a single descriptor, the descriptor ranking criteria may be employed. Once the descriptor is chosen, the decision rule must be introduced, e.g. a threshold that separates the compounds from two activity classes. The DT method may lead to overfitting on the training set if the tree is allowed to grow until the nodes consist purely of one class. Thus, early stopping is used once the nodes are sufficiently pure. Moreover, pruning of the constructed tree134, i.e. removal of some overspecialized leaves, may be introduced to increase the generalization capabilities of the tree133.
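The choice of the node test, i.e. the descriptor and threshold that best separate the activity classes, can be sketched with a Gini-impurity criterion (a common choice, used here as an illustrative assumption; the four-compound dataset is hypothetical):

```python
def gini(labels):
    # Gini impurity of a set of class labels: 0 when the set is pure.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    # Pick the (descriptor, threshold) test that minimizes the
    # weighted impurity of the two resulting child nodes.
    best = None
    for d in range(len(X[0])):
        for t in sorted(set(row[d] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[d] <= t]
            right = [yi for row, yi in zip(X, y) if row[d] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, d, t)
    return best[1], best[2]

# Toy set: the class depends only on the second descriptor.
X = [[0.1, 5.0], [0.9, 4.0], [0.4, 1.0], [0.7, 2.0]]
y = ["active", "active", "inactive", "inactive"]
d, t = best_split(X, y)
```

Growing the tree then amounts to applying `best_split` recursively to the subsets on each side of the threshold until the nodes are sufficiently pure.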

In general, DT methods usually offer suboptimal error rates compared to other non-linear methods, in particular due to the reliance on a single feature in each node. Nonetheless, they are popular in the QSAR domain for their ease of interpretability. The tree effectively combines the training process with descriptor selection, limiting the complexity of the model to be analyzed. Furthermore, since several leaves in the tree may correspond to a single activity class, they allow for the inspection of different paths leading to the same activity. The decision trees also handle regression problems. This is done by associating each leaf with a numerical value instead of a categorical class; the criteria for splitting a node to form child nodes are then based on the variance of the compounds in that node. Decision trees have been tested in a study119 on a wide range of targets including COX-2 inhibition, blood brain barrier permeability, toxicity, multi-drug resistance reversal, CDK-2 antagonist activity, dopamine binding affinity, log D and binding to an unspecified channel protein. They performed worse than SVM and ensembles of decision trees but often better than PLS or the Naïve Bayes classifier. In a log P study101, they showed results comparable to MLP-ANNs and better than MLR. In a study concerning P-glycoprotein transport activity91, DT was worse than both SVM and neural networks but better than the k-NN method. In various datasets related to ecotoxicity117, decision trees usually achieved lower error than the LDA or k-NN methods. Other studies employing decision trees, including anti-HIV activity135, toxicity136 and oral absorption137, have been conducted.

Support Vector Machines

The Support Vector Machines (SVM)138,139 form a group of methods stemming from the structural risk minimization principle, with the linear support vector classifier (SVC) as its most basic member. The SVC aims at creating a decision hyperplane that maximizes the margin, i.e. the distance from the hyperplane to the nearest examples from each of the classes. This allows for formulating the classifier training as a constrained optimization problem.
Importantly, the objective function is unimodal, contrary to e.g. neural networks, and thus can be optimized effectively to the global optimum. In the simplest case, compounds from different classes can be separated by a linear hyperplane; such a hyperplane is defined solely by its nearest compounds from the training set. These compounds are referred to as support vectors, giving the name to the whole method. The core definitions behind SVM classification are illustrated in Figure 9. In most cases, however, no linear separation is possible. To account for this problem, slack variables are introduced. These variables are associated with the misclassified compounds and, in conjunction with the margin, are subject to optimization. Thus, even though erroneous classification cannot be avoided, it is penalized. Since the misclassified compounds strongly influence the decision hyperplane, they also become support vectors.

The SVC can easily be transformed into a non-linear classifier by employing a kernel function139,140. The kernel function introduces an implicit mapping from the original descriptor space to a high- or infinite-dimensional space. A linear hyperplane in the kernel space may be highly non-linear in the original space. Thus, by training a linear classifier in the kernel space, a classifier which is non-linear with respect to the descriptor space is obtained. The strength of the kernel function stems from the fact that while the positions of compounds in the kernel space may not be explicitly computed, their dot product can be obtained easily. As the algorithm for the SVC uses the compound descriptors only within their dot products, this allows for computation of the decision boundary in the kernel space. One of two kernel functions is typically used: the polynomial kernel or the radial-basis function kernel.
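The kernel trick described above can be illustrated with the degree-2 polynomial kernel: the kernel value equals the dot product of the explicitly mapped vectors, yet the mapping itself never has to be formed. The two-descriptor vectors below are arbitrary illustrative values.

```python
import math

def poly_kernel(x, z, degree=2):
    # Polynomial kernel: an implicit dot product in an expanded feature space.
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** degree

def explicit_map(x):
    # Explicit degree-2 feature map for 2-D input, shown for comparison only.
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

x, z = [1.0, 2.0], [3.0, 0.5]
implicit = poly_kernel(x, z)   # kernel trick: no explicit mapping needed
explicit = sum(a * b for a, b in zip(explicit_map(x), explicit_map(z)))
# The two values agree: the kernel computes the dot product in the
# expanded space without ever constructing that space.
```

For the radial-basis function kernel the implicit space is infinite-dimensional, so only the kernel-based route is possible.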

The SVC method has been shown to exhibit low overtraining and thus allows for good generalization to
previously unseen compounds. It is also relatively robust when only a small number of examples of each class is available. The SVM methods have been extended into Support Vector Regression (SVR)141 to handle regression problems. By using methods similar to those of the SVC, e.g. the slack variables and the kernel functions, an accurate non-linear mapping between the activity and the descriptors can be found. However, contrary to typical regression methods, the predicted values are penalized only if their absolute error exceeds a certain user-specified threshold, and thus the regression model is not optimal in terms of the least-squares error. The SVM method has been shown to exhibit low prediction error in QSAR142. Studies of P-glycoprotein substrates used SVMs90,91 with results more accurate than neural networks, decision trees and k-NN. A study focused on prediction of drug likeness116 showed lower prediction error for SVM than for bagging ensembles and for linear methods. In a study involving COX-2 inhibition and aquatic toxicity100, SVM outperformed MLR and RBF neural networks. An extensive study using support vector machines among other machine learning methods was conducted119 using a wide range of end points. In this study, SVM was usually better than k-NN, decision trees and linear methods but slightly inferior to boosting and random forest ensembles. SVM has also been tested with ADME properties, including modeling of human intestinal absorption79,90, binding affinities to human serum albumin92 and cytochrome P450 inhibition64. Properties such as the O-H bond dissociation energy in substituted phenols88 and capillary electrophoresis absolute mobilities93 have also been studied using SVM, which exhibited higher accuracy than linear regression and RBF neural networks. Other properties predicted with SVM include heat capacity94 and the capacity factor (log k) of peptides in HPLC143.
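The ε-insensitive penalty used by SVR can be sketched in a few lines: predictions within the user-specified threshold ε of the target incur no loss at all (the values below are illustrative):

```python
def eps_insensitive_loss(pred, target, eps=0.1):
    # SVR penalizes a prediction only when its absolute error exceeds eps;
    # inside the eps "tube" around the target, the loss is exactly zero.
    return max(0.0, abs(pred - target) - eps)

inside = eps_insensitive_loss(1.05, 1.0)   # error 0.05, within the tube
outside = eps_insensitive_loss(1.30, 1.0)  # error 0.30, penalized by 0.20
```

This is why, as noted above, the SVR model is not optimal in the least-squares sense: small errors are ignored entirely rather than penalized quadratically.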

In the non-separable case, negative margins are encountered and their magnitude is subject to optimization along with the magnitude of the positive margins.

Fig. 9. Support vectors and margins in linearly separable (a) and non-separable (b) problems.

Ensemble Methods

The traditional approach to QSAR analysis focused on constructing a single predictive model. Recently, methods utilizing a combination, or ensemble144, of models to improve the prediction have been proposed. A small ensemble of three classifiers is depicted in Figure 10.

Bagging

The Bagging method145 focuses on improving the quality of prediction by creating a set of base models constructed using the same algorithm, yet with varying training sets. Before training each of the base models, the original training set is subjected to sampling with replacement. This leads to a group of bootstrap replicas of the original training set. The decisions of the models trained on each replica are averaged to create the final result. The strength of the bagging method stems from its ability to stabilize the classifier by learning on different samples of the original distribution. In QSAR, Bagging and other similar ensembles were used with the multi-layer perceptron for prediction of antifilarial activity, binding affinities to GABAA receptors and inhibitors of dihydrofolate reductase126, yielding lower error than a single MLP neural network. The k-NN and decision trees were used as base methods in bagging for prediction of drug likeness116, showing results slightly worse than SVM but better than LDA.
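The bagging scheme can be sketched with bootstrap replicas and majority voting. The threshold-based base learner and the six-compound, one-descriptor dataset below are hypothetical stand-ins for a real model and real descriptors.

```python
import random

def bootstrap(data, rng):
    # Sample with replacement to form one replica of the training set.
    return [rng.choice(data) for _ in data]

def fit_stump(replica):
    # Hypothetical base learner: a threshold halfway between class means.
    ones = [x for x, y in replica if y == 1] or [1.0]
    zeros = [x for x, y in replica if y == 0] or [0.0]
    cut = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2.0
    return lambda x: 1 if x > cut else 0

def bag_predict(models, x):
    # The decisions of the models trained on each replica are combined
    # by majority vote (for regression, they would be averaged).
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)

rng = random.Random(0)
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
models = [fit_stump(bootstrap(data, rng)) for _ in range(25)]
```

Each replica differs slightly from the original set, so the ensemble averages out the instability of the individual base models.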

Fig. 10. Individual decisions of three binary classifiers and the resulting classifier ensemble with more accurate decision boundary.

Random Subspace Method

The Random Subspace Method146 is another ensemble scheme aiming at stabilizing the base model. In training the models, the whole training set is used. However, to enforce diversity, only randomly generated subsets of the descriptors are used. The most notable example of the random subspace approach is the Random Forest method147, which uses decision trees as base models. This method was recently proposed for use in QSAR in an extensive study including inhibitors of COX-2, blood brain barrier permeability, estrogen receptor binding activity, multi-drug resistance reversal, dopamine binding affinity and P-glycoprotein transport activity148, yielding better results than both a single decision tree and PLS. In an extended study by the same team119, it fared better than SVM and k-NN.
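The descriptor sampling at the core of the random subspace scheme can be sketched as follows (the descriptor and model counts are illustrative assumptions):

```python
import random

def random_subspace(n_descriptors, k, rng):
    # Each base model sees only a random subset of k descriptors,
    # enforcing diversity while the full training set is reused.
    return sorted(rng.sample(range(n_descriptors), k))

rng = random.Random(42)
subsets = [random_subspace(10, 3, rng) for _ in range(5)]
# Each base model (a decision tree, in a Random Forest) is then trained
# on its own 3 of the 10 descriptors, and the predictions are combined
# as in bagging.
```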

Boosting

A more elaborate ensemble scheme is introduced in the Boosting algorithm149,150. In training the base models, it uses all the descriptors and all the training compounds. However, for each compound a weight is defined. Initially, the weights are uniform. After training a base model, its prediction error is evaluated and the weights of incorrectly predicted compounds are increased. This focuses the next base model on previously misclassified examples, even at the cost of making errors on those classified correctly before. Thus, the compounds with activity hardest to predict obtain more attention from the ensemble model. The advantage of boosting over other ensemble methods is that it allows the use of relatively simple and erroneous base models. Similarly to the SVM classifier, the power of boosting stems from its ability to create decision boundaries maximizing the margin151.
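The compound reweighting step can be sketched in the style of AdaBoost, one widely used boosting variant (the four-compound example is hypothetical):

```python
import math

def reweight(weights, correct, error):
    # AdaBoost-style update: misclassified compounds gain weight, so the
    # next base model focuses on them; weights are renormalized to sum to 1.
    alpha = 0.5 * math.log((1.0 - error) / error)
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new]

# Four compounds with initially uniform weights; only the last one
# was misclassified by the current base model.
weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]
error = sum(w for w, ok in zip(weights, correct) if not ok)  # weighted error
weights = reweight(weights, correct, error)
# The misclassified compound now carries half of the total weight.
```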

In QSAR, the boosting method employing a decision tree as the base model has recently been shown useful in modeling COX-2 inhibition, estrogen and dopamine receptor binding, multi-drug resistance reversal, CDK-2 antagonist activity, BBB permeability, log D and P-glycoprotein transport activity119, showing lower prediction error than k-NN, SVM, PLS, decision trees and the Naïve Bayes classifier in all cases but one, the P-glycoprotein dataset. In comparison with the Random Forest method, it was better for several datasets but worse for others. In another study, a simple decision stump method was used in boosting for human intestinal absorption prediction.

Conclusion

The chemoinformatic methods underlying QSAR analysis are in constant advancement. Well-established techniques continue to provide successful results, especially for small homogeneous datasets consisting of compounds relating to a single mode of action. These techniques include linear methods such as partial least squares. Classical non-linear methods, e.g. artificial neural networks, also remain popular. However, the need for rapid and accurate assessment of large sets of compounds is shifting attention to novel techniques from the pattern recognition and machine learning fields.

Two such methods, both relying on the concept of margin maximization, have recently gained attention from the QSAR community: the support vector machines and the ensemble techniques. Recent studies have shown that both yield small prediction errors in numerous QSAR applications. Given the complexity of these methods, one may be tempted to treat them as black boxes. Yet, as recently shown, only careful model selection and tuning allows for optimal prediction accuracy116. Thus, the adoption of novel methods should be preceded by extensive studies. Moreover, within the machine learning and pattern recognition fields, the interpretability of the model is usually not of utmost importance. Thus, the process of adopting emerging techniques from these fields may require substantial effort to develop methods for interpreting the created models.

A similar situation is encountered in preparing the molecular descriptors to be used. The number of different descriptors reaches into the thousands in some leading commercial tools. Having at hand powerful methods for automatically selecting informative features, one may be tempted to leave the descriptor selection process entirely to algorithmic techniques. While this may lead to high accuracy of the model, the chosen descriptors often may not give clear insight into the structure-activity relationship.

Throughout the review, we have focused on predicting the activity or property given the values of the descriptors for a compound. The inverse problem of finding compounds with desired activity and properties has also attracted attention. Such an inverse-QSAR formulation directly addresses the goal of drug design, i.e. the discovery of active compounds with good pharmacokinetic and other properties. Authors such as Kvasnicka152 and Lewis14 have published algorithms in this direction. Significant insight has also been given by Kier and Hall153 and Zefirov154. Galvez and co-workers155 have shown that topological indices are particularly suited to this aim. The reason is that whereas the conventional physical and geometrical descriptors are structure-related, topological indices are just an algebraic description of the structure itself. Thus, one can go back and forth between structure and property, predicting properties for molecules and vice versa.

Since the methods for solving the inverse problem are not yet widely adopted, the creation of QSAR models
remains the main task in computer-aided drug discovery. In general, the adoption of novel, more accurate QSAR modeling techniques does not reduce the responsibility of the investigator. On the contrary, the more complex and optimized the model, the more caution it requires during its application. Combined with the increased complexity of the inspected datasets, this makes QSAR analysis a challenging endeavor.

Acknowledgement

I sincerely thank Mr. Gaaminepreet Singh, Asst. Professor (Pharmacology), Doaba Group of Colleges, for his advice and motivation, and SardarJeet Singh (C. A. O., Doaba Group of Colleges) for providing me the opportunity and all necessary facilities to accomplish this endeavor successfully.

References

[1] Topliss, J. G.; Costello, R. J. J. Med. Chem. 1972, 15, 1066-1068.

[2] Schultz, W. J.; Netzeva, T. I.; Cronin, M. T. D. SAR QSAR Environ. Res. 2003, 14, 59-81.

[3] Hansch, C.; Maloney, P. P.; Fujita, T. Nature 1962, 194, 178-180.

[4] Kubinyi, H. In Burger’s Medicinal Chemistry; Wolff, M. E., Ed.; John Wiley & Sons, 1995; Vol. 1, 5thedn, pp 497-571.

[5] Van de Waterbeemd, H. In The Practice of Medicinal Chemistry; Wermuth, C. G., Ed.; Academic Press, 1996; pp 367-389.

[6] Free, S. M.; Jr and Wilson, J. W. J. Med. Chem. 1964, 7, 395-399.

[7] Hansch, C.; Fujita, T. J. Am. Chem. Soc. 1964, 86, 1616-1626.

[8] Hodgson, J. Nat. Biotechnol. 2001, 19, 722-726.

[9] Van de Waterbeemd, H.; Gifford, E. Nat. Rev. Drug Discov. 2003, 2, 192-204.

[10] Proudfoot, J. R. Bioorg. Med. Chem. Lett. 2002, 12, 1647-1650.

[11] Roche, O.; Schneider, P.; Zuegge, J.; Guba, W.; Kansy, M.; Alanine, A.; Bleicher, K.; Danel, F.; Rogers-Evans, M.; Neidhart, W.; Stalder, H.; Dillon, M.; Sjogren, E.; Fotouhi, N.; Gillepsie, P.; Goodnow, R.; Harris, W.; Jones, P., Tsujii, S. J. Med. Chem. 2002, 45, 137-142.

[12] Bajorath, J. Nat. Rev. Drug Discov. 2002, 1, 882-894.

[13] Rusinko, A.; Young, S. S.; Drewry, D. H.; Gerritz, S. W. Comb. Chem. High Throughput Screen. 2002, 5, 125-133.

[14] Lewis, R. A. J. Med. Chem. 2005, 48, 1638-1648.

[15] Bohm, H-J. J. Comput-Aided Mol. Design 1994, 8, 243-256.

[16] Bohm, H-J.; Klebe, J. Angew. Chem. 1996, 108, 2750-2778.

[17] Mulliken, R. S. J. Phys. Chem. 1955, 23, 1833-1840.

[18] Cammarata, A. J. Med. Chem. 1967, 10, 525-552.

[19] Stanton, D. T.; Egolf, L. M.; Jurs, P. C.; Hicks, M. G. J. Chem. Info. Comput. Sci. 1992, 32, 306-316.

[20] Klopman, G. J. Am. Chem. Soc. 1968, 90, 223-234.

[21] Zhou, Z.; Parr, R. G. J. Am. Chem. Soc. 1990. 112, 5720-5724.

[22] Wiener, H. J. Am. Chem. Soc. 1947, 69, 17-20.

[23] Randic, M. J. Am. Chem. Soc. 1975, 97, 6609-6615.

[24] Balaban, A. T. Chem. Phys. Lett. 1982, 89, 399-404.

[25] Schultz, H. P. J. Chem. Info. Comput. Sci. 1989, 29, 217-222.

[26] Kier, L. B.; Hall, L. H. J. Pharm. Sci. 1981, 70, 583-589.

[27] Galvez, J.; Garcia, R.; Salabert, M. T.; Soler, R. J. Chem. Info. Comput. Sci. 1994, 34, 520-552.

[28] Pearlman, R. S.; Smith, K. Perspect. Drug Discov. Des. 1998, 9-11, 339-353.

[29] Stanton, D. T. J. Chem. Info. Comput. Sci.1999, 39, 11-20.

[30] Burden, F. J. Chem. Info. Comput. Sci.1989, 29, 225-227.

[31] Estrada, E. J. Chem. Info. Comput. Sci.1996, 36, 844-849.

[32] Estrada, E.; Uriarte, E. SAR QSAR Environ. Res. 2001, 12, 309-324.

[33] Hall, L. H.; Kier, L. B. Quant. Struct-Act. Relat. 1991, 10, 43-48.

[34] Hall, L. H.; Kier, L. B. J. Chem. Info. Comput. Sci. 2000, 30, 784-791.

[35] Labute, P. J. Mol. Graph. Model. 2000, 18, 464-477.

[36] Higo, J.; Go, N. J. Comput. Chem. 1989, 10, 376-379.

[37] Katritzky, A. R.; Mu, L.; Lobanov, V. S.; Karelson, M. J. Phys. Chem. 1996, 100, 10400-10407.

[38] Rohrbaugh, R. H.; Jurs, P. C. Anal. Chim. Acta 1987, 199, 99-107.

[39] Weiser, J.; Weiser, A. A.; Shenkin, P. S.; Still, W. C. J. Comp. Chem. 1998, 19, 797-808.

[40] Barnard, J. M.; Downs, G. M. J. Chem. Info. Comput. Sci.1997, 37, 141-142.

[41] Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. J. Chem. Info. Comput. Sci. 2002, 42, 1273-1280.

[42] Waller, C. L.; Bradley, M. P. J. Chem. Info. Comput. Sci. 1999. 39, 345-354.

[43] Winkler, D.; Burden, F. R. Quant. Struct-Act. Relat. 1998, 17, 224-231.

[44] Maurer, W. D.; Lewis, T. G. ACM Comput. Surv. 1975, 7, 5-19.

[45] Deshpande, M.; Kuramochi, M.; Wale, N.; Karypis, G. IEEE Trans. on Knowledge and Data Eng. 2005, 17, 1036-1050.

[46] Guner, O. F. Curr. Top. Med. Chem. 2002, 2, 1321-1332.

[47] Akamatsu, M. Curr. Top. Med. Chem. 2002, 2, 1381-1394.

[48] Lemmen, C.; Lengauer, T. J. Comput.-Aided Mol. Des. 2000, 14, 215-232.

[49] Dove, S.; Buschauer, A. Quant. Struct-Act. Relat. 1999, 18, 329-341.

[50] Cramer, R. D.; Patterson, D. E.; Bunce, J. D. J. Am. Chem. Soc. 1988, 110, 5959-5967.

[51] Klebe, G.; Abraham, U.; Mietzner, T. J. Med. Chem. 1994, 37, 4130-4146.

[52] Silverman, B. D.; Platt, D. E. J. Med. Chem. 1996, 39, 2129-2140.

[53] Todeschini, R.; Lasagni, M.; Marengo, E. J. Chemom. 1994, 8, 262-272.

[54] Todeschini, R.; Gramatica, P. Perspect. Drug Discov. Des. 1998, 9-11, 355-380.

[55] Bravi, G.; Gancia, E.; Mascagni, P.; Pegna, M.; Todeschini, R.; Zaliani, A. J. Comput.-Aided Mol. Des. 1997, 11, 79-82.

[56] Cruciani, G.; Crivori, P.; Carrrupt, P. -A.; Testa, B. J. Mol. Struct. 2000, 503, 17-30.

[57] Crivori, P.; Cruciani, G.; Carrrupt, P. -A.; Testa, B. J. Med. Chem. 2000, 43, 2204-2216.

[58] Pastor, M.; Cruciani, G.; Mclay, I.; Pickett, S.; Clementi, S. J. Med. Chem. 2000, 43, 3233-3243.

[59] Cho, S. J.; Tropsha, A. J. Med. Chem. 1995, 38, 1060-1066.

[60] Cho, S. J.; Tropsha, A.; Suffness, M.; Cheng, Y. C.; Lee, K. H. J. Med. Chem. 1996, 39, 1383-1395.

[61] Estrada, E.; Molina, E.; Perdomo-Lopez, J. J. Chem. Inf. Comput. Sci. 2001, 41, 1015-1021.

[62] De Julian-Ortiz, J. V.; de Gregorio Alapont, C.; Rios-Santamarina, I.; Garcia-Domenech, R.; Galvez, J. J. Mol. Graphics Modell. 1998, 16, 14-18.

[63] Guyon, I.; Elisseeff, A. J. Mach. Learn. Res. 2003, 3, 1157-1182.

[64] Merkwirth, C.; Mauser, H.; Roche, O.; Stahl, M.; Lengauer, T. J. Chem. Inf. Comput. Sci. 2004, 44, 1971-1978.

[65] Guha, R.; Jurs, P. C. J. Chem. Inf. Comput. Sci. 2004, 44, 2179-2189.

[66] Gallegos, A.; Girone, X. J. Chem. Inf. Comput. Sci. 2004, 44, 321-326.

[67] Verdu-Andres, J.; Massart, D. L. Appl. Spectrosc. 1998, 52, 1425-1434.

[68] Farkas, O.; Heberger, K. J. Chem. Inf. Model. 2005, 45, 339-346.

[69] Heberger, K.; Rajko, R. J. Chemom. 2002, 16, 436-443.

[70] Rajko, R.; Heberger, K. Chemom. Intell. Lab. Syst. 2001, 57, 1-14.

[71] Liu, Y. J. Chem. Inf. Comput. Sci. 2004, 44, 1823-1828.

[72] Venkatraman, V.; Dalby, A. R.; Yang, Z. R. J. Chem. Inf. Comput. Sci. 2004, 44, 1686-1692.

[73] Lin, T, -H.; Li, H. –T.; Tsai, K. –C. J. Chem. Inf. Comput. Sci. 2004, 44, 76-87.

[74] Massey, F. J. J. Amer. Statistical Assoc. 1951, 46, 68-78.

[75] Byvatov, E.; Schneider, G. J. Chem. Inf. Comput. Sci. 2004, 44, 993-999.

[76] Kohavi, R.; John, G. Artiff. Intell. 1997, 97, 273-324.

[77] Siedlecki, W.; Sklansky, J. Int. J. Pattern Recog. Artiff. Intell. 1988, 2, 197-220.

[78] Siedlecki, W.; Sklansky, J. Pat. Rec. Lett. 1989, 10, 335-347.

[79] Wegner, J. K.; Frohlich, H.; Zell, A. J. Chem. Inf. Comput. Sci. 2004, 44, 921-930.

[80] Hemmateenejad, B.; Safarpour, M. A.; Miri, R.; Nesari, N. J. Chem. Inf. Model. 2005, 45, 195-199.

[81] Wessel, M. D.; Jurs, P. C.; Tolan, J. W.; Muskal, S. M. J. Chem. Inf. Comput. Sci. 1998, 38, 726-735.

[82] Baurin, N.; Mozziconacci, J. –C.; Arnoult, E.; Chavatte, P.; Marot, C.; Morin-Allory, L. J. Chem. Inf. Comput. Sci. 2004, 44, 276-285.

[83] Itskowitz, P.; Tropsha, A. J. Chem. Inf. Model. 2005, 45, 777-785.

[84] Sutter, J. M.; Dixon, S. L.; Jurs, P. C. J. Chem. Inf. Comput. Sci. 1995, 35, 77-84.

[85] Kittler, J. Pattern Recog. Signal Process. 1978, E29, 41-60.

[86] Bi, J.; Bennet, K.; Embrechts, M.; Breneman, C.; Song, M. J. Mach. Learn. Res. 2003, 3, 1229-1243.

[87] Wegner, J. K.; Frohlich, H.; Zell, A. J. Chem. Inf. Comput. Sci. 2004, 44, 931-939.

[88] Xue, C. X.; Zhang, R. S.; Liu, H. X.; Yao, X. J.; Liu, M. C.; Fan, B. T. J. Chem. Inf. Comput. Sci. 2004, 44, 669-677.

[89] Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Mach. Learn. 2002, 46, 389-422.

[90] Xue, Y.; Li, Z. R.; Yap, C. W.; Sun, L. Z.; Chen, X.; Chen, Y. Z. J. Chem. Inf. Comput. Sci. 2004, 44, 1630-1638.

[91] Xue, Y.; Yap, C. W.; Sun, L. Z.; Cao, Z. W.; Wang, J. F.; Chen, Y. Z. J. Chem. Inf. Comput. Sci. 2004, 44, 1497-1505.

[92] Xue, C. X.; Zhang, R. S.; Liu, H. X.; Yao, X. Z.; Liu, M. C.; Hu, Z. D. Fan, B. T. J. Chem. Inf. Comput. Sci. 2004, 44, 1693-1700.

[93] Xue, C. X.; Zhang, R. S.; Liu, M. C.; Hu, Z. D. Fan, B. T. J. Chem. Inf. Comput. Sci. 2004, 44, 950-957.

[94] Xue, C. X.; Zhang, R. S.; Liu, H. X.; Liu, M. C.; Hu, Z. D. Fan, B. T. J. Chem. Inf. Comput. Sci. 2004, 44, 1267-1274.

[95] Senese, C. L.; Duca, J.; Pan, D.; Hopfinger, A. J.; Tseng, Y. J. J. Chem. Inf. Comput. Sci. 2004, 44, 1526-1539.

[96] Trohalaki, S.; Pachter, R.; Geiss, K.; Frazier, J. J. Chem. Inf. Comput. Sci. 2004, 44, 1186-1192.

[97] Roy, K.; Ghosh, G. J. Chem. Inf. Comput. Sci. 2004, 44, 559-567.

[98] Hou, T. J.; Zhang, W.; Xia, K.; Qiao, X. B.; Xu, X. J. J. Chem. Inf. Comput. Sci. 2004, 44, 1585-1600.

[99] Hou, T. J.; Zhang, W.; Xia, K.; Qiao, X. B.; Xu, X. J. J. Chem. Inf. Comput. Sci. 2004, 44, 266-275.

[100] Yao, X. J.; Panaye, A.; Doucet, J. P.; Zhang, R. S.; Chen, H. F.; Liu, M. C.; Hu, Z. D. Fan, B. T. J. Chem. Inf. Comput. Sci. 2004, 44, 1257-1266.

[101] Tino, P.; Nabney, I. T.; Williams, B. S.; Losel, J.; Sun, Y. J. Chem. Inf. Comput. Sci. 2004, 44, 1647-1653.

[102] Wold, S.; Ruhe, A.; Wold. H.; Dunn, W. SIAM J. Sci. Stat. Comput. 1984, 5, 735-743.

[103] Wold, S.; Sjostrom, M.; Eriksson, L. Chemom. Intell. Lab. Syst. 2001, 58, 109-130.

[104] Phatak, A.; de Jong, S. J. Chemom. 1997, 11, 311-338.

[105] Zhang, H.; Li, H.; Liu, C. J. Chem. Inf. Model. 2005, 45, 440-448.

[106] Waller, C. L. J. Chem. Inf. Comput. Sci. 2004, 44, 758-765.

[107] Clark, M. J. Chem. Inf. Model. 2005, 45, 30-38.

[108] Catana, C.; Gao, H.; Orrenius, C.; Stouten, P. W. F. J. Chem. Inf. Model. 2005, 45, 170-176.

[109] Sun, H. J. Chem. Inf. Comput. Sci. 2004, 44, 748-757.

[110] Adenot, M.; Lahana, R. J. Chem. Inf. Comput. Sci. 2004, 44, 239-248.

[111] Feng, J.; Lurati, L.; Ouyang, H.; Robinson, T.; Wang, Y.; Yuan, S.; Young, S. S. J. Chem. Inf. Comput. Sci. 2003, 43, 1463-1470.

[112] Fisher, R. Ann. Eugen. 1936, 7, 179-188.

[113] Guha, R.; Jurs, P. C. J. Chem. Inf. Model. 2005, 45, 65-73.

[114] Murcia-Soler, M.; Perez-Gimenez, F.; Garcia-March, F. J.; Salabert-Salvador, T.; Castro-Bleda, M. J.; Villanueva-Pareja, A. J. Chem. Inf. Comput. Sci. 2004, 44, 1031-1041.

[115] Molina, E.; Diaz, H. G.; Gonzalez, M. P.; Rodriguez, E.; Uriarte, E. J. Chem. Inf. Comput. Sci. 2004, 44, 515-521.

[116] Mueller, K.-R.; Raetsch, G.; Sonnenburg, S.; Mika, S.; Grimm, M.; Heinrich, N. J. Chem. Inf. Model. 2005, 45, 249-253.

[117] Mazzatorta, P.; Benfenati, E.; Lorenzini, P.; Vighi, M. J. Chem. Inf. Comput. Sci. 2004, 44, 105-112.

[118] Hawkins, D. M. J. Chem. Inf. Comput. Sci. 2004, 44, 1-12.

[119] Svetnik, V.; Wang, T.; Tong, C.; Liaw, A.; Sheridan, R. P.; Song, Q. J. Chem. Inf. Model. 2005, 45, 786-799.

[120] Klon, A. E.; Glick, M.; Davies, J. W. J. Chem. Inf. Comput. Sci. 2004, 44, 2216-2224.

[121] Cover, T.; Hart, P. IEEE Trans. Inform. Theory 1967, 13, 21-27.

[122] Jain, A.; Mao, J.; Mohiuddin, K. Computer 1996, 29, 31-44.

[123] Rosenblatt, F. Psychol. Rev. 1958, 65, 386-408.

[124] Gallant, S. IEEE Trans. Neural Networks 1990, 1, 179-191.

[125] Hornik, K.; Stinchcombe, M.; White, H. Neural Networks 1989, 2, 359-366.

[126] Agrafiotis, D. K.; Cedeno, W.; Lobanov, V. S. J. Chem. Inf. Comput. Sci. 2002, 42, 903-911.

[127] Chiu, T.-L.; So, S.-S. J. Chem. Inf. Comput. Sci. 2004, 44, 154-160.

[128] Gini, G.; Cracium, M. V.; Konig, C. J. Chem. Inf. Comput. Sci. 2004, 44, 1897-1902.

[129] Guha, R.; Jurs, P. C. J. Chem. Inf. Model. 2005, 45, 800-806.

[130] Mulgrew, B. IEEE Sig. Proc. Mag. 1996, 13, 50-65.

[131] Chen, S.; Cowan, C. F. N.; Grant, P. M. IEEE Trans. Neural Networks 1991, 2, 302-309.

[132] Quinlan, J. R. Mach. Learn. 1986, 1, 81-106.

[133] Gelfand, S.; Ravishankar, C.; Delp, E. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 163-174.

[134] Mingers, J. Mach. Learn. 1989, 4, 227-243.

International Journal of Drug Design and Discovery Volume 3 • Issue 3 • July – September 2012

[135] Daszykowski, M.; Walczak, B.; Xu, Q.-S.; Daeyaert, F.; de Jonge, M. R.; Heeres, J.; Koymans, L. M. H.; Lewi, P. J.; Janssen, P. A.; Massart, D. L. J. Chem. Inf. Comput. Sci. 2004, 44, 716-726.

[136] DeLisle, R. K.; Dixon, S. L. J. Chem. Inf. Comput. Sci. 2004, 44, 862-870.

[137] Bai, J. P. F.; Utis, A.; Crippen, G.; He, H.-D.; Fischer, V.; Tullman, R.; Yin, H.-Q.; Hsu, C.-P.; Jiang, L.; Hwang, K.-K. J. Chem. Inf. Comput. Sci. 2004, 44, 2061-2069.

[138] Cortes, C.; Vapnik, V. Mach. Learn. 1995, 20, 273-297.

[139] Burges, C. Data Min. Knowl. Discov. 1998, 2, 121-167.

[140] Boser, B. E.; Guyon, I. M.; Vapnik, V. N. In COLT '92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory; ACM Press: New York, NY, USA, 1992.

[141] Smola, A.; Schoelkopf, B. Stat. Comput. 2004, 14, 199-222.

[142] Burbidge, R.; Trotter, M.; Buxton, B. F.; Holden, S. B. Comput. Chem. 2001, 26, 5-14.

[143] Liu, H. X.; Xue, C. X.; Zhang, R. S.; Yao, X. J.; Liu, M. C.; Hu, Z. D.; Fan, B. T. J. Chem. Inf. Comput. Sci. 2004, 44, 1979-1986.

[144] Meir, R.; Raetsch, G. Lecture Notes in Computer Sci. 2003, 2600, 118-183.

[145] Breiman, L. Mach. Learn. 1996, 24, 123-140.

[146] Ho, T. K. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832-844.

[147] Breiman, L. Mach. Learn. 2001, 45, 5-32.

[148] Svetnik, V.; Liaw, A.; Tong, C.; Culberson, C.; Sheridan, R. P.; Feuston, B. P. J. Chem. Inf. Comput. Sci. 2003, 43, 1947-1958.

[149] Freund, Y.; Schapire, R. J. Comp. Sys. Sci. 1997, 55, 119-139.

[150] Freund, Y.; Schapire, R. J. Jpn. Soc. Artif. Intell. 1999, 14, 771-780.

[151] Freund, Y.; Schapire, R. E.; Bartlett, P.; Lee, W. S. Ann. Stat. 1998, 26, 1651-1686.

[152] Kvasnicka, V.; Pospichal, J. J. Chem. Inf. Comput. Sci. 1996, 36, 516-526.

[153] Kier, L. B.; Hall, L. H. Quant. Struct.-Act. Relat. 1993, 12, 383-388.

[154] Skvortsova, M.; Baskin, I.; Slovokhotova, O.; Palyulin, P.; Zefirov, N. J. Chem. Inf. Comput. Sci. 1993, 33, 630-634.

[155] Galvez, J.; Garcia-Domenech, R.; Bernal, J. M.; Garcia-March, F. J. An. Real Acad. Farm. 1991, 57, 533-546.