8/10/2019 3 on an Aspect of Calculated Molecular Descriptors in QSAR Studies of Quinolone 13
http://slidepdf.com/reader/full/3-on-an-aspect-of-calculated-molecular-descriptors-in-qsar-studies-of-quinolone 1/13
Molecular Diversity (2006) 10: 415–427. DOI: 10.1007/s11030-006-9018-4. © Springer 2006
Full-length paper
On an aspect of calculated molecular descriptors in QSAR studies of quinolone
antibacterials
Payel Ghosh1, Megha Thanadath2 & Manish C. Bagchi1,∗
1 Drug Design, Development and Molecular Modelling Division, Indian Institute of Chemical Biology, 4 Raja S.C. Mullick
Road, Jadavpur, Calcutta 700032, India; 2 K.V.M College of Information Technology, Cherthala, Alleppey, Kerala, India
(∗ Author for correspondence, E-mail: [email protected], Tel.: +91 33 2473 3491/3493/0493/6793, Fax: +91 33 2473
5197, +91-33-2472 3967)
Received 25 October 2005; Accepted 18 January 2006
Key words: quantitative structure activity relationship, quinolone antibacterials, molecular descriptors, intermolecular similarity, PERL programming, ridge regression
Summary
Tuberculosis infections that are resistant to conventional drug therapy have risen steadily over the last
decade, and as a result fluoroquinolone drugs are being used as the second line of action. However, there is hardly any
study examining specific structure activity relationships of quinolone antibacterials against mycobacteria. In this paper, an
attempt has been made to establish quantitative structure activity relationship models for a series of quinolone compounds
against Mycobacterium fortuitum and Mycobacterium smegmatis. Because sufficient physicochemical data are lacking for the
anti-mycobacterial compounds, it is very difficult to develop predictive methods based on experimental data. The present paper
therefore develops QSARs from the standpoint of physicochemical, constitutional, geometrical, electrostatic and
topological indices. Molecular descriptors have been calculated solely from the chemical structure of N-1, C-7 and 8 substituted
quinolone compounds, and ridge regression models have been developed that can better explain the structure-activity relationship.
An intermolecular similarity analysis, implemented as a computer program in the PERL language, has been used to compare
the influence of the various molecular descriptors in different data subsets. The
comparison of the relative effectiveness of the calculated descriptors in our ridge regression models gives rise to some interesting
results.
Introduction
The greatest threats to tuberculosis control are the associa-
tion of this disease with the HIV epidemic and the increase
in resistance to the most effective anti-tuberculosis drugs. The global increase of multi-drug resistant M. tuberculosis
strains and intolerance of first line anti-tuberculosis drugs
such as isoniazide, rifampicin, pyrazinamide and ethamb-
utol may cause major problems and necessitate modifica-
tion of standard therapy regimen [1]. Recently developed
drugs like 6-fluoro-4-quinolone-3-carboxylic acids seem to
be very effective in cases of severe intolerance of first line
anti-tuberculosis medication [2, 3]. Of these fluoroquinolone
drugs, sparfloxacin seems to be the most potent agent be-
cause of its broad-spectrum efficacies, both in vitro and
in vivo, better than those of ofloxacin and ciprofloxacin
against mycobacterial infections [4, 5]. Developments in the
quinolone family for producing more active agents against
gram-positive organisms and mycobacteria are being con-
tinued with substitutions at N-1 and C-7 as well as at the
8 position of the quinolone ring with a view to obtain the
relationship between structural modification at these positions
and activity against mycobacteria [6, 7]. These agents were evaluated for their activities against Mycobacterium
fortuitum and Mycobacterium smegmatis, as the activities
of the compounds against these two organisms were used
for a measure of Mycobacterium tuberculosis activity. But
there is hardly any study to examine specific structure ac-
tivity relationships of the quinolone anti-bacterials against
mycobacteria. Quantitative Structure Activity Relationship
(QSAR) studies are based on the premise that biological
response is a function of the chemical structure. Thus, the
significant parameters of chemical structure have been de-
fined in numerical terms for the use in the development
of specific QSAR models [8]. Computer-aided drug design
methods, in general, have been rapidly developed in the
Table 1. Quinolone substrates and their activities considered in the present study
[R1 and R7 were drawn as structures in the original; only the text labels that survived extraction are listed, and "-" marks a substituent whose drawing left no text.]
Comp No.   R1     R7 (labels)        X    MIC M.fort (µg/mL)   MIC M.smeg (µg/mL)
1          -      NHN                CH   0.06                 0.25
2          -      N NH3C             CH   0.06                 0.25
3          -      NHN, H3C           CH   0.06                 0.25
4          -      NHN, H3C, H3C      CH   0.06                 0.13
5          -      NHN, Et            CH   0.13                 0.25
6          -      N NEt              CH   0.06                 0.13
7          -      N NiPr             CH   0.25                 0.25
8          -      N NiPrCH2          CH   1.0                  0.25
9          -      N NBun             CH   1.0                  0.5
10         -      N, H2N             CH   0.03                 0.25
11         -      N, MeNHCH2         CH   0.25                 0.5
12         -      N, EtNHCH2         CH   0.5                  0.5
13         -      N, iPrNHCH2        CH   0.25                 0.5
14         -      N, (Me)2NCH2       CH   0.13                 0.5
15         -      N, H2N CH2CH3      CH   0.03                 0.06
16         F, F   NHN                CH   0.25                 0.5
17         F, F   N NH3C             CH   0.25                 0.5
18         F, F   NHN, H3C           CH   0.13                 0.5
(Continued on next page)
In the present study, an attempt has been made to classify the
set of 69 quinolone compounds using a criterion of structural
similarity. This criterion keeps a close relationship between
the molecules belonging to each one of the classes and their
biological activity. To study the structural similarity, it is essential
to build a mathematical space in which chemical structures
are represented as vectors whose components describe
topological features characteristic of their chemical nature. It is
expected that these chemical structures will be distributed
in this mathematical space according to their structural characteristics,
so that we can find neighborhoods of similar
molecules. For a well-defined structural space, it is expected
that molecules with similar biological activity will be in the
same neighborhood of structural similarity [22]. A set of
well-chosen descriptors such as physicochemical, geomet-
rical, constitutional, electrostatic and topological descriptors
may be used as variables. These descriptors arise from the
graph theoretical studies, which are often used as a powerful
Table 1. (Continued)
Comp No.   R1       R7 (labels)        X    MIC M.fort (µg/mL)   MIC M.smeg (µg/mL)
19         F, F     NHN, H3C, H3C      CH   0.25                 0.5
20         F, F     NHN, Et            CH   0.25                 1.0
21         F, F     N NEt              CH   0.5                  1.0
22         F, F     N NiPr             CH   0.5                  1.0
23         F, F     N NiPrCH2          CH   2.0                  4.0
24         F, F     N NBun             CH   1.0                  2.0
25         F, F     N, H2N             CH   0.13                 0.13
26         F, F     N, MeNHCH2         CH   0.13                 0.25
27         F, F     N, EtNHCH2         CH   0.5                  0.5
28         CH2CH3   NHN                CH   0.5                  2.0
29         CH2CH3   NHN, H3C           CH   1.0                  2.0
30         CH2CH3   NHN, H3C, H3C      CH   0.13                 1.0
31         CH2CH3   N NEt              CH   0.25                 0.5
32         CH2CH3   N, H2N             CH   1.0                  4.0
33         CH2CH3   N, MeNHCH2         CH   2.0                  8.0
34         -        NHN                CH   0.03                 0.13
35         -        NHN, H3C           CH   0.03                 0.06
36         -        NHN, H3C, H3C      CH   0.06                 0.06
(Continued on next page)
tool in rational drug design. Thus, quantitative molecular
similarity analysis was performed to sub-group the set of
quinolone antibacterials by similarity. An atom pair oriented
approach to intermolecular similarity, using the principle
of Carhart and a computer program developed by our group
in PERL script [23], allows us to
subdivide the entire database into three categories: (a) the
whole set of 69 compounds, (b) compounds having more than 50% similarity with Sparfloxacin, a known fluoroquinolone
tuberculostatic drug, and (c) compounds having more than
60% similarity with Sparfloxacin. The chemical structure
of Sparfloxacin, with its biological activity values in MIC
(µg/mL) against M. fortuitum and M. smegmatis, is given in
Figure 1.
The computer program has two main tasks. The first
module generates the atom pairs for each of the quinolone
derivatives and determines the shortest path separation. The second module deals with the calculation of the intermolecular
Table 1. (Continued)
Comp No.   R1     R7 (labels)        X      MIC M.fort (µg/mL)   MIC M.smeg (µg/mL)
37         -      N NEt              CH     0.13                 0.13
38         -      N, H2N             CH     0.06                 0.13
39         -      N, MeNHCH2         CH     0.13                 0.25
40         -      NHN                CH     1.0                  4.0
41         -      NHN, H3C           CH     0.5                  2.0
42         -      NHN, H3C, H3C      CH     0.05                 2.0
43         -      N NEt              CH     1.0                  4.0
44         -      N, H2N             CH     1.0                  2.0
45         -      N, MeNHCH2         CH     2.0                  8.0
46         -      NHN                CH     0.5                  2.0
47         -      NHN, H3C           CH     0.25                 1.0
48         -      NHN, H3C, H3C      CH     0.13                 0.5
49         -      N NEt              CH     0.5                  2.0
50         -      N, H2N             CH     1.0                  1.0
53         -      NHN, H3C           CBr    0.03                 0.06
54         -      NHN, H3C, H3C      CBr    0.03                 0.06
55         -      N NEt              CBr    0.03                 0.06
56         -      N, H2N             CBr    0.03                 0.06
57         -      N, MeNHCH2         CBr    0.03                 0.06
58         -      NHN                COMe   0.03                 0.03
(Continued on next page)
similarity between any two compounds based on the atom
pairs along with the shortest path separation as determined
in the first module. This program is unique in the sense that
it can determine the intermolecular similarity between any
two chemical structures using only the positions of the
atoms and bonds of the structures concerned, as specified in
the input format. The intermolecular similarity of all 69
quinolone antibacterials considered in the present study with
Sparfloxacin was generated using the above program
and is presented in Table 2.
The computational approach for the generation of the
atom pairs and the similarity calculation is given below, whereas
the main program in PERL script is given in the supplementary
section. An atom pair is a substructure composed
Table 1. (Continued)
Comp No.   R1     R7 (labels)        X      MIC M.fort (µg/mL)   MIC M.smeg (µg/mL)
59         -      NHN, H3C           COMe   0.03                 0.03
60         -      NHN, H3C, H3C      COMe   0.03                 0.03
61         -      N NEt              COMe   0.03                 0.03
62         -      N, H2N             COMe   0.03                 0.03
63         -      N, MeNHCH2         COMe   0.03                 0.03
64         -      NHN                N      0.03                 0.06
65         -      NHN, H3C           N      0.03                 0.06
66         -      NHN, H3C, H3C      N      0.03                 0.06
67         -      N NEt              N      0.03                 0.06
68         -      N, H2N             N      0.03                 0.06
69         -      N, MeNHCH2         N      0.03                 0.06
Figure 1. Sparfloxacin, with MIC = 0.06 and 0.13 µg/mL against M. fortuitum and M. smegmatis,
respectively
of two non-hydrogen atoms, i and j, and their interatomic separation:
<atom description i> − <separation> − <atom description j>
To find the interatomic separation, which is the shortest path
distance between any two atoms in a chemical structure, we
represent the structure in the form of a tree, in which each
level of the tree corresponding to a particular atom
shows the number of neighbors to which that atom is attached.
The program computes atom pairs from a specific input format. The first line
of the input should be a forward slash (/), which marks
the start of the input. The format of the next line following
the forward slash is:
<symbol><atom name i> (positions of the
neighboring atoms, separated by commas (,))
The <symbol> can be either "#" or "∼", depending on whether
the atom has a double bond or a single bond, respectively.
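The atom-pair generation step described above can be sketched in a few lines. This is only an illustrative Python sketch, not the authors' PERL program: the function names, the dictionary-based input (instead of the "/"-delimited file format) and the ethanol toy molecule are all assumptions made for the example, and the separation is counted in bonds via a breadth-first search (some formulations of the atom-pair method count the atoms on the path instead).

```python
from collections import Counter, deque
from itertools import combinations


def shortest_paths(adjacency, start):
    """Breadth-first search: bond-count distance from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def atom_pairs(descriptions, adjacency):
    """Count atom pairs of the form (description i, separation, description j)."""
    all_dist = {atom: shortest_paths(adjacency, atom) for atom in adjacency}
    pairs = Counter()
    for i, j in combinations(sorted(adjacency), 2):
        if j not in all_dist[i]:
            continue  # atoms in disconnected fragments form no pair
        # Sort the two descriptions so that C-1-O and O-1-C are the same pair type.
        first, second = sorted((descriptions[i], descriptions[j]))
        pairs[(first, all_dist[i][j], second)] += 1
    return pairs


# Hypothetical toy input: heavy-atom skeleton of ethanol, C-C-O.
descriptions = {0: "C", 1: "C", 2: "O"}
adjacency = {0: [1], 1: [0, 2], 2: [1]}
ethanol_pairs = atom_pairs(descriptions, adjacency)
```

In a fuller treatment the atom description would also encode the bonding environment (the "#" or "∼" symbol and the number of neighbours), which here is reduced to the element symbol for brevity.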
Molecular similarity, S(s, t), between any two structures
s and t may be calculated as
$$S(s,t) = \frac{2\sum_{i} \mathrm{MIN}[n(i,s),\, n(i,t)]}{d(s) + d(t)}$$
where the sum runs over the distinct types i of atom pairs, n(i, s) and n(i, t) are the numbers of
occurrences of atom pair type i in s and t, and d(s) and d(t) represent the total number of atom pairs
in s and t, respectively.
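As a worked illustration of the similarity formula, the following Python sketch evaluates S(s, t) from two tables of atom-pair counts. The function name and the two small count tables are hypothetical and chosen only so that the arithmetic is easy to follow by hand; they do not correspond to any compound in Table 1.

```python
from collections import Counter


def atom_pair_similarity(pairs_s, pairs_t):
    """S(s, t) = 2 * sum_i min(n(i, s), n(i, t)) / (d(s) + d(t))."""
    d_s = sum(pairs_s.values())  # total number of atom pairs in s
    d_t = sum(pairs_t.values())  # total number of atom pairs in t
    if d_s + d_t == 0:
        return 0.0
    # For every pair type in s, take the smaller of the two occurrence counts.
    shared = sum(min(count, pairs_t.get(pair, 0)) for pair, count in pairs_s.items())
    return 2.0 * shared / (d_s + d_t)


# Hypothetical atom-pair counts for two structures s and t.
pairs_s = Counter({"C-1-C": 2, "C-1-O": 1})   # d(s) = 3
pairs_t = Counter({"C-1-C": 1, "C-1-N": 1})   # d(t) = 2
similarity = atom_pair_similarity(pairs_s, pairs_t)
similarity_percent = 100.0 * similarity
```

Here only one "C-1-C" pair is shared, so S = 2 × 1 / (3 + 2) = 0.4, i.e. 40% similarity; two identical structures give S = 1.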
Theoretical molecular descriptors calculation
The molecular descriptors used in the present paper are of
4 categories viz. (a) physicochemical, (b) constitutional and
geometrical, (c) electrostatic and (d) topological descriptors. The physicochemical descriptors consist of the molecular
Computational approach for the generation of atom pairs from two chemical structures
Step 1. Check whether the input data files of the chemical structure, together with the bonds, neighbours and atomic symbols for the compounds, exist. If no, terminate the program. If yes, continue.
Step 2. Shortest path calculation from each atom to all other atoms using a tree structure representation, where levels of the tree are treated as the array positions.
Step 3. Identification of the initial and terminal atoms from the input data file for obtaining atom descriptions.
Step 4. Classification of each atom from its environment consisting of bonds and neighbouring atom(s) associated with it.
Step 5. Print the obtained results in the atom-pair format.
Step 6. Store the calculated atom pairs of the two chemical structures in files, comp1.txt in the 1st iteration and comp2.txt in the next iteration, for further analysis.
Step 7. Go to step 1 for the determination of atom pairs for the second structure.
Step 8. Exit the program.
Similarity calculation for the two compounds
Step 1. Calculate the no. of atom-pairs from the files, viz. comp1.pl and comp2.pl.
Step 2. Count the similar type of atom-pairs separately for each compound.
Step 3. Compare the count of each atom pair type from both the structures and take the minimum of the number of occurrences.
Step 4. Obtain the total count of these minimum numbers of occurrences.
Step 5. Substitute these values in calculating the structural similarity between the two compounds.
Step 6. Print the results, i.e. the molecular similarity between the two compounds in percentage.
Step 7. Exit the program.
descriptors like AlogP98 value, AMR value, buffer solubil-
ity, polarizability, vapour density, water solubility etc. De-
scriptors like formal charges, fraction of rotatable bonds,
number of rigid bonds, number of rings, number of charged
groups etc. form the constitutional descriptors. The three-
dimensional or shape descriptors (3-D) are more complex,
encoding information about the three dimensional aspects
of molecular structure. The electrostatic descriptors include
Table 2. Structural similarity of quinolone derivatives against Sparfloxacin
Comp no.  Similarity with Sparfloxacin (%)  Comp no.  Similarity with Sparfloxacin (%)
1 61.16 2 55.75
3 69.91 4 79.09
5 64.58 6 54.05
7 57.06 8 55.03
9 49.21 10 60.86
11 61.45 12 58.16
13 59.52 14 60.36
15 57.18 16 46.43
17 43.30 18 52.89
19 57.65 20 49.11
21 42.23 22 44.62
23 43.27 24 38.19
25 46.17 26 47.21
27 44.39 28 49.13
29 56.57 30 63.13
31 44.25 32 48.18
33 50.15 34 45.13
35 51.78 36 57.06
37 46.36 38 47.79
39 50.20 40 55.66
41 61.94 42 67.99
43 50.07 44 55.05
45 57.18 46 59.29
47 67.99 48 77.09
49 52.40 50 59.29
51 59.53 52 64.01
53 73.40 54 82.58
55 56.79 56 64.01
57 64.75 58 63.16
59 71.88 60 79.89
61 56.08 62 63.44
63 63.49 64 47.79
65 54.05 66 59.81
67 43.72 68 46.90
69 49.66
charge polarization, local dipole index, maximum positive and negative charges, general polarity parameters, relative
charge etc., whereas the topological descriptors are the
biggest set of molecular descriptors which may again be sub-
divided into two classes- topostructural and topochemical
descriptors. The topostructural descriptors encode informa-
tion strictly on the neighborhood and connectivity of atoms
within the molecule while the topochemical descriptors en-
code information related to both the topology of the molecule
and chemical nature of atoms and bonds within it.
In the present paper, we have used the software package
PreADMET [24], a web-based application
for predicting ADME data and building drug-like libraries using in silico methods. Two commercial editions
of PreADMET are available: (i) standard and (ii) professional.
The program, which was developed in response to the need for rapid prediction of
drug-likeness and ADME/toxicity data, can calculate about 955 molecular
descriptors, including constitutional, geometrical, topological,
electrostatic and physicochemical descriptors. The input file may
be created either by drawing the chemical structure or by using
an appropriate SMILES notation of the compound concerned.
A total of 444 molecular descriptors were calculated
for the present investigation using the PreADMET program and,
prior to model development, the set of calculated descriptors
was reduced from 444 to 294. Descriptors were removed
either because they had a constant (or nearly constant) value for
all of the compounds or because they were perfectly
correlated with another class of descriptors. Table 3 lists
the symbols of the calculated molecular descriptors
used in the present study together with their corresponding
groups.
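The descriptor reduction step can be illustrated with a short Python sketch using NumPy. It is not the procedure actually used to go from 444 to 294 descriptors, whose thresholds the text does not report: the function name, the variance and correlation tolerances, the rule "drop a descriptor that is perfectly correlated with one already retained" and the four toy descriptors are all assumptions made for the example.

```python
import numpy as np


def reduce_descriptors(X, names, var_tol=1e-12, corr_tol=1.0 - 1e-9):
    """Drop (near-)constant columns, then columns perfectly correlated with a kept one."""
    X = np.asarray(X, dtype=float)
    # Step 1: keep only descriptors that actually vary across the compounds.
    varying = [j for j in range(X.shape[1]) if X[:, j].var() > var_tol]
    # Step 2: scan left to right, skipping any descriptor that is (anti-)perfectly
    # correlated with a descriptor that has already been retained.
    selected = []
    for j in varying:
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) >= corr_tol for s in selected
        )
        if not redundant:
            selected.append(j)
    return X[:, selected], [names[j] for j in selected]


# Hypothetical descriptor matrix: 4 compounds x 4 descriptors.
names = ["a", "b", "c", "d"]
X = [
    [1.0, 5.0, 2.0, 1.0],
    [2.0, 5.0, 4.0, 0.0],
    [3.0, 5.0, 6.0, 1.0],
    [4.0, 5.0, 8.0, 3.0],
]
X_reduced, kept = reduce_descriptors(X, names)
```

In this toy matrix, descriptor "b" is constant and "c" is exactly twice "a", so only "a" and "d" survive. Which member of a perfectly correlated pair is kept depends on column order, a design choice the source does not specify.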
Statistical analysis
Multivariate regression analysis (MRA), one of the oldest
data reduction methodologies, continues to be widely used
in QSAR [25] as it does not impose any restriction on the
type and number of graphical invariants used in structure-
property activity studies. For a valid statistical significance
of the MRA, it is necessary to restrict the maximal number of
descriptors, which will depend on the number of compounds
investigated [26, 27]. In order to avoid ambiguities in the
interpretation of regression, only few parameters, or ideally
a single parameter may be used. But the structure activity
relationship of chemical compounds requires a huge number
of physicochemical and molecular descriptors. Theoretical
molecular descriptors, such as constitutional,
geometric, electrostatic and topological descriptors, have found
wide application in quantitative structure activity relationship
modeling [28, 29]. To establish such a relationship between
activity and structural descriptors of the quinolone
compounds under consideration, it is essential to develop a
regression or an input-output model. Multiple linear regression
and partial least squares are common for the development of
linear QSAR models, while methods such as artificial neural
networks are used in the case of non-linear modeling. Topological indices in particular are indispensable in the development of
successful multiple regression analyses leading to QSARs
for rational drug design. The present study of the QSAR of
quinolone antibacterials involves a very large number of
topological as well as physicochemical descriptors.
Conventional regression, i.e. ordinary least squares (OLS),
does not produce reliable models when the number of descriptors
exceeds the number of observations [30, 31]. In
this situation, the alternative statistical methods
that may be considered are ridge regression (RR) [32],
principal component regression (PCR) [33] and partial least
squares (PLS) [34–36]. All three of these linear statistical methods are very useful and have wide applicability when
Table 3. List of molecular descriptors used in this study
Descriptor classes Descriptor names
Constitutional Descriptors No. amino groups primary, No. amino groups secondary, No. amino groups tertiary,
No. ester groups, No. halogen atoms, Molecular weight, No. Total atoms, No.
Rotatable bonds, Fraction of Rotatable bonds, No. Rigid bonds, No. Rings, No.
Aromatic rings, No. single bonds, No. aromatic bonds, No. H-bond acceptors, Ratio
donors to acceptor.
Geometrical Descriptors 2D-VDW surface, 2D-VDW volume, 2D-VSA hydrophobic, Fraction of 2D-VSA hydrophobic, 2D-VSA
hydrophobic sat, 2D-VSA hydrophobic unsat, 2D-VSA other,
2D-VSA polar, Fraction of 2D-VSA polar, 2D-VSA Hbond acceptor, 2D-VSA Hbond
donor, 2D-VSA Hbond all, Fraction of 2D-VSA Hbond, Fraction of 2D-VSA
chargable groups, Topological PSA.
Electrostatic Descriptors Max negative charge, Max positive hydrogen charge, Total negative charge, Total
positive charge, Total absolute atomic charge, Charge polarization, Local dipole index,
Polarity parameter, Relative positive charge, Relative negative charge, PPSA1(Partial
Positive Surface Area 1st type), PPSA2, PPSA3, PNSA1(Partial Negative Surface
Area 1st type), PNSA3, DPSA1 (Difference in Charged Partial Surface Area), DPSA2, DPSA3, FPSA1 (Fractional charged partial positive surface area 1st type), FPSA2,
FPSA3, FNSA1(Fractional charged partial negative surface area 1st type), FNSA3,
WPSA1 (Surface weighted charged partial positive surface area 1st type), WPSA2,
WPSA3, WNSA1 (Surface weighted charged partial negative surface area 1st type),
WNSA3, RPCS (Relative positive charge surface area), RNCS (Relative negative
charge surface area), Hydrophobic SA – MPEOE, Positive charged polar SA –
MPEOE, Negative charged polar SA – MPEOE, SADH1 (Surface area on donor
hydrogens 1st type), SADH2 (Surface area on donor hydrogens 2nd type), SADH3
(Surface area on donor hydrogens 3rd type), CHDH1 (Charge on donatable hydrogens
1st type), CHDH2, CHDH3, SCDH1 (Surface weighted charged area on donor
hydrogens 1st type), SCDH2, SCDH3, SAAA1 (Surface weighted charged area
on acceptor atoms 1st type), SAAA2, SAAA3, CHAA1 (Charge on acceptors atoms 1st
type), CHAA2, CHAA3, SCAA1 (Surface weighted charged area on acceptor atoms
1st type), SCAA2, SCAA3, HRNCS, HRNCG.
Topological Descriptors Total structure connectivity index, Chi 0 (Simple zero order chi index), Chi 1, Chi 2,
Chi 3 path (Simple third order path chi index), Chi 3 cluster (Simple 3rd order cluster
chi index), Chi 4 path, Chi 5 path, Chi 4 path/cluster (Simple 4th order path/cluster chi
index), VChi 0 (Valance zero order chi index), VChi 1, VChi 2, VChi 3 path (Valance
3rd order path chi index), VChi 4 path, VChi 3 cluster, VChi 4 path/cluster, VChi 5
path, Kier shape 1 (encodes the degree of cyclicity in the graph, decreases as graph
cyclicity increases), Kier shape 2 (encodes the degree of central branching in the
graph,decreases as the degree of central branching increases.), Kier shape 3 (encodes
the degree of separated branching in the graph,increases as the degree of separation in
branching increases.), Kier alpha 1 (1st Order Kappa Alpha Shape Index), Kier alpha
2, Kier alpha 3, Kier flexibility, Kier symmetry index, Kier steric descriptor, Delta Chi
0 (Delta zero order chi index), Delta Chi 1, Delta Chi 2, Delta Chi 3 path, Delta Chi 3
cluster, Delta Chi 4 path, Delta Chi 4 cluster, Chi 4 path/cluster, Delta Chi 5 path,
Difference chi 0 (Difference simple zero order chi index), Difference chi 1, Difference chi 2,
Difference chi 3, Difference chi 4, Difference chi 5, IC (information content
index), BIC (bond information content), CIC (complementary information content), SIC
(structural information content), IAC total (total information index of atomic
composition), I adj equ (Information index based on the vertex adjacency matrix
equality), I adj mag (Information index based on the vertex adjacency matrix
magnitude), I adj deg equ (Information index based on the degree adjacency matrix
equality), I adj deg mag, I dist equ (Information index based on the distance matrix
equality), I dist mag (Information index based on the distance matrix magnitude),
I edge adj equ (Information index based on the edge adjacency matrix equality),
I edge adj mag (Information index based on the edge adjacency matrix magnitude),
I edge adj deg equ, I edge adj deg mag, I edge dist equ, I edge dist mag,
Wiener index (Half-sum of the off-diagonal elements of the distance matrix of a
graph), Hyper Wiener index, Harary index (Half-sum of the off-diagonal elements of
the reciprocal molecular distance matrix), 1st Zagreb (1st Zagreb index), 2nd Zagreb,
Quadratic index, Rouvray index, 2-MTI (Schultz Molecular Topological Index (MTI)),
2-MTI prime (Schultz MTI by valence vertex degrees), Gutman MTI, Graph diameter,
Graph radius, Graph Petitjean, Eccentric connectivity index, Eccentric adjacency
index, Platt number, Odd-even index, Vertex degree-distance index, Ring degree-
distance index, Balaban index JX, Balaban index JY, Xu (Xu index), Superpendentic
index, Unipolarity distance matrix, Centralization distance matrix,
Dispersion distance matrix, SC-0 (Subgraph Count Index of order 0), SC-1, SC-2,
SC-3 path, SC-3 cluster, SC-4 path, SC-4 cluster, SC-4 path/cluster, SC-5 path, SC-6
path, SC-7 path, SC-8 path, SC-9 path, SC-10 path, Solvation chi 0 (Solvation zero order chi index), Solvation chi 1, Solvation chi 2, Solvation chi 3 path, Solvation chi 3
cluster, Solvation chi 4 path, Solvation chi 4 cluster, Solvation chi 4 path/cluster,
Solvation chi 5 path, VS-0 (Valence Shell Count of order 0), VS-1, VS-2, VS-3, VS-4,
VS-5, Molecular walk count 2, Molecular walk count 3, Molecular walk count 4,
Molecular walk count 5, Path/walk 2, Path/walk 3, Path/walk 4, Path/walk 5, Narumi
ATI (Narumi simple topological index (log)), Narumi HTI (Narumi harmonic
topological index), Narumi GTI(Narumi geometric topological index), Pogliani index,
Ramification index, Degree complexity, Graph vertex complexity, Graph distance
complexity, Graph distance index, Mean square distance index, Mean distance
deviation, Edge Wiener index, Edge Hyper Wiener index, Edge MTI, Edge Gutman
MTI, Edge connectivity index, E-state SsCH3, E-state SssCH2, E-state SdsCH, E-state
SsssCH, E-state SaasC, E-state SssssC, E-state SsssNH, E-state SdO, E-state
S hydrophobic, E-state S hydrophobic unsat, E-state S polar, E-state
S hbond donor, E-state S negative charged group, E-state SHssNH2, E-state
SHdsCH, E-state SHCHnX, E-state SH hydrophobic, E-state SH polar, E-state
SaaCH, E-state SdssC, E-state SssNH2, E-state SsssN, E-state SsOHl, E-state SsF, E-
state S hydrophobic sat, E-state S none, E-state S hbond acceptor, E-state
S positive charged group, E-state SHsssNH, E-state SHaaCH.
Physicochemical Descriptors Polarizability Miller, SKlogP value, Water solubility, Vapor pressure, Buffer solubility,
SK MP, AMR value (Calculated molecular refractivity index), Polarizability MPEOE,
SKlogS value, SKlogPvp, SKlogS buffer, SK BP, AlogP98 value, AlogP98 002C,
AlogP98 006C, AlogP98 008C, AlogP98 024C, AlogP98 026C, AlogP98 038C, AlogP98
040C, AlogP98 047H, AlogP98 051H, AlogP98 053H, AlogP98 057O, AlogP98 067N, AlogP98 071N,
AlogP98 073N, AlogP98 075N, AlogP98 094Br, AlogP98 084F, AlogP98
001C, AlogP98 003C, AlogP98 005C, AlogP98 011C, AlogP98 029C, AlogP98 046H, AlogP98 050H, AlogP98 052H, AlogP98 060O, AlogP98 066N, AlogP98 068N.
the number of independent variables greatly exceeds the number
of observations and when the independent variables are
highly inter-correlated. Each of these methods makes use of
the entire available pool of independent variables as opposed
to selecting a subset, which introduces bias and may result
in the elimination of important parameters from our studies.
From the works of Miller [31] and Friedman [38], it is also
known that data subsetting is less effective than methods
that retain all of the independent variables and use other
approaches to deal with the rank deficiency. Of the three
statistical methods, RR, PCR and PLS, RR has been found
to be the best, and it has been used
extensively in multiple comparative studies [18, 38–40]. For
this reason, the models based on the large set of constitutional
and geometric, electrostatic and topological descriptors were
developed using the RR methodology. RR, like PCR, transforms
the descriptors to their principal components (PCs) and
uses the PCs as descriptors. However, unlike PCR, RR retains
all of the PCs and “shrinks” them differentially according to
their eigenvalues. The RR vector of regression coefficients,
b, is given by
$$b = (X^{T}X + kI)^{-1}X^{T}Y$$
where X is the matrix of descriptors, Y is the vector of observed activities, I is an identity matrix, and k is a
non-negative constant known as the “ridge” constant. If k = 0,
RR reduces to conventional OLS regression. The ridge
regression (RR) method has therefore been applied to our dataset of
quinolone compounds, models have been developed
for the various sets of molecular descriptors, and a
comparison among the RR models for the above
descriptor classes is made and discussed in the next
section.
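The ridge formula can be checked numerically with a small NumPy sketch. This is an illustration under stated assumptions rather than the NCSS procedure used in the paper: the function names are hypothetical, the data are random numbers, and the centring and scaling of descriptors that ridge software normally applies is omitted. The second function uses the singular value decomposition of X to show the principal-component view, in which the component with singular value s is weighted by s / (s² + k).

```python
import numpy as np


def ridge_coefficients(X, y, k):
    """Return b = (X^T X + k I)^(-1) X^T y for a ridge constant k >= 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_descriptors = X.shape[1]
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + k * np.eye(n_descriptors), X.T @ y)


def ridge_via_principal_components(X, y, k):
    """Same coefficients via the SVD: every component is kept but shrunk."""
    U, s, Vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    shrunk = (s / (s**2 + k)) * (U.T @ np.asarray(y, dtype=float))
    return Vt.T @ shrunk


rng = np.random.default_rng(0)

# More observations than descriptors: k = 0 reproduces ordinary least squares.
X_tall = rng.normal(size=(20, 5))
y_tall = rng.normal(size=20)
b_ols = np.linalg.lstsq(X_tall, y_tall, rcond=None)[0]
b_k0 = ridge_coefficients(X_tall, y_tall, k=0.0)

# More descriptors than observations: X^T X is singular, but k > 0 makes it solvable.
X_wide = rng.normal(size=(5, 8))
y_wide = rng.normal(size=5)
b_wide = ridge_coefficients(X_wide, y_wide, k=1.0)
b_wide_pc = ridge_via_principal_components(X_wide, y_wide, k=1.0)
```

The wide case mirrors the situation discussed above, where the number of descriptors exceeds the number of compounds: OLS has no unique solution there, while any positive k gives one.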
Results and discussion
QSAR studies have been performed using the theoretical
molecular descriptors calculated with the PreADMET
Molecular Descriptor Calculation package and experimentally
derived biological activity data of the quinolone
derivatives for both M. fortuitum and M. smegmatis; the
ridge regression analysis is given in Table 4. We have
considered all 69 quinolone compounds, i.e. the N-1 and C-7 as
well as 8 substituted derivatives of quinolone antibacterials,
and different subsets in the statistical analysis.
Further subsetting of the biological activity data has
been carried out using molecular similarity analysis,
for an effective comparison of the results obtained from the
RR analysis. The similarity analysis allows
the data to be sub-grouped into two further categories,
viz. (i) compounds having 50% or more intermolecular similarity
with Sparfloxacin, the fluoroquinolone drug used as a
tuberculostatic agent, and (ii) compounds having 60%
or more similarity with the drug. Ridge regression models
were thus developed for four cases: the
complete set of 69 quinolone compounds; the set of 51 N-1 and C-7
substituted quinolone derivatives; and two other groups of
data, consisting of 48 and 22 compounds, arising out of the
above similarity analysis. To calculate the molecular
similarity between any two compounds, we have developed a computer program in PERL script. The main utility of
this program is that it can generate the whole set of atom
pairs of each compound and then calculate the structural similarity
from input files containing minimal
information, i.e. the positions of atoms and bonds of the respective
compounds.
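The similarity-based subsets can be reproduced directly from the percentages listed in Table 2. The short Python sketch below is illustrative only (the variable names are assumptions); the 69 values are transcribed from Table 2 in compound order. No compound has a similarity of exactly 50% or 60%, so "more than" and "or more" give the same subsets.

```python
# Similarity with Sparfloxacin (%) for compounds 1..69, transcribed from Table 2.
similarity_values = [
    61.16, 55.75, 69.91, 79.09, 64.58, 54.05, 57.06, 55.03, 49.21, 60.86,
    61.45, 58.16, 59.52, 60.36, 57.18, 46.43, 43.30, 52.89, 57.65, 49.11,
    42.23, 44.62, 43.27, 38.19, 46.17, 47.21, 44.39, 49.13, 56.57, 63.13,
    44.25, 48.18, 50.15, 45.13, 51.78, 57.06, 46.36, 47.79, 50.20, 55.66,
    61.94, 67.99, 50.07, 55.05, 57.18, 59.29, 67.99, 77.09, 52.40, 59.29,
    59.53, 64.01, 73.40, 82.58, 56.79, 64.01, 64.75, 63.16, 71.88, 79.89,
    56.08, 63.44, 63.49, 47.79, 54.05, 59.81, 43.72, 46.90, 49.66,
]

# Map compound number (1-based) to its similarity percentage.
similarity_by_compound = dict(enumerate(similarity_values, start=1))


def subset_at_least(threshold):
    """Compound numbers whose similarity with Sparfloxacin is >= threshold (%)."""
    return [c for c, s in similarity_by_compound.items() if s >= threshold]


subset_50 = subset_at_least(50.0)
subset_60 = subset_at_least(60.0)
```

With these values the two thresholds select 48 and 22 compounds, matching the subset sizes stated in the text.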
Thus it is evident that the structural similarity oriented
sub-grouping of the entire data set has effectively arranged the
quinolone compounds activity-wise, since
structurally similar compounds may possess similar activity.
From the sub-grouping in Table 4, we can study the
pattern of influence of any descriptor class on activity in the proposed QSAR model of quinolone compounds, which helps us
to arrive at some conclusions. To study the pattern of influence
of the descriptor classes, it is necessary to compare the R2
values in our ridge regression models. All RR analyses
were done using the NCSS software package [41].
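For reference, the R2 statistic used to compare the models is the fraction of the variance in the observed activities that the model reproduces. The Python sketch below shows the usual definition; the function name and the four observed and predicted values are hypothetical and are not taken from Table 4, and the exact variant reported by NCSS is not stated in the text.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical activities and model predictions for four compounds.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
```

Here the residual sum of squares is 0.10 and the total sum of squares is 5.0, so R2 = 0.98. A model that always predicts the mean activity has R2 = 0.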
Table 4 provides the regression summary for the QSAR
of the quinolone derivatives against Mycobacterium fortuitum
and Mycobacterium smegmatis. For the complete set
of 69 compounds, the RR model with the topological
descriptors alone has R2 values of 0.8357 and 0.8200 for M. fortuitum
and M. smegmatis, respectively, and the addition of
the other theoretical descriptors, i.e. the constitutional and geometrical
and the electrostatic indices, contributes significantly
towards a better R2 value, thus improving the model quality.
From the Table, it is seen that these descriptor classes,
when considered alone, result in inferior models. For
the same set of compounds, the RR model based on physicochemical
descriptors appears to be very poor compared to
the model derived from the topological descriptors. When we consider the group of the first 51 compounds in Table 1, excluding the
derivatives with substitution at the 8 position, we see that
the RR models based on topological descriptors alone
fit the data very well. The fit is clearly much better than
that of the physicochemical model. Even the electrostatic
descriptors describe the data better, with R2 values of
0.7380 and 0.6850 for M. fortuitum and M. smegmatis,
respectively, than the physicochemical descriptors, with
R2 values of 0.6947 and 0.6408 against the respective
mycobacteria. If we take all the calculated molecular
descriptors, i.e. the topological, electrostatic, and constitutional
and geometrical indices, into account, we get an excellent fit,
with R2 values of 0.9021 and 0.8830 for M. fortuitum
and M. smegmatis, respectively. In the third case, where
48 quinolone compounds were considered on the basis of
50% or more similarity, this dataset gives
overall improved values of R2 compared with
the previous datasets. Here also the topological descriptors
alone describe the data much better than the physicochemical
property based model, and the combination of all the
calculated descriptors, i.e. the constitutional and geometrical,
electrostatic and topological indices, contributes to a more
significant model. This trend also continues in
the last subset of 22 quinolone compounds possessing 60%
or more structural similarity with sparfloxacin. The pattern of influence of these structural descriptors seems to be the same
as in the previous cases when compared to the physicochem-
ical descriptors. So it is evident from the QSAR reported in
Table 3 that the calculated molecular descriptors could pro-
vide a better quality predictive model for N-1, C-7 and 8 sub-
stituted quinolone derivatives. The physicochemical property
based QSPR studies resulted in much inferior models. The
QSAR models based on molecular descriptors that are calcu-
lated solely from the chemical structure can be used as more
reliable models for predicting the potential of any quinolone
derivatives. It is hoped that the model development in this di-
rection will throw new light on the anti-tuberculostatic drugdesign.
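The comparison of R2 values across descriptor classes can be illustrated with a small ridge regression sketch. The analysis in this paper was carried out with NCSS; the pure-Python code below is only an illustration under stated assumptions. The helper names `solve` and `ridge_r2`, the ridge constant k = 0.1 and the data are hypothetical: the "descriptors" and "activity" are synthetic random numbers, not values from the quinolone data set.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_r2(X, y, k):
    """Fit ridge regression on standardized descriptors; return R2 of the fit."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5 or 1.0
        cols.append([(v - mu) / sd for v in col])
    # Normal equations with the ridge penalty: (Z'Z + k I) b = Z'y
    A = [[sum(ci[t] * cj[t] for t in range(n)) + (k if i == j else 0.0)
          for j, cj in enumerate(cols)] for i, ci in enumerate(cols)]
    rhs = [sum(c[t] * yc[t] for t in range(n)) for c in cols]
    b = solve(A, rhs)
    fitted = [sum(b[j] * cols[j][t] for j in range(p)) for t in range(n)]
    ss_res = sum((yc[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum(v ** 2 for v in yc)
    return 1.0 - ss_res / ss_tot


# Synthetic example: block A drives the activity, block B is pure noise.
rng = random.Random(0)
n = 30
block_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
block_b = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
activity = [2.0 * a[0] - 1.0 * a[1] + rng.gauss(0, 0.1) for a in block_a]
both = [a + b for a, b in zip(block_a, block_b)]

k = 0.1  # arbitrary ridge constant chosen for the illustration
r2_block_a = ridge_r2(block_a, activity, k)
r2_block_b = ridge_r2(block_b, activity, k)
r2_all = ridge_r2(both, activity, k)
print(f"block A: {r2_block_a:.4f}  block B: {r2_block_b:.4f}  all: {r2_all:.4f}")
```

Fitting each descriptor class separately and then all classes together, as in this sketch, gives one R2 per row of a summary such as Table 4. Note that these are fitted R2 values, so they say nothing about cross-validated predictive ability.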
Table 4. Regression summary for QSARs of quinolone compounds

                                                          R2
Molecular descriptors                          M. fortuitum   M. smegmatis

N = 69 (whole set of 69 compounds)
  Constitutional and geometrical descriptors      0.5991         0.6265
  Electrostatic descriptors                       0.6785         0.6172
  Topological descriptors                         0.8357         0.8200
  With all the above three sets of descriptors    0.8928         0.8932
  Physicochemical descriptors                     0.6932         0.6424

N = 51 (considering N1 and C7 substitution)
  Constitutional and geometrical descriptors      0.5496         0.5954
  Electrostatic descriptors                       0.7380         0.6850
  Topological descriptors                         0.8196         0.8226
  With all the above three sets of descriptors    0.9021         0.8830
  Physicochemical descriptors                     0.6947         0.6408

N = 48 (50% and above similarity)
Compounds: 1–8, 10–15, 18–19, 29–30, 33, 35–36, 39–63, 65–66
  Constitutional and geometrical descriptors      0.6574         0.7689
  Electrostatic descriptors                       0.7693         0.7300
  Topological descriptors                         0.9226         0.9333
  With all the above three sets of descriptors    0.9369         0.9535
  Physicochemical descriptors                     0.7360         0.7350

N = 22 (60% and above similarity)
Compounds: 1, 3–5, 10–11, 14, 30, 41–42, 47–48, 52–54, 56–60, 62–63
  Constitutional and geometrical descriptors      0.7345         0.8521
  Electrostatic descriptors                       0.9585         0.9415
  Topological descriptors                         0.9914         0.9952
  With all the above three sets of descriptors    0.9931         0.9962
  Physicochemical descriptors                     0.7679         0.8890

Ref. to Table 2
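As a consistency check on the compound lists in Table 4, the short Python sketch below expands the two lists and counts the compounds. The helper name `expand` is hypothetical; the compound numbers are copied from the table.

```python
def expand(spec):
    """Expand a list such as '1–8, 10–15, 33' into sorted compound numbers."""
    numbers = set()
    for part in spec.replace("–", "-").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            numbers.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            numbers.add(int(part))
    return sorted(numbers)


subset_50 = expand("1–8, 10–15, 18–19, 29–30, 33, 35–36, 39–63, 65–66")
subset_60 = expand("1, 3–5, 10–11, 14, 30, 41–42, 47–48, 52–54, 56–60, 62–63")

print(len(subset_50))                    # 48
print(len(subset_60))                    # 22
print(set(subset_60) <= set(subset_50))  # True
```

The counts agree with N = 48 and N = 22, and every compound with 60% or more similarity also appears in the 50% group, as expected for nested similarity thresholds.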
Acknowledgement
Payel Ghosh thanks the Council of Scientific and Industrial
Research, New Delhi 110001, India for the grant of a Junior
Research Fellowship to her. The authors sincerely acknowledge the
valuable comments of the anonymous reviewers that helped to
improve the quality of the final manuscript.
References
1. Lubasch, A., Erbes, R., Munch, H. and Lode, H., Sparfloxacin in
treatment of drug resistant tuberculosis and intolerance of first line
therapy, Eur. Respir. J., 17 (2001) 641–646.
2. Albino, J.A. and Reichman, L.B., The treatment of tuberculosis, Res-
piration, 65 (1998) 237–255.
3. O’Brien, R.J. and Vernon, M., New tuberculosis drug development,
Am J Respir Crit Care Med., 157 (1998) 1705–1707.
4. Nakamura, S., Minami, A., Nakata, K., Kurobe, N., Kouno, K.,
Sakaguchi, Y., Kashimoto, S., Yoshida, H., Kojima, T., Ohue, T.,
Fujimoto, K., Nakamura, M., Hashimoto, M. and Shimizu, M., In vitro
and in vivo antibacterial activities of AT-4140, a new broad-spectrum
quinolone, Antimicrob. Agents Chemother., 33 (1989) 1167–1173.
5. Rastogi, N., Labrousse, V., Goh, K.S. and Sousa, J.P.,
Antimycobacterial spectrum of sparfloxacin and its activities alone
and in association with other drugs against Mycobacterium avium
complex growing extracellularly and intracellularly in murine and
human macrophages, Antimicrob. Agents Chemother., 35 (1991) 2473–2480.
6. Reanau, T.E., Sanchiez, J.P., Gage, J.W., Dever, J.A., Shapiro, M.A.,
Gracheck, S.J. and Domagala, J.M., Structure-activity relationships of
the quinolone antibacterials against mycobacteria: Effect of structural
changes at N-1 and C-7, J. Med. Chem., 39 (1996) 729–735.
7. Reanau, T.E., Gage, J.W., Dever, J.A., Roland, G.E., Joannides, E.T.,
Shapiro, M.A., Sanchiez, J.P., Gracheck, S.J., Domagala, J.M.,
Jacobs, M.R. and Reynolds, R.C., Structure-activity relationships of
quinolone agents against mycobacteria: Effect of structural
modification at the 8 position, Antimicrob. Agents Chemother., 40
(1996) 2363–2368.
8. Hansch, C., On the structure of medicinal chemistry, J. Med. Chem.,
19 (1976) 1–6.
9. Hansch, C. and Leo, A., QSAR: Fundamentals and Applications in
Chemistry and Biology, American Chemical Society, Washington, DC,
1995.
10. Molina, E., Diaz, H.G., Gonzalez, M.P., Rodriguez, E. and Uriarte, E.,
Designing antibacterial compounds through a topological substruc-
tural approach, J. Chem. Inf. Comput. Sci., 44 (2004) 515–521.
11. Bagchi, M.C., Maiti, B.C., Mills, D. and Basak, S.C., Usefulness of
graphical invariants in quantitative structure–activity correlations of
tuberculostatic drugs of the isonicotinic acid hydrazide type, J Mol
Model, 10 (2004) 102–111.
12. Bagchi, M.C. and Maiti, B.C., On application of atom pairs on drug
design, J Mol Struct: THEOCHEM, 623 (2003) 31–37.
13. Bagchi, M.C., Maiti, B.C. and Bose, S., QSAR of antituberculosis drugs
of INH type using graphical invariants, J Mol Struct: THEOCHEM,
679 (2004) 179–186.
14. Roy, K., Topological descriptors in drug design and modeling studies,
Mol. Div., 8 (2004) 321–323.
15. Balaban, A.T., Basak, S.C., Beteringhe, A., Mills, D. and Supuran,
A.T., QSAR study using topological indices for inhibition of carbonic
anhydrase II by sulfanilamides and Schiff bases, Mol. Div., 8 (2004)
401–412.
16. Besalu, E., Ponec, R. and Julian-Ortiz, J.V., Virtual generation of
agents against Mycobacterium tuberculosis. A QSAR study, Mol. Div.,
6 (2003) 107–120.
17. Votano, J.R., Parham, M., Hall, L.H. and Kier, L.B., New predictors
for several ADME/Tox properties: Aqueous solubility, human oral ab-
sorption, and Ames genotoxicity using topological descriptors. Mol.
Div., 8 (2004) 379–391.
18. Gonzalez-Diaz, H., Torres-Gomez, L.A., Guevara, Y., Almeida, M.S.,
Molina, R., Castanedo, N., Santana, L. and Uriarte, E., Markovian
chemicals “in silico” design (MARCH-INSIDE), a promising approach
for computer-aided molecular design III: 2.5D indices for the discovery
of antibacterials, J. Mol. Model (Online), 11 (2005) 116–123.
19. Basak, S.C., Gute, B.D. and Mills, D., Quantitative molecular
similarity analysis (QMSA) methods for property estimation: A
comparison of property-based, arbitrary, and tailored similarity
spaces, SAR QSAR Environ Res., 13 (2002) 727–742.
20. Basak, S.C., Mills, D., Hawkins, D.M. and El-Masri, H.A., Prediction
of tissue: Air partition coefficient: A comparison of structure-based
and property-based methods, SAR QSAR Environ Res., 13 (2002)
649–665.
21. Carhart, R.E., Smith, D.H. and Venkataraghavan, R., Atom pairs as
molecular features in structure activity studies: Definition and appli-
cations, J. Chem. Inf. Comput. Sci., 32 (1985) 664–674.
22. Nino V., M., Daza C., E.E. and Tello, M., A criteria to classify
biological activity of benzimidazoles from a model of structural
similarity, J. Chem. Inf. Comput. Sci., 41 (2001) 495–504.
23. Tisdall, J. (1st Ed.), Beginning Perl for Bioinformatics, O’Reilly,
(2001).
24. http://preadmet.brdrc.org/.
25. Katritzky, A.R., Petrukhin, R., Tatham, D., Basak, S., Benfenati, E.,
Karelson, M. and Maran, U., Interpretation of quantitative structure–
property and – activity relationships, J. Chem. Inf. Comput. Sci., 41
(2001) 679–685.
26. Rao, C.R., Linear statistical inference and its applications (2nd ed.),
Wiley, New York (1973).
27. Randic, M., Novel shape descriptors for molecular graphs, J. Chem.
Inf. Comput. Sci., 41 (2001) 607–613.
28. Basak, S.C., Grunwald, G.D. and Niemi, G.J., In: A.T. Balaban (Ed.),
From Chemical Topology To Three-Dimensional Geometry, Plenum
Press, New York (1997), pp. 73–116.
29. Basak, S.C., Use of molecular complexity indices in predictive
pharmacology and toxicology: A QSAR approach, Med. Sci. Res., 15 (1987)
605–609.
30. Estrada, E., In: J. Devillers and A.T. Balaban (Eds.), Topo-
logical Indices And Related Descriptors In QSAR And QSPR,
Gordon and Breach, Amsterdam, The Netherlands (1999), pp. 403–
453.
31. Miller, A.J., Subset selection in regression, Chapman and Hall, (1990)
New York, NY.
32. Rencher, A. C. and Pun, F.C., Inflation of R2 in best subset regression,
Technometrics, 22 (1980) 49–53.
33. Hoerl, A.E. and Kennard, R.W., Ridge regression: Biased estimation for
nonorthogonal problems, Technometrics, 8 (1970) 27–51.
34. Massy, W.F., Principal components regression in exploratory statistical
research, J. Amer. Statist. Assoc., 60 (1965) 234–246.
35. Wold H., Soft modeling by latent variables: The nonlinear iterative
partial least squares approach. In: Gani J (Ed.) Perspectives in Proba-
bility and Statistics, papers in honor of Bartlett MS. Academic Press,
London, (1975).
36. Hoskuldsson, A., PLS regression methods, Journal of Chemometrics,
2 (1988) 211–228.
37. Hoskuldsson, A., A combined theory for PCA and PLS, Journal of
Chemometrics, 9 (1995) 91–123.
38. Frank, I.E. and Friedman, J.H., A statistical view of some chemometrics
regression tools, Technometrics, 35 (1993) 109–135.
39. Basak, S.C., Mills, D., Hawkins, D.M. and El-Masri, H., Prediction
of human blood: Air partition coefficient: A comparison of
structure-based and property-based methods, Risk Analysis, 23 (2003)
1173–1184.
40. Basak, S.C., Mills, D., Mumtaz, M.M. and Balasubramanian, K., Use
of topological indices in predicting aryl hydrocarbon receptor bind-
ing potency of dibenzofurans: A hierarchical QSAR approach, Ind. J.
Chem., 42A (2003) 1385–1391.
41. NCSS – Statistical and Power Analysis Software; Hintze, J. (2004),
NCSS and PASS. Number Cruncher Statistical Systems, Kaysville,
Utah, http://www.ncss.com/.