Quantitative Structure-Activity Relationships (QSAR) Study of Flavonoid Derivatives for Inhibition of Cytochrome P450 1A2
Taesung Moon1, Myung Hwan Chi1, Dong-Hyun Kim1, Chang No Yoon1* and Young-Sang Choi2
1 Bioanalysis and Biotransformation Research Center, Korea Institute of Science and Technology, P.O. Box 131 Cheongryang, Seoul 130-650, Korea
2 Department of Chemistry, Korea University, Seoul 136-761, Korea
Abstract
Quantitative structure-activity relationship (QSAR) studies of flavonoid derivatives as cytochrome P450 1A2 inhibitors were performed using multiple linear regression analysis (MLR) and neural networks (NN). The MLR and NN results show that the Hammett constant, the highest occupied molecular orbital (HOMO) energy, the nonoverlap steric volume, the partial charge of the C3 carbon atom, and the HOMO π coefficients of the C3, C3' and C4' carbon atoms of flavonoids play an important role in inhibitory activity. The correlations between the descriptors and the activities were improved by the neural networks even though the networks used the descriptors of the optimum MLR model, which implies that the descriptors used in the MLR model include nonlinear relationships. Moreover, neural networks using descriptors selected by the pruning method gave a higher r² value than neural networks using the MLR descriptors.
1 Introduction
Quantitative structure-activity relationships (QSAR) were introduced by Hansch and co-workers [1, 2]. Their methods have since been applied in many studies to find relationships between the biological activities of chemical compounds and their physicochemical properties. These relationships are determined using multiple linear regression, which minimizes the variance between the data and the model. However, third- and higher-order terms, as well as cross-product terms corresponding to interactions between physicochemical properties, are not used in practice. The equation obtained by multiple linear regression is simple, but the lack of higher-order terms restricts it to linear relationships.
Neural networks have attracted attention in the field of pattern recognition. Their most important feature is the interconnection of many nodes, called neurons, which enables parallel and distributed processing. The interconnections convey information learned from the environment and provide content-addressable storage. To store a particular pattern, the connection strengths, called weights, must be modified to memorize the distinguishing features of the pattern so that it can be recalled later. The nonlinear nature of neural networks suggests their potential usefulness in QSAR studies. Recently, neural networks have been applied to find relationships between molecular physicochemical parameters and biological activities [3–6].
Flavonoids are widely distributed in nature and are found in all parts of edible plants [7]. Based on their chemical structures, they can be broadly classified as flavones, flavanonols, flavonols, flavanones, and flavans (Figure 1). Many flavonoids are active on different enzymatic systems. They have been shown to possess antiinflammatory, antiallergic, antiviral, antimutagenic, and anticarcinogenic activities [8–11]. Furthermore, some of these compounds were found to have estrogenic or antiestrogenic activities as well as cytochrome P450 1A2 inhibitory activity. Caffeine N3-demethylase activity is inhibited by the presence of various flavonoids.

The QSAR study reported here was carried out to obtain further insight into the relationships between the structure and biological activity of several flavonoid inhibitors of human cytochrome P450 1A2. Both neural networks and multiple linear regression analysis were used to find these relationships. Of the several neural network methods available, the back propagation algorithm was used in this study.
* To receive all correspondence
Key words: QSAR, multiple linear regression analysis (MLR), neural networks (NN), flavonoids, cytochrome P450 1A2
Quant. Struct.-Act. Relat., 19 (2000) © WILEY-VCH Verlag GmbH, D-69469 Weinheim 0931-8771/00/0306-0257 $17.50+.50/0
2 Methods
2.1 Biological Activity Measurements
The inhibition of cytochrome P450 1A2 activity by flavonoids was measured with a caffeine 3-demethylation assay using human liver microsomes [12]. The inhibitory activities of the flavonoids were determined as IC50 values (the concentration required to reduce the caffeine 3-demethylation activity by 50%). The flavonoids used in this study and their structural features are shown in Figure 1 and Table 1.
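The activities are reported later on a logarithmic scale (Table 4). As a minimal sketch, assuming those -log(IC50) values are base-10 logarithms of the molar IC50 values in Table 1, the conversion for three compounds could look like this. The function name `p_ic50` is illustrative and does not come from the paper.

```python
import math

# IC50 values in mol/L, copied from Table 1 for three compounds.
ic50_molar = {
    "Chrysin": 2.0e-7,
    "Apigenin": 1.35e-6,
    "Galangin": 3.06e-6,
}

def p_ic50(ic50: float) -> float:
    """Return -log10(IC50); the base-10 logarithm is an assumption."""
    return -math.log10(ic50)

for name, value in ic50_molar.items():
    # Two decimals, matching the precision used in Table 4.
    print(f"{name}: {p_ic50(value):.2f}")
```

For these three compounds the rounded results agree with the experimental column of Table 4 (6.70, 5.87, and 5.51).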
2.2 Molecular Modeling
Molecular structures of all the flavonoid derivatives were constructed using the InsightII molecular modeling package (MSI Inc.). All rotatable bonds were searched from 0° to 360° in 6° increments in order to obtain low-energy structures. For neohesperidin and panasenoside, the rotatable bonds were searched in 30° increments because these molecules have many rotatable bonds. The lowest-energy conformer of each molecule was minimized using the conjugate gradient and va09a minimizers until the maximum energy derivative was less than 0.001 kcal/(mol Å). The minimized structures were then fully geometry-optimized using the AM1 model Hamiltonian in MOPAC in order to compute the values of the conformation-dependent molecular descriptors. The following set of descriptors was used in the multiple linear regression analysis and neural networks: (1) Hammett constant of the B ring, (2) molecular volume, (3) nonoverlap steric volume between each analogue and the reference molecule (flavone), (4) Connolly surface area, (5) ratio of molecular volume to surface area (Vol/Area), (6) largest principal moment of inertia, (7) total dipole moment, (8) highest occupied molecular orbital energy, HOMO, (9) lowest unoccupied molecular orbital energy, LUMO, (10) partial charges of the C3, C5, C7, C3', C4', and C5' carbon atoms, (11) HOMO π coefficients of the C3, C5, C7, C3', C4', and C5' carbon atoms, and (12) the difference between HOMO and LUMO.
Table 1. The structures and IC50 values for flavonoid derivatives on caffeine N3-demethylation activity by human hepatic microsomes [12].

Class      Flavonoid                   Structure                              IC50 (M)
Flavone    Chrysin                     5,7-Dihydroxyflavone                   2.0 × 10^-7
           Apigenin                    4',5,7-Trihydroxyflavone               1.35 × 10^-6
           Luteolin                    3',4',5,7-Tetrahydroxyflavone          1.34 × 10^-5
Flavonol   Galangin                    3,5,7-Trihydroxyflavone                3.06 × 10^-6
           Quercetin                   3',4',3,5,7-Pentahydroxyflavone        1.69 × 10^-4
           Avicularin                  Quercetin 3-O-arabinofuranose          3.77 × 10^-4
           Quercitrin                  Quercetin 3-O-Rha                      2.24 × 10^-4
           Myricetin                   3',4',5',3,5,7-Hexahydroxyflavone      1.85 × 10^-4
           Fisetin                     3',4',3,7-Tetrahydroxyflavone          2.37 × 10^-4
           Morin                       2',4',3,5,7-Pentahydroxyflavone        9.46 × 10^-6
           Kaempferol                  4',3,5,7-Tetrahydroxyflavone           7.34 × 10^-5
           Panasenoside                Kaempferol 3-O-Gal-Glu                 3.68 × 10^-4
           Populnin                    Kaempferol 7-O-Rha                     3.27 × 10^-4
Flavanone  Hesperetin                  3',5,7-Trihydroxyflavanone             2.72 × 10^-4
           Neohesperidin               Hesperetin 7-O-neohesperidoside        5.05 × 10^-4
           Prunin                      Naringenin 7-O-Glu                     2.73 × 10^-4
           Hesperetin-5-glucoside      Hesperetin 5-O-Glu                     > 6 × 10^-4
           Naringenin                  4',5,7-Trihydroxyflavanone             1.82 × 10^-4
Flavan     (-)Epigallocatechin (EGC)   3',4',5',3,5,7-Hexahydroxyflavan       1.05 × 10^-4
Figure 1. Structure and numbering of flavonoid derivatives.
All conformational searches and molecular mechanics calculations were carried out using the Discover 2.97 package (MSI Inc.) with the Consistent Valence Force Field (CVFF) on an SGI Indigo2 (R4400) workstation.
2.3 QSAR Analysis
2.3.1 Selection of Training Set and Multiple Linear Regression Analysis
In order to ensure the reliability of our model, the data set was divided into training and test sets. In the linear model

$$ y = Xb + e \tag{1} $$

the least-squares estimate is $b = (X'X)^{-1}X'y$, where $y$ is an $N \times 1$ vector of the dependent variable, $X$ is an $N \times k$ matrix of independent variables, $b$ is the $k \times 1$ vector of coefficients to be estimated, and $e$ is an $N \times 1$ vector of error terms. Several criteria have been used for selection of the best subset. One of the most popular is the maximization of $|X'X|$, which is used in D-optimal design. In this work, the stepwise addition method was applied to the D-optimal design of Mitchell [13] for selection of the training set. The subsets having high $|X'X|$ were listed in order of decreasing $|X'X|$. Then one sample was added to the subsets and the new subsets were listed again. This procedure was repeated until the training set reached a given number of members. The number of training set members was set to 14 in consideration of the ratio of the training set to the test set. Prior to the D-optimal design, crossvalidation was performed in order to find the optimum descriptors. All possible combinations of up to ten descriptors were used in the crossvalidation to find the best regression model, and four descriptors were finally obtained. Multicollinearity among descriptors was identified using the variance inflation factor (VIF) [14]. The VIF for the ith regression coefficient is expressed as

$$ \mathrm{VIF}_i = \frac{1}{1 - r_i^2} \tag{2} $$

where $r_i^2$ is the coefficient of determination produced by regressing the descriptor $x_i$ against the other descriptors $x_j$ ($j \neq i$). Models with a VIF greater than 10 were not considered. The predictivity of the model is quantified in terms of $r^2$, which is defined as

$$ r^2 = 1.0 - \frac{\sum (y_{\mathrm{pred}} - y_{\mathrm{actual}})^2}{\sum (y_{\mathrm{actual}} - y_{\mathrm{mean}})^2} \tag{3} $$

where $y_{\mathrm{pred}}$, $y_{\mathrm{actual}}$, and $y_{\mathrm{mean}}$ are the predicted, actual, and mean values of the target property, respectively.
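The selection machinery described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the greedy loop keeps only the single best subset at each step (the text describes ranking several subsets), a small ridge term is added so the determinant is defined while the subset is still smaller than the number of descriptors, and the descriptor matrix is synthetic. The names `greedy_d_optimal` and `vif` are hypothetical.

```python
import numpy as np

def greedy_d_optimal(X, n_select, ridge=1e-6):
    """Add one row at a time, each time keeping det(Xs' Xs) as large as possible.

    The ridge term is an assumption of this sketch; it keeps the determinant
    non-zero while the subset has fewer rows than X has columns.
    """
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    chosen, remaining = [], list(range(X.shape[0]))
    while len(chosen) < n_select:
        scores = []
        for i in remaining:
            Xs = X[chosen + [i]]
            # log-determinant is numerically safer than the raw determinant
            _, logdet = np.linalg.slogdet(Xs.T @ Xs + ridge * np.eye(k))
            scores.append(logdet)
        best = remaining[int(np.argmax(scores))]
        chosen.append(best)
        remaining.remove(best)
    return chosen

def vif(X):
    """Variance inflation factor of each column of X, following Eq. 2."""
    X = np.asarray(X, dtype=float)
    factors = []
    for i in range(X.shape[1]):
        target = X[:, i]
        # regress descriptor i on the remaining descriptors plus an intercept
        others = np.column_stack([np.ones(len(target)), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_i = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2_i))
    return np.array(factors)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(19, 4))          # 19 compounds, 4 synthetic descriptors
train_idx = greedy_d_optimal(X_demo, 14)   # 14 training members, as in the text
print(sorted(train_idx))
print(vif(X_demo[train_idx]))              # values above 10 would flag collinearity
```

With real descriptors, the rows left out of `train_idx` would form the test set, and any candidate model whose `vif` output exceeds 10 would be discarded, as described in the text.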
2.3.2 Neural Networks
Artificial neural networks consist of layers of neurons whose outputs are connected to other neurons. While there are many different artificial neural network architectures, the most popular network used in QSAR is the multi-layer feed-forward network [15]. In this type of network the neurons are arranged into groups called layers: an input layer, an output layer and a variable number of hidden layers. The number of neurons and layers depends on the number of descriptors in the data set, the number of compounds and the type of output. In this study, the back propagation neural network (BPN) [16, 17] was applied for the learning phase. The number of layers is arbitrary; in general the network consists of n layers. The value of a neuron $O_j$ at the nth layer may be expressed as

$$ O_j = \frac{1}{1 + e^{-\alpha y_j}} \tag{4} $$

$$ y_j = \left( \sum_i W_{ij} x_i \right) + \theta_j \tag{5} $$

where $x_i$ is the value of a neuron in layer n-1, $W_{ij}$ is the connection weight to neuron j from neuron i, and $\theta_j$ is a bias term. Training is carried out until the mean square error (MSE) becomes small enough. The MSE is

$$ \mathrm{MSE} = \frac{\sum (t_j - O_j)^2}{(\text{no. of compds.}) \times (\text{no. of output units})} \tag{6} $$

where $t_j$ and $O_j$ are the desired output and the calculated output, respectively. The calculated output was obtained by averaging neural network predictions over several independent networks in order to avoid local minima [18]. The connection weights are changed iteratively to minimize the MSE as follows

$$ W'_{ij} = W_{ij} - \eta \frac{\partial \mathrm{MSE}}{\partial W_{ij}} \tag{7} $$

where $W'_{ij}$ is the weight after the iteration, MSE is the mean square error, and $\eta$ is the learning rate. The values of the input layer are rescaled to lie between circa 0.1 and 1.0 by the scaling equation

$$ x'_{ij} = \frac{x_{ij} - x_{\min} + 0.1}{x_{\max} - x_{\min} + 0.1} \tag{8} $$

where $x_{ij}$ is the value of a given descriptor, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that descriptor.
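Equations 4 to 8 can be put together in a short NumPy sketch. Several details here are assumptions rather than statements from the paper: the slope parameter α is fixed at 1, the output unit also uses the sigmoid of Eq. 4, the momentum term and the averaging over independent networks are left out, and the descriptors and targets are synthetic. All function names are illustrative.

```python
import numpy as np

def rescale(x):
    """Eq. 8, applied separately to each descriptor column."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo + 0.1) / (hi - lo + 0.1)

def sigmoid(y, alpha=1.0):
    """Eq. 4 with slope parameter alpha (assumed to be 1 here)."""
    return 1.0 / (1.0 + np.exp(-alpha * y))

def forward(x, W1, b1, W2, b2):
    """Eqs. 4-5 for an input-hidden-output network; returns (hidden, output)."""
    hidden = sigmoid(x @ W1 + b1)
    return hidden, sigmoid(hidden @ W2 + b2)

def mse(t, o):
    """Eq. 6: squared error divided by (compounds x output units)."""
    return np.sum((t - o) ** 2) / t.size

def train_step(x, t, params, eta=0.7):
    """One plain gradient-descent update (Eq. 7); momentum is omitted."""
    W1, b1, W2, b2 = params
    hidden, out = forward(x, W1, b1, W2, b2)
    d_out = -2.0 * (t - out) / t.size * out * (1.0 - out)   # dMSE/dy at the output unit
    d_hid = (d_out @ W2.T) * hidden * (1.0 - hidden)        # propagated back to hidden y
    return (W1 - eta * x.T @ d_hid, b1 - eta * d_hid.sum(axis=0),
            W2 - eta * hidden.T @ d_out, b2 - eta * d_out.sum(axis=0))

rng = np.random.default_rng(0)
x = rescale(rng.normal(size=(14, 4)))                       # synthetic descriptors
t = rescale(x @ np.array([[1.0], [-1.0], [0.5], [2.0]]))    # synthetic targets
params = (rng.normal(scale=0.5, size=(4, 4)), np.zeros(4),  # 4-4-1 architecture
          rng.normal(scale=0.5, size=(4, 1)), np.zeros(1))
start = mse(t, forward(x, *params)[1])
for _ in range(2000):
    params = train_step(x, t, params)
end = mse(t, forward(x, *params)[1])
print(start, end)
```

Because the output unit is a sigmoid, the targets are rescaled with the same Eq. 8 so that they fall inside the range the network can produce; whether the authors scaled the activities this way is not stated.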
2.3.3 Pruning Method for Descriptor Selection
The importance of neurons in the hidden or input layers was estimated according to their sensitivity [18, 19]:

$$ S_i = \sum_j \left( \frac{w_{ji}}{\max_a |w_{ja}|} \right)^2 S_j \tag{9} $$

where $\max_a |w_{ja}|$ is the largest absolute weight among all weights ending at neuron j, and $S_j$ is the sensitivity of neuron j in the upper layer. The neuron having the greatest value of $S_i$ has the most important influence on the neurons in the next layer. The sensitivities of the neurons in the input and hidden layers were therefore calculated, and the neurons with small sensitivities were deleted. The pruning was carried out iteratively to reach a given number of neurons with a small mean square error (MSE), according to the following procedure [20]: (1) choose a large network and determine the number of neurons to be pruned at each step, (2) train the network, (3) compute the neuron sensitivities every N epochs, (4) delete the neurons with low sensitivity, and (5) if the stopping criterion is not met, go to step (2). The sensitivities of the output layer neurons are set to 1. All the sensitivities in a layer are normalized to a maximum value of 1.
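As a small worked example of Eq. 9, the sketch below propagates sensitivities from the output layer down to the inputs of a toy 3-2-1 network. The weights are hypothetical values chosen for illustration, and `layer_sensitivity` is not a name from the paper.

```python
import numpy as np

def layer_sensitivity(W, S_upper):
    """Eq. 9 for one layer, followed by normalization to a maximum of 1.

    W[j, i] is the weight ending at upper-layer neuron j and starting at
    lower-layer neuron i; S_upper[j] is the sensitivity of neuron j.
    """
    W = np.asarray(W, dtype=float)
    ratio = W / np.abs(W).max(axis=1, keepdims=True)   # w_ji / max_a |w_ja|
    S = (ratio ** 2).T @ np.asarray(S_upper, dtype=float)
    return S / S.max()

# Hypothetical weights for a 3-2-1 network (illustration only).
W_hidden_to_output = np.array([[2.0, 1.0]])
W_input_to_hidden = np.array([[1.0, -2.0, 0.5],
                              [0.5,  0.5, -1.0]])

S_output = np.array([1.0])                              # output sensitivities are set to 1
S_hidden = layer_sensitivity(W_hidden_to_output, S_output)
S_input = layer_sensitivity(W_input_to_hidden, S_hidden)
print(S_hidden)
print(S_input)
```

In this toy case the second input has the largest sensitivity, so a pruning step would delete one of the other two inputs first.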
3 Results and Discussion
3.1 Multiple Linear Regression Analysis
Table 2 shows the 14 training set members selected using D-optimal design. Prior to training set selection, the four descriptors were obtained by crossvalidation. Hesperetin-5-glucoside was not used in the selection of training set members because its IC50 value is known only as a lower bound, > 6 × 10^-4 M (Table 1). A high crossvalidated r² value of 0.719 was achieved using four descriptors: the Hammett constant (Sig), the highest occupied molecular orbital energy (HOMO), the HOMO π coefficient of the C3 carbon atom (Cp3), and the HOMO π coefficient of the C3' carbon atom (Cp3'). The optimum MLR model is as follows

$$ -\log(\mathrm{IC}_{50}) = 2.289(\pm 1.223)\,\mathrm{Sig} - 2.295(\pm 0.597)\,\mathrm{HOMO} - 2.580(\pm 0.575)\,\mathrm{Cp3} + 2.761(\pm 0.942)\,\mathrm{Cp3'} - 15.795(\pm 5.426) \tag{10} $$

This result suggests that the activity depends linearly on the Hammett constant, the HOMO energy, and the HOMO π coefficients of the C3 and C3' carbon atoms (Cp3 and Cp3'). Since the Hammett constant, the partial charge, and the HOMO π coefficient depend mainly on the substituents, the substituents of the C3 and C3' carbon atoms have an influence on the activity. However, it is not always possible to discuss the effect of an individual term on the basis of its coefficient, since the coefficient results from the combination of all descriptors. In our MLR model, the individual descriptors Sig, HOMO, Cp3, and Cp3' gave low r² values of 0.448, 0.055, 0.249, and 0.017, respectively, so the effect of an individual descriptor cannot be inferred from its coefficient alone.
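To make Eq. 10 and the r² definition of Eq. 3 concrete, the sketch below evaluates the printed equation on the training-set descriptors of Table 2 and computes r² from the MLR column of Table 4. The variable names are illustrative. With the coefficients as printed, the predictions match Table 4 closely for the two compounds with Sig = 0 and differ by up to roughly 0.2 log units for the others, so this should be read as an illustration rather than a reproduction of the authors' fit.

```python
import numpy as np

# Training set: Sig, HOMO, Cp3, Cp3' (Table 2), then experimental and
# MLR-calculated -log(IC50) (Table 4).
rows = {
    "Chrysin":      ( 0.00, -9.267, -0.457, -0.076, 6.70, 6.439),
    "Galangin":     ( 0.00, -8.812, -0.482, -0.090, 5.51, 5.421),
    "Morin":        (-0.25, -8.839, -0.445, -0.000, 5.02, 4.938),
    "Luteolin":     (-0.25, -9.096,  0.221,  0.176, 4.87, 4.296),
    "Naringenin":   (-0.37, -9.282,  0.131, -0.027, 3.74, 4.060),
    "Quercetin":    (-0.25, -8.590, -0.387, -0.267, 3.77, 3.480),
    "Myricetin":    (-0.13, -8.613, -0.360, -0.205, 3.73, 3.969),
    "EGC":          (-0.13, -8.726, -0.088,  0.135, 3.97, 4.466),
    "Hesperetin":   (-0.15, -8.892,  0.086, -0.237, 3.57, 3.315),
    "Prunin":       (-0.37, -9.187,  0.352,  0.057, 3.56, 3.504),
    "Populnin":     (-0.37, -8.648, -0.096, -0.089, 3.49, 3.020),
    "Avicularin":   (-0.25, -8.841, -0.383, -0.267, 3.42, 4.046),
    "Quercitrin":   (-0.25, -9.003,  0.318,  0.208, 3.65, 3.921),
    "Panasenoside": (-0.25, -9.250,  0.445, -0.012, 3.43, 3.553),
}
data = np.array(list(rows.values()))
descriptors, y_exp, y_table = data[:, :4], data[:, 4], data[:, 5]

# Eq. 10 with the coefficients as printed.
coef = np.array([2.289, -2.295, -2.580, 2.761])
intercept = -15.795
y_eq10 = descriptors @ coef + intercept

def r2(y_pred, y_actual):
    """Eq. 3."""
    return 1.0 - np.sum((y_pred - y_actual) ** 2) / np.sum((y_actual - y_actual.mean()) ** 2)

print(f"r2 from the MLR column of Table 4: {r2(y_table, y_exp):.3f}")
print(f"largest |Eq. 10 - Table 4| difference: {np.abs(y_eq10 - y_table).max():.3f}")
```

The first printed value can be compared with the training-set r² of 0.867 reported for MLR.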
Table 2. The descriptors used in multiple linear regression analysis.

Compounds                 Sig(a)   HOMO(b)   Cp3(c)   Cp3'(d)
Training Set
Chrysin                    0.00    -9.267    -0.457   -0.076
Galangin                   0.00    -8.812    -0.482   -0.090
Morin                     -0.25    -8.839    -0.445   -0.000
Luteolin                  -0.25    -9.096     0.221    0.176
Naringenin                -0.37    -9.282     0.131   -0.027
Quercetin                 -0.25    -8.590    -0.387   -0.267
Myricetin                 -0.13    -8.613    -0.360   -0.205
EGC                       -0.13    -8.726    -0.088    0.135
Hesperetin                -0.15    -8.892     0.086   -0.237
Prunin                    -0.37    -9.187     0.352    0.057
Populnin                  -0.37    -8.648    -0.096   -0.089
Avicularin                -0.25    -8.841    -0.383   -0.267
Quercitrin                -0.25    -9.003     0.318    0.208
Panasenoside              -0.25    -9.250     0.445   -0.012
Test Set
Kaempferol                -0.37    -8.643    -0.452   -0.179
Apigenin                  -0.37    -9.102    -0.464   -0.206
Neohesperidin             -0.15    -9.025     0.142   -0.197
Fisetin                   -0.25    -8.980     0.177    0.069
Hesperetin 5-glucoside    -0.15    -8.833     0.033   -0.196

(a) Sig = Hammett constant
(b) HOMO = highest occupied molecular orbital energy
(c) Cp3 = HOMO π coefficient of C3 carbon atom
(d) Cp3' = HOMO π coefficient of C3' carbon atom

3.2 Neural Networks
Neural network (NN) calculations were carried out using two kinds of descriptor sets: (1) the descriptors used in the multiple linear regression analysis (MLR) and (2) descriptors selected using the pruning method. The pruning was performed using a neural network architecture with one input layer, one hidden layer and one output layer. The learning rate and momentum were set to 0.7 and 0.3, respectively. The network was not allowed to run for more than 10000 epochs, since full network training takes a great deal of computation time. In order to avoid local minima, the neuron sensitivity was obtained by averaging the sensitivities over 10 independent networks. Starting from a 23-23-1 network, a 4-4-1 architecture (an input layer with four neurons, a hidden layer with four neurons, and an output layer with one neuron) was finally determined. The selected descriptors were the nonoverlap steric volume (Steric), the partial charge of the C3 carbon atom (C3), and the HOMO π coefficients of the C3 and C4' carbon atoms (Cp3 and Cp4') (Table 3). The nonoverlap steric volume between each flavone derivative and the shape reference molecule (flavone) depends on the size of the substituents, so the size of the substituents has an influence on the activity. The HOMO π coefficient of the C3 atom was also selected in MLR, which implies that substitution at the C3 carbon atom plays an important role in inhibitory activity.

Table 3. The descriptors selected by pruning method.

Compounds                 Steric(a)   C3(b)    Cp3(c)   Cp4'(d)
Training Set
Chrysin                     13.01    -0.287   -0.457   -0.200
Galangin                    24.17    -0.086   -0.482   -0.251
Morin                       58.29    -0.069   -0.445   -0.049
Luteolin                    44.63    -0.270    0.221    0.183
Naringenin                  66.91    -0.241    0.131   -0.030
Quercetin                   36.83    -0.084   -0.387   -0.339
Myricetin                   65.53    -0.080   -0.360   -0.383
EGC                         82.78     0.013   -0.088    0.249
Hesperetin                  84.24    -0.232    0.086   -0.233
Prunin                     164.16    -0.264    0.352    0.090
Populnin                   137.70    -0.091   -0.096   -0.141
Avicularin                 130.19    -0.079   -0.383   -0.315
Quercitrin                 166.89    -0.086    0.318    0.230
Panasenoside               280.29    -0.034    0.445    0.007
Test Set
Kaempferol                  29.97    -0.091   -0.452   -0.287
Apigenin                    20.06    -0.293   -0.464   -0.300
Neohesperidin              292.82    -0.272    0.142   -0.207
Fisetin                     52.92    -0.070    0.177    0.068
Hesperetin 5-glucoside     196.07    -0.230    0.033   -0.195

(a) Steric = nonoverlap steric volume
(b) C3 = partial charge of C3 carbon atom
(c) Cp3 = HOMO π coefficient of C3 carbon atom
(d) Cp4' = HOMO π coefficient of C4' carbon atom

The full BPN was then performed with the 4-4-1 network architecture. The calculated output was obtained by averaging neural network predictions over 50 independent networks in order to avoid local minima. The experimental and calculated -log(IC50) values of the training set in multiple linear regression analysis (MLR) and neural networks (NN) are shown in Table 4; the training set shows good correlations between the descriptors and the activities in NN. The r² values of MLR, NN1 (NN using MLR descriptors), and NN2 (NN using descriptors selected by pruning) for the training set are 0.867, 0.947, and 0.984, respectively. Although the descriptors of the optimum MLR model were used as inputs of the neural networks, the correlations between the descriptors and the activities were improved by the neural networks. These results show that the descriptors used in the MLR model include nonlinear relationships. Moreover, neural networks (NN2) using descriptors selected by the pruning method gave a higher r² value than neural networks (NN1) using the MLR descriptors. These results imply that a descriptor selection method based on linearity (i.e. multiple linear regression or crossvalidation) is not sufficient for nonlinear fitting (i.e. neural networks). The test set also shows good predictivity. The predictive r² values are 0.626, 0.671, and 0.800 for MLR, NN1, and NN2, respectively. Statistics of MLR, NN1, and NN2 are summarized in Table 5.

Table 4. Experimental versus calculated -log(IC50) values in multiple linear regression analysis (MLR) and neural networks (NN1 and NN2).

                          -log(IC50)   MLR(a)              NN1(b)              NN2(c)
Compounds                 (exp)        (calc)   residual   (calc)   residual   (calc)   residual
Training Set
Chrysin                   6.70         6.439    -0.261     6.494    -0.206     6.673    -0.027
Galangin                  5.51         5.421    -0.089     5.531     0.021     5.297    -0.213
Morin                     5.02         4.938    -0.082     5.097     0.077     5.137     0.117
Luteolin                  4.87         4.296    -0.574     4.420    -0.450     4.853    -0.017
Naringenin                3.74         4.060     0.320     4.153     0.413     3.809     0.069
Quercetin                 3.77         3.480    -0.290     3.466    -0.304     4.058     0.288
Myricetin                 3.73         3.969     0.239     3.514    -0.216     3.653    -0.077
EGC                       3.97         4.466     0.496     4.011     0.041     3.898    -0.072
Hesperetin                3.57         3.315    -0.255     3.467    -0.103     3.551    -0.019
Prunin                    3.56         3.504    -0.056     3.628     0.068     3.485    -0.075
Populnin                  3.49         3.020    -0.471     3.459    -0.031     3.494     0.004
Avicularin                3.42         4.046     0.626     3.594     0.174     3.520     0.100
Quercitrin                3.65         3.921     0.271     3.823     0.173     3.477    -0.173
Panasenoside              3.43         3.553     0.123     3.606     0.176     3.466     0.036
Test Set
Kaempferol                4.13         3.678    -0.452     3.505    -0.625     4.750     0.620
Apigenin                  5.87         4.688    -1.182     4.839    -1.031     6.556     0.686
Neohesperidin             3.30         3.586     0.286     3.508     0.208     3.467     0.167
Fisetin                   3.63         3.848     0.218     3.677     0.047     3.640     0.010
Hesperetin 5-glucoside    3.22         3.429     0.209     3.474     0.254     3.473     0.253

(a) MLR = multiple linear regression analysis
(b) NN1 = neural networks using MLR descriptors
(c) NN2 = neural networks using descriptors selected by pruning method

The weight values of NN1 are listed in Table 6. In order to compare the weight values of NN with the regression coefficients of MLR, the MLR was carried out using input values rescaled to lie between circa 0.1 and 1.0. The regression coefficients of Sig, HOMO, Cp3, and Cp3' are 0.196, -0.271, -0.395, and 0.237, respectively. Since the signs of the weights between hidden and output units are all positive, the signs of the weights between input and hidden units determine the sign of the overall weight. The weight values of Sig and Cp3' between the input and hidden layers are all positive and those of HOMO and Cp3 are all negative. These results show good agreement with MLR, in which the regression coefficients of Sig and Cp3' are positive and those of HOMO and Cp3 are negative. The total weights of the descriptors were calculated by the following equation

$$ W_{\mathrm{tot}} = \sum_j W_{ij} W_{jk} \tag{11} $$

where $W_{\mathrm{tot}}$ is the total weight of input descriptor i, $W_{ij}$ is the weight between input and hidden units, and $W_{jk}$ is the weight between hidden and output units. The $W_{\mathrm{tot}}$ values of Sig, HOMO, Cp3, and Cp3' are 8.509, -22.338, -24.240, and 18.663, respectively. The signs of $W_{\mathrm{tot}}$ and of the regression coefficients are the same: both are positive for Sig and Cp3' and negative for HOMO and Cp3. The order of magnitude of the $W_{\mathrm{tot}}$ values and the regression coefficients is also the same (Cp3 > HOMO > Cp3' > Sig), which shows good agreement between the $W_{\mathrm{tot}}$ values and the regression coefficients.
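Equation 11 amounts to a matrix-vector product, which can be checked against the weights in Table 6. One assumption is needed: Table 6 labels its four input rows Vol, C3, Cp3 and Cp5', and the sketch treats them, in that order, as Sig, HOMO, Cp3 and Cp3'. This reading is supported only by the fact that the resulting sums agree with the total weights quoted in the text.

```python
import numpy as np

# Weights between input and hidden units from Table 6
# (rows: input units in table order; columns: hidden units 1-4).
W_input_hidden = np.array([
    [ 2.396,  2.497,  2.399,  2.501],
    [-6.334, -6.428, -6.323, -6.433],
    [-6.914, -6.812, -6.912, -6.809],
    [ 5.330,  5.227,  5.326,  5.218],
])
# Weights between the four hidden units and the single output unit.
W_hidden_output = np.array([1.418, 0.414, 1.345, 0.341])

# Eq. 11: sum over hidden units j of W_ij * W_jk.
W_tot = W_input_hidden @ W_hidden_output

labels = ["Sig", "HOMO", "Cp3", "Cp3'"]   # assumed row order, see the note above
for name, w in zip(labels, W_tot):
    print(f"{name}: {w:.3f}")
```

The computed totals agree with the quoted values to within about 0.01, which is consistent with rounding of the tabulated weights.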
4 Conclusions
In this study, multiple linear regression analysis and neural networks were used to obtain information about inhibitory activity on cytochrome P450 1A2. D-optimal design and the pruning method were used for selection of the training set and the descriptors, respectively. MLR, NN1, and NN2 showed good correlations between the descriptors and the activities in the training set (r² values of 0.867, 0.947, and 0.984, respectively). The correlations were improved by neural networks, although the descriptors of the optimum MLR model were used as inputs of the neural networks. These results imply that the descriptors used in the MLR model include nonlinear relationships. The weight values of NN1 showed good agreement with the regression coefficients of MLR. Moreover, neural networks (NN2) using descriptors selected by the pruning method gave a higher r² value than neural networks (NN1) using MLR descriptors, which suggests that a descriptor selection method based on linearity is not sufficient for nonlinear fitting.
Table 5. Statistics of multiple linear regression analysis (MLR) and neural networks (NN1 and NN2).

           Training Set       Test Set
           r²       s         r²       s
MLR(a)     0.867    0.346     0.626    0.596
NN1(b)     0.946    0.219     0.671    0.559
NN2(c)     0.984    0.121     0.800    0.453

(a) MLR = multiple linear regression
(b) NN1 = neural networks using MLR descriptors
(c) NN2 = neural networks using descriptors selected by pruning method

Table 6. The weight values of NN1(a).

input unit   hidden unit   output unit   weight value
[between input and hidden units]
Vol          1st                           2.396
Vol          2nd                           2.497
Vol          3rd                           2.399
Vol          4th                           2.501
C3           1st                          -6.334
C3           2nd                          -6.428
C3           3rd                          -6.323
C3           4th                          -6.433
Cp3          1st                          -6.914
Cp3          2nd                          -6.812
Cp3          3rd                          -6.912
Cp3          4th                          -6.809
Cp5'         1st                           5.330
Cp5'         2nd                           5.227
Cp5'         3rd                           5.326
Cp5'         4th                           5.218
[between hidden and output units]
             1st           1st             1.418
             2nd           1st             0.414
             3rd           1st             1.345
             4th           1st             0.341

(a) NN1 = neural networks using MLR descriptors

References
[1] Hansch, C., Maloney, P.P., Fujita, T., and Muir, R.M., Correlation of biological activity of phenoxyacetic acids with Hammett substitution constants and partition coefficients, Nature 194, 178–180 (1962).
[2] Hansch, C., A quantitative approach to biochemical structure-activity relationships, Acc. Chem. Res. 2, 232–239 (1969).
[3] Andrea, T.A., and Kalayeh, H., Applications of neural networks in quantitative structure-activity relationships of dihydrofolate reductase inhibitors, J. Med. Chem. 34, 2824–2836 (1991).
[4] Aoyama, T., Suzuki, Y., and Ichikawa, H., Neural networks applied to quantitative structure-activity relationship analysis, J. Med. Chem. 33, 2583–2590 (1990).
[5] Maddalena, D.J. and Johnston, G.A., Prediction of receptor properties and binding affinity of ligands to benzodiazepine/GABAA receptors using artificial neural networks, J. Med. Chem. 38, 715–724 (1995).
[6] Manallack, D.T., Relating biological activity to chemical structure using neural networks, Pestic. Sci. 45, 167–170 (1995).
[7] Stavric, B., Biological significance of trace levels of mutagenic heterocyclic aromatic amines in human diet: a critical review, Food. Chem. Toxicol. 32, 977–994 (1994).
[8] Brown, J.P., A review of the genetic effects of naturally occurring flavonoids, anthraquinones, and related compounds, Mutation Research 75, 243–277 (1980).
[9] Galati, E.M., Monforte, M.T., Kirjavainen, S., Forestier, A.M., Trovato, A., and Tripodo, M.M., Biological effects of hesperidin, a citrus flavonoid, Farmaco 40, 709–717 (1994).
[10] Nagai, T., Suzuki, Y., Tomimori, T., and Yamada, H., Antiviral activity of plant flavonoid, Biol. Pharm. Bull. 18, 295–302 (1995).
[11] Shimoi, K., Masuda, S., Furugori, M., Esaki, S., and Kinae, N., Radioprotective effect of anti-oxidative flavonoids in gamma ray irradiated mice, Carcinogenesis 15, 2669–2675 (1994).
[12] Lee, H., Yeom, H., Kim, Y.K., Yoon, C.N., Jin, C., Choi, J.S., Kim, B.R., and Kim, D.H., Structure-related inhibition of human N3-demethylation by naturally occurring flavonoids, Biochem. Pharmacol. 55, 1369–1375 (1998).
[13] Mitchell, T.J., An algorithm for the construction of D-Optimal experimental designs, Technometrics 16, 203–210 (1974).
[14] Myers, R.H., Classical and modern regression with applications. PWS-KENT, Boston 1990.
[15] Salt, D.W., Yildiz, N., Livingstone, D.J. and Tinsley, J., The use of artificial neural networks in QSAR, Pestic. Sci. 36, 161–170 (1992).
[16] Schüürmann, G., and Müller, E., Back-propagation neural networks recognition vs. prediction capability, Environmental Toxicology and Chemistry 13, 743–747 (1994).
[17] Zupan, J., and Gasteiger, J., Neural networks for chemists. VCH, Weinheim 1993.
[18] Tetko, I.V., Livingstone, D.J., and Luik, A.I., Neural network studies. 1. Comparison of overfitting and overtraining, J. Chem. Inf. Comput. Sci. 35, 826–833 (1995).
[19] Tetko, I.V., Villa, A.E.P., Livingstone, D.J., Neural network studies. 2. Variable selection, J. Chem. Inf. Comput. Sci. 36, 794–803 (1996).
[20] Babri, H.A., Kot, A.C., Tan, N.T., and Tang, J.G., Dynamic pruning algorithms for improving generalisation of neural networks, ICICS '97, 679–683 (1997).
Received on October 1, 1999; accepted on December 17, 1999