Quantitative structure–activity relationships (QSAR): studies of inhibitors of tyrosine kinase




European Journal of Pharmaceutical Sciences 20 (2003) 63–71
www.elsevier.com/locate/ejps

Qi Shen a, Qing-Zhang Lu a,b, Jian-Hui Jiang a, Guo-Li Shen a, Ru-Qin Yu a,*

a State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, PR China
b College of Chemistry and Environmental Science, Henan Normal University, Xinxiang 453002, PR China

Received 31 January 2003; received in revised form 10 June 2003; accepted 16 June 2003

Abstract

A quantitative structure–activity relationship (QSAR) study of the 1-phenylbenzimidazoles as inhibitors of the platelet-derived growth factor receptor (PDGFR) was performed. New electronic parameters $Q_o$, $Q_m$ and $Q_p$ are suggested for characterizing the effect of substituents. Many other descriptors are also used, and variables are selected by an evolution algorithm (EA) that uses the modified Cp proposed by the present authors as its objective function. The descriptor $Q_m$ is shown to be an important variable for expressing the effect of substituents. The variable selection shows that spatial descriptors are the most important variables, revealing important properties of the inhibitors. Electron-releasing substituents at the 5-position and the absence of bulky groups at the 4- and 7-positions of the parent structure can enhance inhibitory activity. Principal component analysis is performed to classify this series of compounds. © 2003 Elsevier B.V. All rights reserved.

Keywords: Platelet-derived growth factor receptor; Electronic parameters; Modified Cp statistic; QSAR; 1-Phenylbenzimidazoles

* Corresponding author. Tel.: +86-731-882-1577; fax: +86-731-882-2577. E-mail address: [email protected] (R.Q. Yu).

0928-0987/03/$ – see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S0928-0987(03)00170-2

1. Introduction

Traditionally, anticancer drugs have been targeted at inhibiting DNA synthesis and function during mitosis. However, these drugs appear to be limited both in the degree of cell killing that they can induce and in their selectivity between tumor and normal cells, especially in organs that require rapid cellular proliferation for full potency. Abnormal activity of tyrosine kinases has been implicated in many cancers and in a large number of inflammatory responses (Kurup et al., 2001). Tyrosine kinases are important mediators of the cellular signal transduction through which growth factors and oncogenes affect cell proliferation, and their inhibitors are a new kind of effective anticancer drug. The development of tyrosine kinase inhibitors has therefore become an active area of pharmaceutical science. The platelet-derived growth factor receptor (PDGFR), which plays a vital role as a regulator of cell growth, is one of the most intensely studied tyrosine kinase targets. Inhibitors of the PDGFR not only can prevent restenosis following vascular interventions but also are involved in the development of tumor angiogenesis. Inhibitors of PDGFR are of interest as potential anticancer drugs. Many tumors, particularly gliomas and sarcomas, undergo autocrine PDGFR activation that can be inhibited by PDGF antisera (Palmer et al., 1998, 1999).

A large number of different classes of compounds (Maguire et al., 1994; Dolle et al., 1994) have been reported as selective inhibitors of PDGFR activity. 1-Phenylbenzimidazoles (Palmer et al., 1998, 1999) have been shown to be a new class of adenosine triphosphate (ATP) site inhibitors of PDGFR. A number of structure–activity studies involving 1-phenylbenzimidazoles as inhibitors of PDGFR have been published (Oblak et al., 2000; Zhu et al., 2001; Pierre et al., 2000; Naumann and Matter, 2002). There is, however, a lack of well-defined quantitative structure–activity relationships (QSARs) for this system. A QSAR study on a limited set of 22 1-phenylbenzimidazoles (Kurup et al., 2001) has been performed, but quite a few compounds could not be included in the regression because the electronic parameter $\sigma^+$ was not available for them. In this study, we propose electronic charge parameters to replace the electronic parameter $\sigma^+$ for characterizing the effect of substituents. Many other descriptors are also used in the QSAR study of PDGFR inhibitors, and variables are selected by an evolution algorithm (EA) using the modified Cp (Shen et al., 2003) proposed by the present authors as the objective function. In order to rationalize the structure–activity relationship in depth, a classification result based on principal component analysis (PCA) was obtained for comparison, and the main factors affecting the inhibitory activity of the 1-phenylbenzimidazoles considered have been identified.

2. Material and methods

2.1. Data sets

A group of 75 1-phenylbenzimidazole derivatives (Palmer et al., 1998, 1999) that have substituents only on the benzimidazole ring is used in this study. The molecular structure and numbering of substituents in the series of 1-phenylbenzimidazole derivatives are shown in Fig. 1. The compounds studied, along with their inhibitory data, are summarized in Table 1. Inactive 1-phenylbenzimidazoles are automatically assigned a $\lg(1/IC_{50})$ value of 4.3 ($IC_{50}$ = 50 μM). This data set of 75 1-phenylbenzimidazoles is randomly divided into two groups, with 55 compounds used as the training set for developing regression models and the remaining 20 compounds used only as the validation set in the prediction of biological activities.

Fig. 1. Structure of 1-phenylbenzimidazole.

Table 1
Summary of experimental and calculated biological activities, $\log(1/IC_{50})$, of 1-phenylbenzimidazole derivatives, along with their structures, used in the QSAR study

| No. | Substituent | Observed (a) | Calculated (b), Eq. (1) | Eq. (2) | Eq. (3) | Eq. (4) | Series (c) |
|---|---|---|---|---|---|---|---|
| 1 | H | 5.03 | 5.1359 | 4.7599 | 4.9992 | 4.9329 | 1 |
| 2 | 4-OMe | 4.301 | 4.3768 | 4.4525 | 4.3627 | 4.4223 | 1 |
| 3 | 4-OH | 4.8539 | 4.739 | 4.5956 | 4.5467 | 4.6155 | 1 |
| 4 | 5-Me | 5.3565 | 5.3561 | 5.554 | 5.438 | 5.5506 | 1 |
| 5 | 5-OMe | 6.3665 | 5.6684 | 6.052 | 5.6253 | 5.7893 | 2 |
| 6 | 5-OH | 6.3565 | 5.2653 | 5.7457 | 5.3404 | 5.5533 | 1 |
| 7 | 5-Cl | 5.3979 | 5.3138 | 5.5814 | 5.3213 | 5.471 | 2 |
| 8 | 5-COOH | 5.03 | 5.4108 | 5.4938 | 5.396 | 5.4773 | 1 |
| 9 | 5-COOMe | 6.081 | 5.615 | 5.7474 | 5.6778 | 5.6999 | 1 |
| 10 | 5-CONH2 | 4.7959 | 5.5311 | 5.4773 | 5.2112 | 5.3022 | 1 |
| 11 | 5-NO2 | 4.7959 | 4.9008 | 5.4828 | 4.9566 | 5.135 | 2 |
| 12 | 5-COMe | 6.0655 | 5.4794 | 5.4554 | 5.4902 | 5.5481 | 2 |
| 13 | 5-CHO | 6.3665 | 5.3478 | 5.4554 | 5.4582 | 5.5225 | 1 |
| 14 | 5-OC3H7 | 6.6021 | 5.9993 | 6.0853 | 6.0759 | 6.1616 | 1 |
| 15 | 5-OEt | 6.6198 | 5.7107 | 6.0759 | 5.8525 | 5.9783 | 2 |
| 16 | 5-OCH(Me)2 | 5.5086 | 5.9601 | 6.0891 | 5.8629 | 5.9867 | 1 |
| 17 | 5-OC4H9 | 5.8861 | 6.3216 | 6.0911 | 6.2968 | 6.3428 | 1 |
| 18 | 5-OCH2CH=CH2 | 6.2147 | 6.196 | 6.0559 | 6.0734 | 6.1487 | 1 |
| 19 | 5-O(CH2)4OH | 6.3468 | 6.4967 | 6.0661 | 6.4462 | 6.4415 | 2 |
| 20 | 5-OCH2(oxiranyl) | 6.4969 | 5.9346 | 6.2982 | 6.0852 | 6.1119 | 1 |
| 21 | 5-OCH2CH(OH)CH2OH | 6.5086 | 6.2273 | 6.0603 | 6.1434 | 6.1803 | 1 |
| 22 | 5-O(CH2)2NH2 | 6.1871 | 5.9207 | 6.05 | 5.7096 | 5.8214 | 2 |
| 23 | 5-O(CH2)2N(Me)2 | 5.8239 | 6.2111 | 6.0655 | 6.3 | 6.3318 | 2 |
| 24 | 5-O(CH2)3N(Me)2 | 6.8239 | 6.6366 | 6.0745 | 6.5148 | 6.5109 | 1 |
| 25 | 5-O(CH2)4N(Me)2 | 6.7959 | 5.9882 | 6.0769 | 6.7395 | 6.6952 | 1 |
| 26 | 5-O(CH2)2Nmorph | 6.1367 | 5.9428 | 6.4571 | 5.9365 | 6.0292 | 1 |
| 27 | 5-O(CH2)3Nmorph | 6.7696 | 5.9131 | 6.4602 | 6.1577 | 6.2107 | 1 |
| 28 | 5-O(CH2)4Nmorph | 6.5686 | 6.7481 | 6.4623 | 6.3843 | 6.3965 | 1 |
| 29 | 5-SH | 5.4815 | 5.3269 | 5.6362 | 5.23 | 5.4106 | 1 |
| 30 | 5-SMe | 6.1308 | 5.7202 | 5.6307 | 5.5915 | 5.7161 | 1 |
| 31 | 5-OCSN(Me)2 | 5.3372 | 5.5928 | 6.0237 | 5.7367 | 5.8605 | 1 |
| 32 | 6-Me | 4.3979 | 5.1188 | 4.7763 | 4.9832 | 4.9283 | 1 |
| 33 | 6-OMe | 5.1938 | 5.2054 | 5.2776 | 5.2211 | 5.2074 | 2 |
| 34 | 6-OH | 5.6778 | 5.1311 | 4.986 | 4.9308 | 4.967 | 2 |
| 35 | 6-Cl | 5.2676 | 4.9202 | 4.8037 | 4.8421 | 4.829 | 1 |
| 36 | 6-COOH | 4.301 | 4.914 | 4.7161 | 4.8852 | 4.8102 | 1 |
| 37 | 6-COOMe | 4.8861 | 4.9164 | 4.972 | 5.1718 | 5.0367 | 2 |
| 38 | 6-CONH2 | 4.6021 | 4.9772 | 4.6996 | 4.6747 | 4.6145 | 1 |
| 39 | 6-NO2 | 4.301 | 4.5418 | 4.7051 | 4.5019 | 4.5127 | 1 |
| 40 | 6-NH2 | 4.6383 | 5.1693 | 4.9626 | 4.6819 | 4.7461 | 2 |
| 41 | 7-OMe | 4.4318 | 4.1068 | 4.458 | 4.4028 | 4.4544 | 1 |
| 42 | 4,5-diOH | 4.6021 | 4.8572 | 4.8694 | 4.8763 | 4.8731 | 1 |
| 43 | 4-OH,5-OMe | 5.1487 | 5.0766 | 5.1681 | 5.1854 | 5.1286 | 1 |
| 44 | 4-CH2CH(Me)O-5 | 4.5376 | 5.7176 | 5.1856 | 5.3796 | 5.1396 | 1 |
| 45 | 5,6-diOH | 5.6383 | 5.3239 | 5.2418 | 5.2784 | 5.239 | 1 |
| 46 | 5,6-diMe | 5.9208 | 5.9208 | 5.848 | 5.8486 | 5.7007 | 1 |
| 47 | 5,6-OCH2O | 5.6576 | 5.2181 | 5.4837 | 5.3865 | 5.137 | 2 |
| 48 | 5-OMe,6-Me | 6 | 5.6503 | 5.3646 | 5.6146 | 5.4355 | 2 |
| 49 | 5-OH,6-Me | 5.6021 | 5.3032 | 5.0501 | 5.3233 | 5.1943 | 1 |
| 50 | 5-OMe,6-COOH | 4.6778 | 5.3042 | 5.2905 | 5.5487 | 5.343 | 1 |
| 51 | 5-OH,6-COOH | 5.3665 | 5.0032 | 4.9898 | 5.2586 | 5.1028 | 1 |
| 52 | 5-OMe,6-COOMe | 6.0605 | 5.3765 | 5.5592 | 5.8769 | 5.6028 | 2 |
| 53 | 5-OMe,6-CH2OH | 6.4318 | 5.3541 | 5.3273 | 5.7366 | 5.5069 | 1 |
| 54 | 5-OMe,6-CHO | 6 | 5.3761 | 5.2571 | 5.5939 | 5.3746 | 1 |
| 55 | 5-S(CH2)3Nmorph | 4.3 | 5.6053 | 5.9695 | 5.9873 | 6.0146 | 1 |
| 56 | 4-Me | 4.3 | 4.7429 | 4.3599 | 4.544 | 4.5276 | 1 |
| 57 | 4-Cl | 4.3 | 4.7703 | 4.4834 | 4.462 | 4.487 | 2 |
| 58 | 4-COOH | 4.3 | 4.7617 | 4.3302 | 4.4387 | 4.4051 | 1 |
| 59 | 4-COOMe | 4.3 | 4.6087 | 4.1841 | 4.2508 | 4.2103 | 1 |
| 60 | 4-CONH2 | 4.3 | 4.6761 | 4.0935 | 4.0042 | 4.0041 | 1 |
| 61 | 4-NO2 | 4.3 | 4.0957 | 4.1285 | 3.8367 | 3.9121 | 1 |
| 62 | 4-NH2 | 4.3 | 4.9183 | 4.5742 | 4.2947 | 4.3903 | 2 |
| 63 | 7-Me | 4.3 | 4.6183 | 4.3599 | 4.5951 | 4.5683 | 1 |
| 64 | 7-OH | 4.3 | 4.6864 | 4.5792 | 4.5693 | 4.6254 | 2 |
| 65 | 7-Cl | 4.3 | 4.7499 | 4.4834 | 4.6885 | 4.6681 | 1 |
| 66 | 7-COOH | 4.3 | 4.1479 | 4.3302 | 4.1582 | 4.1808 | 2 |
| 67 | 7-COOMe | 4.3 | 3.4105 | 4.1875 | 3.8521 | 3.8915 | 1 |
| 68 | 7-CONH2 | 4.3 | 4.2433 | 4.0935 | 4.097 | 4.0783 | 1 |
| 69 | 7-NO2 | 4.3 | 3.9242 | 4.1285 | 4.0682 | 4.0973 | 1 |
| 70 | 7-NH2 | 4.3 | 4.7144 | 4.5742 | 4.3162 | 4.4075 | 1 |
| 71 | 4-OMe,5-OH | 4.3 | 4.2732 | 4.6913 | 4.6899 | 4.6644 | 1 |
| 72 | 4,5-diOMe | 4.3 | 4.5588 | 5.0228 | 4.9787 | 4.9063 | 2 |
| 73 | 4-Br,5-OH | 4.3 | 4.0942 | 4.6368 | 4.0001 | 4.0746 | 1 |
| 74 | 4-Br,5-OCH2CH=CH2 | 4.3 | 4.7409 | 4.9797 | 4.7361 | 4.6861 | 1 |
| 75 | 4-CH2CH=CH2,5-OH | 4.3 | 4.2661 | 3.915 | 4.5921 | 4.4754 | 1 |

(a) Logarithm of the inverse value of the concentration of inhibitor needed to reduce the level of 32P (from added [32P]-ATP) incorporated into the glutamate–tyrosine copolymer substrate, as reported by Palmer et al. (1998, 1999).
(b) Calculated using Eqs. (1)–(4) in Table 4.
(c) Randomly selected as a member of the training (1) or validating (2) set.

2.2. Descriptors

The charge densities of particular atoms in the molecule concerned, for example the charge of the nitrogen atom in position 3, $Q_{N-3}$, are calculated using the AM1 method. The charge densities of the atoms in the ortho- ($q_o$), meta- ($q_m$) and para- ($q_p$) positions of a benzene ring when a substituting group is connected with the ring are obtained from the corresponding charges with and without substitution; $q_o$ and $q_m$ are the average values of the charge densities of the two ortho- and the two meta-positions, respectively. The descriptors $Q_i$ (i = o, m, p for ortho-, meta- and para-positions, respectively) are defined as the sum of the $q_i$ of all substituting groups connected with the benzene ring moiety of benzimidazole. An indicator variable for the 5-substituted derivatives, $I_5$, is assigned the value of 1 when a substituting group is present at the 5-position, and a value of 0 otherwise.

A series of other molecular descriptors are calculated for the 1-phenylbenzimidazole derivatives, including spatial, structural, electronic, quantum mechanical and thermodynamic descriptors, and E-State indices. The spatial descriptors (Stanton and Jurs, 1990; Rohrbaugh and Jurs, 1987) used are the radius of gyration (RadOfGyration), density, principal moment of inertia (PMI), molecular volume, Verloop's sterimol parameter (B5) and shadow indices. Structural descriptors include the molecular weight ($M_w$), the number of rotatable bonds (Rotbonds) and the numbers of hydrogen bond acceptors and donors (Hbond acceptor, Hbond donor). The electronic descriptors (Molecular Simulations, 1997) are superdelocalizability (Sr), atomic polarizabilities (Apol) and the dipole moment (Dipole). Quantum mechanical descriptors include the energy of the highest occupied molecular orbital (HOMO), the energy of the lowest unoccupied molecular orbital (LUMO) and the charge distribution-related descriptors $Q_{N-3}$, $Q_o$, $Q_m$ and $Q_p$ described above. The thermodynamic descriptors (Viswanadhan et al., 1989) describe the hydrophobic character (lgP, logarithm of the partition coefficient in octanol–water), refractivity (MolRef, molar refractivity) and the desolvation free energies for water and octanol (Fh2o, desolvation free energy for H2O; Foct, desolvation free energy for octanol). The electrotopological-state indices (E-State indices) (Hall and Kier, 1991, 1995) used include S-aaaC, S-aaN, S-aaCH, etc. For example, in the symbol S-aaaC, S represents the electrotopological state of the atom, a stands for a bond in an aromatic ring, and C represents carbon.

After elimination of zero-variance descriptors and descriptors that are difficult to interpret, 47 variables remain for describing the compounds. Table 2 summarizes all the molecular descriptors used in this QSAR study. Quantum mechanical descriptors are calculated with the AM1 semiempirical quantum chemistry method in the HYPERCHEM 6.0 software package on a PC. Other molecular descriptors are generated using the Cerius2 QSAR+, 1997–3.5 software system on a Silicon Graphics R3000 workstation. The evolutionary algorithm was written in MATLAB 5.3 and run on a personal computer (Intel Pentium 4 processor, 1.5 GHz, 256 MB RAM).

2.3. Methods

An evolution algorithm (EA) (Hasegawa et al., 1997; Luke, 1994; Kubinyi, 1996) is employed as the search procedure in variable selection. A chromosome is formulated as a binary bit string, and each bit represents a descriptor. When a descriptor is selected, a value of 1 is given, and the value of 0 is taken otherwise. At first, a population of 100 models is generated by randomly choosing subsets of independent variables, i.e. taking 1 or 0 for the different variables. Then the objective function value (modified Cp, see below) of each model is calculated. The evolving process includes mutation and selection operations. Each model is allowed to create a new model through a mutation operation, the Cp (Nishii, 1984) of the new model is calculated, and all new models are added to form a 200-model population. According to the Cp values, the 100 models with the lowest Cp are selected from the set of 200 models. Mutation and selection operations are repeated until the convergence criterion is satisfied.

The modified Cp statistic is applied as the objective function for variable selection in this QSAR study of 1-phenylbenzimidazoles. The modified Cp in MLR is expressed as follows:

$$Cp(p) = \frac{RSS_p}{\hat{\sigma}^2_{PLS}} - (n - 2p) \qquad (1)$$

where $n$ is the number of samples (observations of the dependent variable), $p$ is the number of independent variables, and $RSS_p$ is the residual sum of squares of the $p$-variable model. $\hat{\sigma}^2_{PLS}$ is a modified $\hat{\sigma}^2$ that takes advantage of the capability of PLS to deal with the multicollinearity problem and to provide a correct estimation of model error. When the original data set is subjected to PLS analysis, $\hat{\sigma}^2_{PLS}$ is defined as the value of RSS corresponding to the minimum number of principal components beyond which a further increase in the number of principal components does not cause a significant reduction in RSS. The details of the modified Cp have been described elsewhere (Shen et al., 2003).

3. Results and discussion

3.1. Definition of some descriptors

A QSAR study of some 1-phenylbenzimidazoles was performed by Kurup et al. (2001). Because the electronic parameter $\sigma^+$ was lacking for many substituents, only 22 compounds were included in the regression. It seems that the electronic parameter $\sigma^+$ is an important variable in the QSAR study concerned. The parameter $\sigma^+$ of a molecule is the summation of the $\sigma^+$ values of all substituents at different positions of the molecule. It is a measure of the field/inductive effect. The electric charge of each atom in the parent compound changes when the parent is connected with substituents at different positions. The electronic parameters were initially developed from a consideration of the effects in aromatic compounds such as benzoic acids. The benzene ring is the most common structural element in all kinds of pharmaceutical compounds. To express the field/inductive effect of substituents, we proposed the parameters $q_o$, $q_m$ and $q_p$ described above, which are the changes of the charge densities of the carbon atoms of a benzene ring when a substituting group is connected to the ring. Electron-releasing substituents have large values of $q_m$ and small values of $q_o$ and $q_p$. Conversely, electron-attracting substituents have small values of $q_m$ and large values of $q_o$ and $q_p$.
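As an illustration of how the substituent descriptors defined in Sections 2.2 and 3.1 combine, the following is a minimal Python sketch, not the authors' implementation. It assumes that atomic charges (for instance from an AM1 calculation) are already available for a benzene ring with and without the substituent, and that the change is taken as substituted minus unsubstituted; the source does not state the sign convention. The function names and the position labels `o1`, `o2`, `m1`, `m2`, `p` are hypothetical.

```python
def substituent_q(with_sub, without_sub):
    """Return (q_o, q_m, q_p) for a single substituent.

    Both arguments map the ring positions 'o1', 'o2', 'm1', 'm2', 'p' to
    atomic charges. The sign convention (substituted minus unsubstituted)
    is an assumption made for this sketch.
    """
    positions = ("o1", "o2", "m1", "m2", "p")
    delta = {pos: with_sub[pos] - without_sub[pos] for pos in positions}
    q_o = (delta["o1"] + delta["o2"]) / 2.0  # average over the two ortho positions
    q_m = (delta["m1"] + delta["m2"]) / 2.0  # average over the two meta positions
    q_p = delta["p"]                         # single para position
    return q_o, q_m, q_p


def molecule_Q(per_substituent_q):
    """Sum per-substituent (q_o, q_m, q_p) triples into (Q_o, Q_m, Q_p)."""
    triples = list(per_substituent_q)
    return tuple(sum(t[i] for t in triples) for i in range(3))
```

For a disubstituted derivative, `molecule_Q` would be called with one triple per substituent, so that each $Q_i$ is the sum of the corresponding $q_i$ values.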


Table 2
List of molecular descriptors for 1-phenylbenzimidazoles studied as candidate variables

| Functional family of descriptors | Descriptors |
|---|---|
| Spatial descriptors | RadOfGyration (radius of gyration); Shadow indices (surface area projections): Shadow-XY, Shadow-XZ, Shadow-YZ, Shadow-XYfrac, Shadow-XZfrac, Shadow-YZfrac, Shadow-nu, Shadow-Xlength, Shadow-Ylength, Shadow-Zlength; $V_m$ (molecular volume); Density; Area (molecular surface area); PMI (principal moment of inertia): PMI-mag-X, PMI-mag-Y, PMI-mag-Z; $B5_{4,7}$ (Verloop's sterimol parameter) |
| Structural descriptors | $M_w$ (molecular weight); Hbond acceptor (number of hydrogen bond acceptors); Hbond donor (number of hydrogen bond donors); Rotbonds (number of rotatable bonds) |
| Electronic descriptors | Apol (sum of atomic polarizabilities); Dipole (Dipole-mag, Dipole-X, Dipole-Y, Dipole-Z); Sr (superdelocalizability) |
| Quantum mechanical descriptors | HOMO (highest occupied molecular orbital energy); LUMO (lowest unoccupied molecular orbital energy); $Q_{N-3}$ (electronic charge of N-3 in the 1-phenylbenzimidazoles); $Q_o$, $Q_m$, $Q_p$ (electronic effect of substituents) |
| Thermodynamic descriptors | AlogP, log P (the octanol–water partition coefficient); Fh2o (desolvation free energy for water); Foct (desolvation free energy for octanol); MR, MolRef (molar refractivity) |
| E-State index | S-aaCH, S-aasC, S-aaaC, S-aaN, S-aasN, S-ssO |
| Indicator variable | $I_5$ |
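The candidate descriptors of Table 2 form the search space for the variable selection of Section 2.3. The following Python sketch illustrates that mutation-and-selection loop with the Cp of Eq. (1) as the objective function. It is an illustration under stated assumptions, not the authors' MATLAB code: the bit-flip probability, the fixed number of generations used in place of the unspecified convergence criterion, and all function and parameter names are hypothetical, and the error estimate `sigma2` (the $\hat{\sigma}^2_{PLS}$ of the paper) is assumed to have been obtained separately from a PLS analysis that is not shown here.

```python
import numpy as np


def rss(X, y, mask):
    """Residual sum of squares of an MLR model (with intercept) on the selected columns."""
    A = np.column_stack([np.ones(len(y)), X[:, mask]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def modified_cp(X, y, mask, sigma2):
    """Cp(p) = RSS_p / sigma2 - (n - 2p), with sigma2 supplied by the caller."""
    n, p = len(y), int(mask.sum())
    return rss(X, y, mask) / sigma2 - (n - 2 * p)


def ea_select(X, y, sigma2, pop_size=100, generations=50, flip_prob=0.05, seed=0):
    """Evolve binary descriptor masks; returns the final population, best model first."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    # Initial population: each bit (descriptor) is switched on at random.
    pop = rng.random((pop_size, n_desc)) < 0.5
    for _ in range(generations):  # assumption: fixed generation count, not a convergence test
        # Mutation: every parent creates one child by random bit flips.
        children = pop ^ (rng.random(pop.shape) < flip_prob)
        combined = np.vstack([pop, children])  # 2 * pop_size candidate models
        scores = np.array([modified_cp(X, y, ind, sigma2) for ind in combined])
        # Selection: keep the pop_size models with the lowest Cp.
        pop = combined[np.argsort(scores)[:pop_size]]
    return pop
```

Each row of the returned array is one descriptor subset, so `ea_select(X, y, sigma2)[0]` is the mask of the lowest-Cp model, and summing the array over its rows gives how often each descriptor appears in the final population.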

Q , the electric quantity of N-3 in the benzimidazole is large compared to the sample size, which deteriorates theN-32ˆalso a descriptor defined in this study, as the nitrogen atom performance of QSAR modeling. Usings in Cp9 in such

is suggested to form hydrogen bonds (Palmer et al., 1999). ill-conditioned systems would result in overfitting andunderestimation of model error. Because PLS has thecapacity to deal with the multicollinearity problem and to3 .2. Modified Cp statistic 2ˆprovide a correct estimation of model error,s in Cp9 is

2ˆreplaced bys as defined in Section 2.3. The ex-PLSThe Cp statistic is modified for variable selection. Theperimental results show that the penalty to the number ofconventional Cp (denoted by Cp9 here) is expressed asindependent variables in modified Cp is moderate.

2ˆCp95RSS /s 2 (n 22p) (2)p3 .3. QSAR modeling for 1-phenylbenzimidazoles

2ˆwheres is the estimation of RSS in the model involvingall variables. Usually, the Cp9 statistic can perform satis- In the first step of regression analysis, we calculate thefactorily in well-conditioned situations where the sample correlation coefficients using one-parameter regressionsize is large compared to the number of variables and the model for each descriptor with respect to all 75 com-collinearity among variables is negligible. However, the pounds. Descriptors highly correlated with inhibitory ac-collinearity among variables is a common case rather than tivity are listed inTable 3.Indicator variable I , sterimol5

an exception in QSAR studies under normal conditions, parameter B5 , radius of gyration (RadOfGyration) and4,7

and the number of variables representing the objects is shadow-Xlength show high correlation with the inhibitionlarge compared to the sample size. When Cp9 is used as the activity. On the basis of the results obtained, we haveobjective function, even when an apparently optimum Cp9 assumed that most of the information concerning theis reached, the number of descriptors might still be too structure–activity relationship of 1-phenylbenzimidazoles

Page 6: Quantitative structure–activity relationships (QSAR): studies of inhibitors of tyrosine kinase

68 Q. Shen et al. / European Journal of Pharmaceutical Sciences 20 (2003) 63–71

T able 3Descriptors and their correlation coefficients (R) in one-parameter regression model involving all 75 compounds

Descriptor

B5 I RodOfGyration Shadow-Xlength Shadow-XY Area PMI-Z PMI-Y PMI-magV Rotbonds Apol4,7 5 m

R 20.6387 0.6325 0.6132 0.5909 0.58 0.573 0.577 0.583 0.578 0.567 0.558 0.527

is contained in the group of spatial descriptors. Among the rings. The more rotatable bonds exist in the molecule, the12 variables listed inTable 3,nine are spatial descriptors. larger the degree of conformational flexibility the moleculeThis shows that steric effect plays an important role in possessing. A flexible ligand can easily transform toinhibitory activity of 1-phenylbenzimidazoles. Shadow favorable steric configuration for binding with a relativelyindices merit attention that all surface area projections narrow ATP site. This is in accordance with the positiverelevant x coordinate axe such as shadow-Xlength, coefficient of the descriptor Rotbonds as shown inTable 4.shadow-XZ and shadow-XY have higher correlation co- The positive coefficient of descriptor LUMO impliedefficients (R. 0.577) than other area projections only molecules with high-energy LUMOs would promote therelevanty or z axes. For example,R values for shadow- inhibitory activity. Negative coefficient of the descriptorZlength, shadow-Ylength and shadow-YZ are20.15, 0.28 Hbond donor suggests that an increase in the number ofand 0.247, respectively. For the parameterQ , the correla- hydrogen bond donors in a molecule would reduce them

tion coefficient calculated with respect to all active and activity of molecules. That is to say a low electrophilicityinactive compounds is 0.183, while anR value of 0.61 is of the molecule is favorable for promoting the activity. Theobtained when only active compounds are involved in the correlation between the experimentally observed lg 1/ IC50

regression model. Subsequent studies as discussed below and those calculated by the best 5-variable model is shownshow thatQ is an important descriptor, whileQ andQ in Fig. 2A. The correlation coefficient for the training setm o p

are not such in prediction the inhibitory activity. was 0.8529 and that for the validation set was 0.8708,MLR is used with the modified Cp as the statistic in respectively. InFig. 2A, there is an obvious outlier with

EAs. The best model with minimum Cp value among the rather high deviation of calculated activity from thefinal 100 combinations contains four variables during the experimentally measured value of compound 55. ThisEA search. Best models involving 3, 4, 5 and 6 variables value for this compound is an outlier also in all equationsare shown inTable 4. In Eqs. (2) and (4), the large shown inTable 4,no matter whether it was placed in thepositive coefficient of variableQ implies an increase in training or predicted sets.m

the value ofQ is conductive to the activity of molecule. When the EA search terminates, one may count them

Electron-releasing groups, which have higherQ value number of times for a particular molecular descriptor tom

than electron-attracting group, enhance the activity of appear in 100 individual combinations. When one lists theinhibitors while electron-attracting groups reduce the in- descriptors by order of decreasing numbers of times ofhibitory activity. Negative coefficient of B5 in these appearance, the top descriptors or the most frequently4,7

equations shows bulky groups would reduce the activity of appeared feathers are shown inTable 5. Once again,molecules when they are attached to position 4 and 7. spatial descriptors occupy an important position and oneSubstituents at 5-position enhance the inhibitory activity third of top descriptors are of spatial type. From lg1/ IC50

by a positive effect shown by positive I . However, of this series of compounds, one notices that compounds5

substituents parameterQ I and B5 are not sufficient to with substitutes at 5-position are more active againstm 5 4,7

describe the activity of inhibitor, and parameters describ- PDGFR than those with substitutes at other positions. Soing the integral molecule appear necessary. The parent spatial descriptors related to substitute position, such ascompound 1-phenylbenzimidazole is fairly rigid, with only B5 and Shadow-Xlength, are important descriptors.4,7

one rotatable bond between the phenyl and benzimidazole Importance of descriptor RadOfGyration (Radius of gyra-

T able 4Results of variable selection by EA using Cp and MLR modeling involving 75 compounds

a a a bEquation R S F R Rp max

lg 1/ IC 5 20.5244*B5, 1 0.2592*LUMO1 0.2388*Shadow-Xlength1 1.9262 (1) 0.8105 0.5304 32.5538 0.8435 0.245550 4,7

lg 1/ IC 5 20.4004*B5, 1 5.4778*Q 1 0.777*I 1 0.0623*S-ssO1 5.8374 (2) 0.8382 0.4987 29.5386 0.8742 0.470250 4,7 m 5

lg 1/ IC 520.4125*B5, 10.3027*LUMO10.2216*Rotbonds50 4,7

–0.2928*Hbond donor10.4402*I 14.2138 (3) 0.8529 0.4822 26.1647 0.8708 0.48825

lg 1/ IC 520.5244*B5, 12.7188*Q ,10.2457*LUMO10.1818*Rotbonds50 4,7 m

20.2533*Hbond donor10.6170*I 14.7015 (4) 0.8603 0.4758 22.7837 0.8724 0.48825

a R, correlation coefficient;S, standard deviation;F, F statistics.b R , correlation coefficient of prediction set;R , the maximum correlation coefficient among the variables.p max

Page 7: Quantitative structure–activity relationships (QSAR): studies of inhibitors of tyrosine kinase

Q. Shen et al. / European Journal of Pharmaceutical Sciences 20 (2003) 63–71 69

Fig. 2. (A) Calculated versus observed Ig IC of a five-descriptor model of 75 compounds; (B) Calculated versus observed Ig IC of a three-descriptor50 50

model of 54 compounds.

T able 5Most frequently appeared descriptors during the EA search

Compounds Prefered variables

Inactive and I , Q , Rotbonds, LUMO, MolRef, Area, Hbond donor,5 m

active RadOfGyration, Density, B5 ,4,7

compounds Shadow-Xlength, S-ssO,

Active compounds I , Q , Rotbonds, LUMO, MolRef, Area, Hbond5 m

donor,V , S-aaaC, S-aaN, Dipole-Ym

tion) indicates the significance of steric hindrance caused by the size of functional groups. For the same reason, the indicator descriptor I_5 is an important variable. Molar refractivity (MolRef) is a combined measure of molecular size and polarizability, and it turned out to be one of the important variables. The nitrogen at position 3 (N-3) of the benzimidazole moiety was believed to form a hydrogen bond (Palmer et al., 1999). However, the descriptor Q_{N-3} turned out to be unimportant during the EA search. As each molecule in this series contains atom N-3, the electric charge on atom N-3 seems not to be essential with respect to inhibitory activity.

Hydrophobicity, a factor usually much considered in the development of QSAR in biochemistry, seems to have a small effect on the activity, as AlogP is excluded from these equations and from the top descriptors.

To explore further the effects influencing inhibitory activity, we also carried out a QSAR study solely on the 54 active compounds. The best equations involving three, four and five variables are listed in Table 6. The correlation coefficients for the prediction set given by Eqs. (2) and (3) are rather low (R_p = 0.6673 and 0.6708), though the correlation coefficients for the training set obtained with these two equations are acceptable. This is a symptom of overfitting, which seems to be related to the relatively high correlation among the variables involved in these equations (R_max = 0.8702). The correlation of the calculated and observed lg 1/IC50 by Eq. (1) is shown in Fig. 2B. The most frequently appearing descriptors during

Table 6
Results of variable selection by EA using Cp and MLR modeling involving 54 compounds

No. | Equation | R^a | S^a | F^a | R_p^a | R_max
1 | lg 1/IC50 = 0.2120*Rotbonds + 0.7796*I_5 + 0.9252*S-aaaS + 2.5225 | 0.8245 | 0.4763 | 24.7632 | 0.7288 | 0.4390
2 | lg 1/IC50 = 0.3081*Rotbonds + 0.8587*I_5 + 2.5908*S-aaaS + 24.9477*S-aaN + 20.6713 | 0.8552 | 0.4420 | 23.3800 | 0.6673 | 0.8702
3 | lg 1/IC50 = 0.3682*Rotbonds + 0.8674*I_5 + 2.6171*S-aaaS + 25.9895*S-aaN − 0.2108*Hbond donor + 25.1076 | 0.8695 | 0.4281 | 20.4531 | 0.6708 | 0.8702

a See footnote a of Table 4.
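The R_max column can be read as a simple collinearity check on the descriptors entering each equation. The following is a minimal sketch, assuming R_max is the largest absolute pairwise Pearson correlation among the selected descriptors (the footnote does not say whether the absolute value is taken). The function name `max_pairwise_correlation`, the descriptor labels and the numbers are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def max_pairwise_correlation(X, names):
    """Return (r_max, (name_i, name_j)) for the most strongly correlated descriptor pair."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)                 # k x k descriptor correlation matrix
    rows, cols = np.triu_indices(corr.shape[0], k=1)    # visit each distinct pair once
    abs_r = np.abs(corr[rows, cols])
    best = int(np.argmax(abs_r))
    return float(abs_r[best]), (names[rows[best]], names[cols[best]])

# Hypothetical descriptor matrix: columns "A" and "C" are nearly collinear.
names = ["A", "B", "C"]
X = np.array([
    [1.0, 5.0, 2.1],
    [2.0, 3.0, 3.9],
    [3.0, 8.0, 6.2],
    [4.0, 1.0, 7.8],
    [5.0, 4.0, 10.1],
])
r_max, pair = max_pairwise_correlation(X, names)
print(f"R_max = {r_max:.4f} between {pair[0]} and {pair[1]}")
```

A high value flags a pair of descriptors that carry largely the same information, which is one common reason for a model fitting the training set well while predicting poorly.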


the EA search when solely active compounds are involved are also shown in Table 5. As the data set is reduced, the relative importance of the different descriptors changes. For instance, compounds with substituent groups at the 4,7-positions are commonly inactive. As these compounds are excluded from the data set, B5_{4,7} becomes unimportant, and it was not selected during the EA search.

Because the 5-position appears to be an activating position, MLR was performed on the 28 compounds with 5-substituents and the following model was obtained:

Log IC50 = 17.8478*Q_m + 0.1610*Hbond donor − 0.4591*Rotbonds − 0.5685*AlogP + 9.2649    (3)
n = 28, R = 0.7969, S = 0.4423, F = 10.8747

Here n is the number of observations, R is the correlation coefficient, S is the standard deviation and F is the F statistic. From the large coefficient of Q_m in this equation one can see that Q_m plays an essential role in terms of the activity. Hydrophobicity (AlogP) exerts some negative influence on PDGFR inhibition.

3.4. Principal component analysis for classification of PDGFR inhibitors

There are 21 compounds inactive to PDGFR in the whole set of 75 compounds, so it is interesting to use PCA as an unsupervised classification method to classify this set of compounds. PCA is a multivariate statistical analysis method that extracts the information contained in a data matrix and reduces the original number of variables to a few factors called principal components (PCs). All 47 descriptors listed in Table 2 are used, and PCA is used to project the high-dimensional patterns into the two-dimensional space of the first two PCs (Fig. 3). The distribution of compounds active to PDGFR tends to differ from that of the inactive compounds, though a clear classification was not obtained.

4. Conclusions

Some new electronic parameters Q_o, Q_m and Q_p for substituents are suggested and, together with a series of established descriptors, are used to predict and classify the inhibitor activities of 75 1-phenylbenzimidazoles as inhibitors of PDGFR. The descriptor Q_m is shown to be an important variable for expressing the effect of substituents. Variable selection by EA using modified Cp based on MLR modeling shows that spatial descriptors are the most important variables, revealing important properties of the inhibitors. Electron-releasing substituents at the 5-position and the absence of bulky groups at the 4,7-positions of the parent structure can enhance inhibitor activity.

Acknowledgements

The work was financially supported by the National Natural Science Foundation of China (Grant Nos. 29735150, 20075006, 20105007).

References

Dolle, R.E., Dunn, J.A., Bobko, M., Singh, B., Kuster, J.E., Baizman, E., Harris, A.L., Sawutz, S.G., Miller, D., Wang, S., Faltynek, C.R., Xie, W., Sarup, J., Bode, C.E., Pagani, E.D., Silver, P.J., 1994. 5,7-Dimethoxy-3-(4-pyridinyl)quinoline is a potent and selective inhibitor of human vascular β-type platelet-derived growth factor receptor tyrosine kinase. J. Med. Chem. 37, 2627–2629.

Hall, L.H., Kier, L.B., 1991. The electrotopological state: structure information at the atomic level for molecular graphs. J. Chem. Inf. Comput. Sci. 31, 76–78.

Hall, L.H., Kier, L.B., 1995. Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. J. Chem. Inf. Comput. Sci. 35, 1039–1045.

Hasegawa, K., Miyashita, Y., Funatsu, K., 1997. GA strategy for variable selection in QSAR studies: GA based PLS analysis of calcium channel antagonists. J. Chem. Inf. Comput. Sci. 37, 306–310.

Kubinyi, H., 1996. Evolutionary variable selection in regression and PLS analysis. J. Chemometrics 10, 119–133.

Kurup, A., Garg, R., Hansch, C., 2001. Comparative QSAR study of tyrosine kinase inhibitors. Chem. Rev. 101, 2573–2600.

Luke, B.T., 1994. Evolutionary programming applied to the development of quantitative structure–activity relationships and quantitative structure–property relationships. J. Chem. Inf. Comput. Sci. 34, 1279–1287.

Maguire, M.P., Sheets, K.R., McVety, K., Spada, A.P., Zilberstein, A.A., 1994. New series of PDGF receptor tyrosine kinase inhibitors: 3-substituted quinoline derivatives. J. Med. Chem. 37, 2129–2137.

Cerius2 QSAR. Molecular Simulations, San Diego, CA.

Naumann, T., Matter, T., 2002. Structural classification of protein kinases using 3D molecular interaction field analysis of their ligand binding sites: target family landscapes. J. Med. Chem. 45, 2366–2378.

Fig. 3. Score plot of the first two PCs of the whole set of 75 compounds.


Nishii, R., 1984. Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Stat. 12, 758–765.

Oblak, M., Randic, M., Solmajer, T., 2000. Quantitative structure–activity relationship of flavonoid analogues. 3. Inhibition of P56lck protein tyrosine kinase. J. Chem. Inf. Comput. Sci. 40, 994–1001.

Palmer, B.D., Kraker, A.J., Hartl, B.G., Panopoulos, A.D., Panek, R.L., Batley, B.L., Lu, G.H., Susanne, T.K., Showalter, H.D.H., Denny, W.A., 1999. Structure–activity relationships for 5-substituted phenylbenzimidazoles as selective ATP site inhibitors of the platelet-derived growth factor receptor. J. Med. Chem. 42, 2373–2382.

Palmer, B.D., Smaill, J.B., Boyd, M., Boschelli, D.H., Doherty, A.M., Hamby, J.M., Khatana, S.S., Kramer, J.B., Kraker, A.J., Panek, R.L., Lu, G.H., Dahring, T.K., Winters, R.T., Showalter, H.D.H., Denny, W.A., 1998. Structure–activity relationships for 1-phenylbenzimidazoles as selective ATP site inhibitors of the platelet-derived growth factor receptor. J. Med. Chem. 41, 5457–5465.

Pierre, D., Michel, L., David, S.G., 2000. 3D-QSAR CoMFA on cyclin-dependent kinase inhibitors. J. Med. Chem. 43, 4098–4108.

Rohrbaugh, R.H., Jurs, P.C., 1987. Descriptions of molecular shape applied in studies of structure/activity and structure/property relationships. Anal. Chim. Acta 199, 99–109.

Shen, Q., Jiang, J.H., Shen, G.L., Yu, R.Q., 2003. Variable selection by an evolution algorithm using modified Cp based on MLR and PLS modeling: QSAR studies of carcinogenicity of aromatic amines. Anal. Bioanal. Chem. 375, 248–254.

Stanton, D.T., Jurs, P.C., 1990. Development and use of charged partial surface area structural descriptors in computer-assisted quantitative structure–property relationship studies. Anal. Chem. 62, 2323–2329.

Zhu, L.L., Hou, T.J., Chen, L.R., Xu, X.J., 2001. 3D QSAR analyses of novel tyrosine kinase inhibitors based on pharmacophore alignment. J. Chem. Inf. Comput. Sci. 41, 1032–1040.

Viswanadhan, V.N., Ghose, A.K., Revankar, G.R., Robins, R.K., 1989. Atomic physicochemical parameters for three dimensional structure directed quantitative structure–activity relationships. 4. Additional parameters for hydrophobic and dispersive interactions and their application for an automated superposition of certain naturally occurring nucleoside antibiotics. J. Chem. Inf. Comput. Sci. 29, 163–172.
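As a supplementary illustration of the principal component projection described in Section 3.4, the sketch below projects a descriptor matrix onto its first two principal components, which is the kind of score plot shown in Fig. 3. It is only a sketch under stated assumptions: the paper does not say how the descriptors were scaled, so autoscaling (zero mean, unit variance) is assumed here; the function name `pca_scores` is hypothetical; and the 75 x 47 matrix is random stand-in data, with an artificial shift applied to 21 rows to mimic an "inactive" group.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Autoscale X and return (scores, explained variance ratio) for the leading PCs."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                          # leave constant columns unscaled
    Z = (X - mean) / std                         # autoscaled descriptor matrix (assumption)
    U, sing, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * sing[:n_components]   # coordinates on the PCs
    explained = sing ** 2 / np.sum(sing ** 2)             # fraction of variance per PC
    return scores, explained[:n_components]

# Random stand-in for a 75-compound x 47-descriptor table (NOT the paper's data).
rng = np.random.default_rng(1)
X = rng.normal(size=(75, 47))
X[:21, :5] += 2.0                                # hypothetical shift for 21 "inactive" rows

scores, explained = pca_scores(X)
print(scores.shape, explained.round(3))
```

Plotting `scores[:, 0]` against `scores[:, 1]`, with active and inactive compounds marked differently, gives an unsupervised view of whether the two groups separate.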