
A General Model for Prediction of Caco-2 Cell Permeability

Anneli Nordqvist a, b, Jonas Nilsson* c, Tuulikki Lindmark b, Alf Eriksson d, Per Garberg b and Mats Kihlén c

a Division of Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, BMC, Uppsala University, Box 574, SE-751 23 Uppsala, Sweden

b In Vitro Sciences, c Structural Chemistry, d Analytical Sciences, Biovitrum AB, SE-112 76 Stockholm, Sweden

Full Paper

Permeability across the epithelium is one of the major barriers to drug absorption and is one property subject to in silico prediction attempts. Prediction models make it possible to address absorption issues early in drug discovery for a large number of compounds. The aim of this study was to develop a general, comprehensive partial least square projection to latent structures (PLS) model for prediction of Caco-2 cell permeability using theoretically calculated descriptors suited for large virtual libraries. To deal with current issues of data quantity and quality, the well-established Caco-2 cell model was used to generate accurate permeability data of apparent passive transport for a large set of structurally diverse compounds. PLS statistics were used to correlate calculated descriptors to log Papp. This new prediction model for Caco-2 cell permeability incorporates many different descriptor types to deal with the multivariate nature of permeability. The model is designed to classify discovery compounds as having low, medium or high permeability. A good statistical model was derived (R² = 0.79, Q² = 0.65, n = 46) using 70 descriptors including lipophilicity, hydrogen bonding, polar surface area, size and charge descriptors and some nonlinear terms. The model has been tested and proved valid on two different external test sets (n = 5 and n = 125, respectively). The root mean square error of prediction (RMSEP) was 0.45 for the small external test set. The model assigned 82% of the compounds in the test sets to the correct class; 18% were misclassified. No low-permeability compounds were classified as highly permeable and only one high-permeability compound was classified as low. With this model, in silico prediction of Caco-2 cell permeability has taken a step closer to meeting the expectations of a high-throughput filter tool applied in early-phase drug discovery.

Key words: PLS, QSPR, ADME prediction, in silico, passive transport

Abbreviations: ADME, Absorption Distribution Metabolism Excretion; BVT, Biovitrum; Papp, Apparent Permeability Coefficient; PCA, Principal Component Analysis; PLS, Partial Least Square Projection to Latent Structures; PSA, Polar Surface Area; RMSEP, Root Mean Square Error of Prediction

* To receive all correspondence: E-mail: Jonas.E.Nilsson@biovitrum.com; Tel.: +46 8 6973870

QSAR Comb. Sci. 2004, 23. DOI: 10.1002/qsar.200330868. © 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

1 Introduction

During the last decade, absorption, distribution, metabolism and excretion (ADME) related questions have been introduced earlier in discovery programs than has traditionally been the case. One of the reasons is that ADME and pharmacokinetic related issues are major reasons for failure in clinical trials [1]. If these issues are addressed early, drugs with higher chances of success can be brought forward through clinical trials. Therefore, various in vitro methods have been introduced to predict in vivo behavior [2]. Today, the in vitro methods can no longer match the demand for throughput, as the number of compounds that can be generated has increased dramatically. A rising need for selection filters prior to design and purchase of large compound libraries has turned interest towards in silico predictions of ADME related properties, giving chemists an opportunity to screen compounds in silico prior to synthesis. The importance of good starting points, i.e. molecules with good ADME properties entering a discovery program, has indeed been emphasized [3], since it is often difficult to improve bad ADME properties of a specific scaffold.

Absorption is one of the fundamental ADME properties that is screened for, and the intestinal epithelium (permeability) is one of the major barriers to drug absorption. Several in vitro methods are available for permeability screening, of which the cell culture model using Caco-2 cells is the most widely used [4]. Previously derived in silico models to describe Caco-2 cell permeability include models based on thermodynamic [5–11], spatial [5, 9, 12–15], topological [6, 16], structural [5, 8, 9, 11, 16, 17], electronic [18], electrotopological [7, 16, 17], quantum mechanical [17, 19] and 3D-QSAR descriptors [20–22], but many of the models are based on only a few descriptors or one type of descriptor. Polar surface area (PSA) has been used by many authors [5, 12–15, 17] and it has been shown that the less computer-intensive PSA from a single conformation can be used instead of the dynamic PSA approach [17, 23]. Indications of non-linearity are found in some previous models of Caco-2 cell permeability [9, 11] or in situ absorption in rat gut [24], and non-linear models have been investigated before [5, 18, 25]. To gain acceptance among users, a model must provide predictions at a reasonable speed as well as guidance on which molecules to synthesize. Interpretability also increases the validity of the model if the descriptors used can be translated into known properties influencing permeability.

One of the first correlations using Caco-2 cell permeability was published in 1991 [26], and to this day the size of the datasets, data quality and bias towards drugs with high permeability still limit the use of the derived prediction models [27]. Considerable interlaboratory differences [28] and active transport mechanisms [27] further complicate the question of which data can be used to develop the models. Only three out of 18 previously published Caco-2 cell permeability prediction models include a large dataset (around 50 compounds) from one source [10, 11, 22]. Due to the limited access to data, good validation sets are an even larger problem. This new Caco-2 cell prediction model addresses the problem of data quality. Mass balance, monolayer integrity and the presence of apparent active transport were monitored. The training set of 46 compounds is one of the larger published data sets used for model construction. To deal with the multivariate nature of permeability, many different kinds of previously used descriptors have been evaluated using partial least square projection to latent structures (PLS) statistics.

2 Materials and Methods

2.1 Drug Transport Studies

2.1.1 Chemicals

Unlabelled substances were purchased from Sigma-Aldrich Co (St Louis, MO, USA). Radiolabelled compounds were obtained from Amersham Pharmacia Biotech (Buckinghamshire, England) or from Perkin Elmer Life Sciences (Boston, MA, USA). [3H]tranexamic acid was purchased from Moravek Biochemicals (Brea, CA, USA).

2.1.2 Cell culture

Caco-2 cells were obtained from American Type Culture Collection (Rockville, MD, USA). All tissue culture media and buffers were purchased from Gibco (Paisley, Scotland, UK). Cell culture flasks and filters were obtained from Costar (NY, USA). The cells were cultured according to standard protocols [29]. In brief, the cells were maintained in Dulbecco's modified Eagle's medium, containing 10% heat-inactivated fetal calf serum and 1% nonessential amino acids, in 95% relative humidity and 10% CO2. 5 × 10⁵ cells were seeded on polycarbonate filter inserts (Transwell, 0.45 µm mean pore size, 12 mm diameter). Passage numbers 36–46 were used for permeability experiments 14–28 days post seeding.

2.1.3 Drug Transport Experiments

The compounds to be tested were dissolved in Hank's balanced salt solution supplemented with 25 mM N-[2-hydroxyethyl]piperazine-N'-[2-ethanesulfonic acid] (HEPES) buffer (HBSS; pH 7.2) to a final concentration of 0.05–2 mM or a specific activity of 0.1–2 µCi/ml. HBSS (pH 7.2) was pipetted into the receiver chamber. The permeability experiments were carried out with a robotic system (Multiprobe 204, Packard) [30] equipped with a shaker (Micromix, DPC) and a heating plate (CO 102, Linkam Scientific). The monolayers were incubated with HBSS in ambient atmosphere at 37 °C with a stirring rate of 300 rpm. To initiate a permeability experiment, the sample solution was added to the donor side at t = 0, together with [14C]-mannitol (for unlabeled compounds only) to monitor monolayer integrity. A Papp for mannitol of < 0.5 × 10⁻⁶ cm/s was considered acceptable. Four samples were withdrawn at regular intervals from the receiver chamber, making the total time of the experiment 60–120 minutes, and a sample was withdrawn from the donor side at the last time point. For some highly permeable substances the calculation of Papp was made with fewer time points, with the last point(s) excluded, to uphold the "sink" condition. Transport experiments were run in the apical to basolateral (A/B) as well as the basolateral to apical (B/A) direction (n = 3–4). HPLC samples were frozen prior to analysis.

2.1.4 Calculation of Permeability Coefficients

All experiments were performed under "sink" conditions [28]. The permeability coefficient was calculated according to Eq. 1:

$$ P_{app} = \frac{\Delta Q}{\Delta t} \times \frac{1}{A \cdot C_0} \qquad (1) $$

where ΔQ/Δt is the steady-state flux (mol/s), A is the area of the filter (cm²) and C0 is the initial concentration in the donor chamber (mol/ml). All compounds with a ratio of Papp(B/A) to Papp(A/B) of < 0.5 or > 2 were considered to be actively transported and were consequently removed from the model building. An exception was made for amiloride (ratio 2.5) to include more low-permeability substances. All compounds disturbing the monolayer integrity and those showing a poor mass balance were also removed. The mass balance was calculated according to Eq. 2:

$$ MB(\%) = \frac{C_E \cdot V_D + C_S(4) \cdot V_R + \sum_{i=1}^{3} C_S(i) \cdot V_S}{C_0 \cdot V_D} \times 100 \qquad (2) $$

where C0 and CE are the initial and end concentrations in the donor chamber, respectively, and CS(i) is the concentration (mol/ml) of each sample, one to four, withdrawn from the receiver chamber, where four is the last measured time point. VD, VS and VR are the volume of the donor chamber, the sample volume and the volume of the receiver chamber (ml), respectively. For some of the compounds, passive permeability was obtained after addition of 0.5–1.0 mM verapamil (Table 1). Monolayer integrity was intact during the saturation experiment time of 60 min.
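The calculations in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the function names, argument names and all numerical inputs are hypothetical, and the four receiver samples are assumed to be supplied in time order.

```python
def apparent_permeability(flux_mol_per_s, area_cm2, c0_mol_per_ml):
    """Eq. 1: Papp = (dQ/dt) * 1 / (A * C0); 1 ml = 1 cm^3, so the result is in cm/s."""
    return flux_mol_per_s / (area_cm2 * c0_mol_per_ml)


def efflux_ratio(papp_ab, papp_ba):
    """Ratio Papp(B/A) / Papp(A/B) used to flag apparent active transport."""
    return papp_ba / papp_ab


def is_actively_transported(papp_ab, papp_ba, lower=0.5, upper=2.0):
    """True when the ratio falls outside the 0.5 to 2 window given in the text."""
    ratio = efflux_ratio(papp_ab, papp_ba)
    return ratio < lower or ratio > upper


def mass_balance_percent(c0, c_end, v_donor, v_receiver, v_sample, receiver_conc):
    """Eq. 2: recovered amount as a percentage of the amount initially dosed.

    receiver_conc holds the four receiver-sample concentrations in time order.
    """
    *earlier, last = receiver_conc
    recovered = c_end * v_donor + last * v_receiver + sum(c * v_sample for c in earlier)
    return 100.0 * recovered / (c0 * v_donor)


# Hypothetical numbers, chosen only to exercise the functions.
papp = apparent_permeability(flux_mol_per_s=1.0e-12, area_cm2=1.0, c0_mol_per_ml=1.0e-7)
print(f"Papp = {papp:.2e} cm/s")
print(is_actively_transported(papp_ab=10.0, papp_ba=25.0))
print(mass_balance_percent(100.0, 80.0, 0.5, 1.5, 0.2, [1.0, 2.0, 3.0, 4.0]))
```

The amiloride exception described in the text (ratio 2.5, kept in the data set) is a manual decision and is deliberately not encoded in `is_actively_transported`.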

2.1.5 Analytical Methods

Radioactive samples were analyzed using a liquid scintillation counter (Tri-Carb 2700 TR, Packard). Unlabelled substances were analyzed by reversed-phase HPLC with UV or fluorescence detection at a suitable wavelength. Gabapentin was analyzed as an o-phthaldialdehyde derivative [31]. The HPLC system was a Hewlett Packard series 1100 equipped with a vacuum degasser, a binary pump, an autosampler, a UV detector and a fluorescence detector (FLD). The software used was HP ChemStation for LC (version A.06.01, Hewlett-Packard). Separations were made on a Zorbax SB-C18 column (150 × 4.6 mm i.d.). The injection volume was 50 µl. Elution was performed at a flow rate of 1.0 ml/min and the column was maintained at ambient temperature. The method was mainly isocratic and optimized for each compound to give reasonable retention times, with water or KH2PO4 buffer as mobile phase A and acetonitrile or methanol as mobile phase B.

2.2 Training Set and Test Set Selection

The data set consisted of 51 passively transported molecules (Tables 1 and 2). 10% (five) of the substances were selected to form an external test set (Table 2), and the remaining 46 substances were used as a training set (Table 1). Prior to external test set selection, a principal component analysis (PCA) (five components, R² = 0.7) was made with a set of descriptors similar to those used in previously published prediction models [5, 6, 9, 16–18]. To ensure a proper range in permeability in the external test set, the whole data set was sorted in ascending order of Papp and divided into five groups. One molecule was chosen from each of the five groups to be included in the test set. Chemical diversity was ensured with the help of the PCA score plot, t1 vs. t2, shown in Figure 1. A second external test set, called the Biovitrum (BVT) test set, was formed with Papp measured in screening mode by a one-point determination after two hours, one filter in each direction. Thus the BVT test set is only used for classification (low < 1.5 < medium < 12 < high (×10⁻⁶ cm/s)) in the early screening phase and is not expected to give an absolute value of the permeability coefficient. All actively transported compounds, those with poor mass balance and those disturbing the monolayer integrity were excluded. After this filtering, 125 molecules remained in the BVT test set.
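The class limits and the rank-based grouping described above can be illustrated with a short Python sketch. It is only an illustration: the function names are hypothetical, the treatment of values lying exactly on a class limit is an assumption (the text does not specify it), and the final pick of one compound per group, which the authors guided with the PCA score plot, is left to the user.

```python
LOW_LIMIT = 1.5    # class limit between low and medium, in units of 10^-6 cm/s
HIGH_LIMIT = 12.0  # class limit between medium and high, in units of 10^-6 cm/s


def permeability_class(papp_e6):
    """Assign low/medium/high from Papp given in units of 10^-6 cm/s.

    Values exactly on a limit are put in the medium class (an assumption).
    """
    if papp_e6 < LOW_LIMIT:
        return "low"
    if papp_e6 > HIGH_LIMIT:
        return "high"
    return "medium"


def rank_groups(papp_by_name, n_groups=5):
    """Sort compounds by ascending Papp and cut the ranking into n_groups slices."""
    ordered = sorted(papp_by_name, key=papp_by_name.get)
    n = len(ordered)
    return [ordered[i * n // n_groups:(i + 1) * n // n_groups] for i in range(n_groups)]


# Papp values taken from Table 1 (x10^-6 cm/s).
examples = {"Atenolol": 0.5, "Cimetidine": 2.0, "Morphine": 10.0,
            "Danazol": 22.0, "Propranolol": 44.0}
for name, value in examples.items():
    print(name, permeability_class(value))
print(rank_groups(examples))
```

With five compounds and five groups each group holds a single compound; with the full 51-compound data set the same function yields groups of 10 or 11 compounds from which one test compound would be chosen.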

2.3 Conformational Analysis

Calculations were made on an SGI Origin 2000 with 32 processors. A 300–30,000-step Monte Carlo Multiple Minimum (MCMM) conformational search included in MacroModel (version 7.1, Schrödinger) and in Maestro (version 5.0, Schrödinger) was used for the conformational analysis


Figure 1. PCA score plot t1 vs. t2 used for the selection of an external test set.


to find the global energy minimum conformation of all the molecules used in the training set. The MM3* or MMFF force field (Amber for danazol) was used for the energy minimization with simulated water, using the truncated Newton conjugate gradient (TNCG) method, a maximum of 5000 iterations and the convergence threshold set to 0.05 kJ/(Å·mol).

2.4 Calculated Descriptors

To describe a compound thoroughly, a number of different descriptors were calculated using the software Cerius2 (version 4.7, Accelrys Inc.) and MolSurf (version 2003-01-07, Qemist AB). Most of the descriptors have previously been used in Caco-2 prediction models (see Introduction).


Table 1. List of compounds in the training set, their observed Papp and calculated logPapp.

No.  Name                    Papp (a) (×10⁻⁶ cm/s)  Observed logPapp  Calculated logPapp
1    Acebutolol              1.0 ± 0.1              -6.00             -6.33
2    L-alanine (b)           3.8 ± 1.1              -5.42             -6.08
3    Amiloride               0.7 ± 0.1              -6.15             -5.73
4    Antipyrine (b)          51 ± 9                 -4.29             -4.42
5    Acetylsalicylic acid    1.2 ± 0.5              -5.92             -5.48
6    Atenolol                0.5 ± 0.1              -6.30             -6.15
7    AZT (b)                 16 ± 6                 -4.80             -4.63
8    Benzylpenicillin        0.6 ± 0.2              -6.22             -5.34
9    Bremazocine (d)         43 ± 2                 -4.37             -4.43
10   Caffeine (b)            53 ± 11                -4.28             -4.74
11   Chloramphenicol         25 ± 4                 -4.60             -5.28
12   Cimetidine (c)          2.0 ± 0.5              -5.70             -5.93
13   Corticosterone          34 ± 9                 -4.47             -4.88
14   Cortisol                24 ± 5                 -4.62             -5.13
15   Cortisone               31 ± 8                 -4.51             -4.87
16   Danazol                 22 ± 2                 -4.66             -4.09
17   Dexamethasone           36 ± 9                 -4.44             -4.42
18   Diazepam (b)            48 ± 8                 -4.32             -4.14
19   Digoxin (c)             13 ± 2                 -4.89             -4.66
20   Diltiazem (d)           42 ± 5                 -4.38             -4.19
21   L-Dopa (b)              1.5 ± 0.8              -5.82             -6.15
22   Doxorubicin (c)         3.1 ± 0.7              -5.51             -5.82
23   Gabapentin              0.07 ± 0.04            -7.15             -6.08
24   Ibuprofen               59 ± 10                -4.23             -4.90
25   Inulin (b)              0.12 ± 0.02            -6.92             -6.84
26   Ketoprofen              50 ± 7                 -4.30             -4.36
27   Labetalol (d)           36 ± 7                 -4.44             -5.04
28   Lactic acid (b)         1.1 ± 0.4              -5.96             -5.72
29   Methyl scopolamine      0.85 ± 0.09            -6.07             -5.71
30   Methyldopa              0.24 ± 0.16            -6.70             -6.16
31   Metolazone (c)          16 ± 4                 -4.80             -4.72
32   Morphine (b)            10 ± 2                 -5.00             -5.00
33   Nadolol (c)             0.31 ± 0.05            -6.51             -6.00
34   Naloxone                48 ± 7                 -4.32             -4.62
35   Nicotine (b)            35 ± 10                -4.46             -4.82
36   Ouabain                 0.11 ± 0.02            -6.96             -6.71
37   Phenytoin (b)           44 ± 8                 -4.36             -4.07
38   Propranolol             44 ± 9                 -4.36             -4.51
39   Ranitidine              1.7 ± 0.6              -5.77             -6.04
40   Sumatriptan             3.1 ± 0.3              -5.51             -5.19
41   Timolol (d)             39 ± 6                 -4.41             -4.81
42   Tiotidine (c)           2.2 ± 0.3              -5.66             -5.71
43   Tranexamic acid         0.91 ± 0.08            -6.04             -6.22
44   Urea (b)                6.4 ± 1.9              -5.19             -4.87
45   Verapamil (c)           51 ± 5                 -4.29             -4.02
46   Warfarin (b)            40 ± 11                -4.40             -4.20

(a) Average calculated from Papp(A/B) and Papp(B/A).
(b) Data originally published in [37].
(c) Passive transport was measured after adding 0.5–1.0 mM verapamil.
(d) Partly measured at non-sink conditions.
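The relation between the Papp column (given in units of 10⁻⁶ cm/s) and the observed logPapp column of Table 1 can be checked with a short Python sketch. The helper name is hypothetical; the inputs are rounded table values, so agreement with the tabulated logPapp is only expected to within rounding.

```python
import math


def log_papp(papp_e6):
    """Convert Papp given in units of 10^-6 cm/s into log10(Papp in cm/s)."""
    return math.log10(papp_e6 * 1e-6)


# Three rows from Table 1: (name, Papp x10^-6 cm/s, tabulated observed logPapp).
rows = [("Acebutolol", 1.0, -6.00), ("Caffeine", 53.0, -4.28), ("Gabapentin", 0.07, -7.15)]
for name, papp, tabulated in rows:
    print(name, round(log_papp(papp), 2), tabulated)
```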


Noncovalent interactions are important for physicochemical properties. These interactions are known to arise from the electron distribution of a molecule and are attributes encoded in the E-State descriptors [32, 33]. Topological descriptors are related to properties like polarizability, dipole moment and steric effects [34], which in turn are related to permeability. Topological descriptors were therefore investigated together with descriptors related to PSA, hydrogen bonding and lipophilicity. To account for ionization of the molecules in a fast way, the numbers of anions and cations at pH 7 were included. Squared terms were included to account for possible non-linearity. An SD file with the lowest-energy conformation found during the conformational search was used as the input file for the compounds in the training set. For the two external test sets, a 2D to 3D conversion using Corina (version 2.4, Molecular Networks GmbH) followed by minimization in Maestro (version 5.0, Schrödinger) was conducted. All acids and bases were calculated in their neutral form except for methyl scopolamine, which was calculated as a charged species. PLS was used to establish a prediction model using Simca-P+ (version 10, Umetrics AB) with unit variance scaling and mean centering. Variable selection was performed by excluding variables one at a time; if an increase in Q² was observed, the variable was kept out of the model.
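The preprocessing and variable selection steps described above can be sketched in plain Python. This is an illustration under assumptions, not a reproduction of Simca-P+: the function names are hypothetical, the squared term is taken on the raw descriptor value (the text does not say whether squaring precedes scaling), and `q2_of` stands for any user-supplied routine that refits the PLS model on a list of variable names and returns its cross-validated Q². The `toy_q2` scorer exists only to make the example runnable.

```python
from statistics import mean, stdev


def autoscale(column):
    """Mean-center a descriptor column and scale it to unit variance."""
    centre, spread = mean(column), stdev(column)
    return [(value - centre) / spread for value in column]


def squared_term(column):
    """Extra column holding the square of a descriptor (a non-linear term)."""
    return [value * value for value in column]


def exclude_one_at_a_time(variables, q2_of):
    """Try removing each variable in turn; keep it out if Q2 increases.

    q2_of is a hypothetical callable that refits the model on the given
    variable names and returns its cross-validated Q2.
    """
    kept = list(variables)
    best_q2 = q2_of(kept)
    for name in variables:
        trial = [v for v in kept if v != name]
        if not trial:
            continue
        trial_q2 = q2_of(trial)
        if trial_q2 > best_q2:
            kept, best_q2 = trial, trial_q2
    return kept, best_q2


def toy_q2(names):
    """Stand-in scorer for the demo only; not a real PLS cross-validation."""
    return 0.5 + 0.1 * ("AlogP" in names) - 0.05 * ("noise" in names)


print(autoscale([1.0, 2.0, 3.0]))
print(squared_term([1.0, 2.0, 3.0]))
print(exclude_one_at_a_time(["AlogP", "TPSA", "noise"], toy_q2))
```

In practice `q2_of` would wrap a PLS fit with cross-validation in whatever statistics package is at hand; the loop itself only encodes the keep-out rule stated in the text.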

3 Results and Discussion

Today all prediction models are based on existing drugs, but will be used to predict permeability for future discovery compounds from novel compound classes. Thus, it is important to have a general model and to be able to recognize prediction failures. PLS analysis was chosen because it is very good at detecting strong and moderate outliers, it can handle many and correlated descriptors as well as missing values, and it offers useful possibilities for interpretation. Only descriptors that can be calculated with reasonable speed were used, which makes the model suitable for large virtual libraries. Relationships with single physico-chemical descriptors tend to deteriorate when structural diversity is introduced [12, 35] and therefore many different descriptor types were used in the present model (Table 3).

The PLS analysis of the training set (n = 46) using 70 descriptors resulted in a three-component model with good statistical qualities (R² = 0.79 and Q² = 0.65). PLS models with scrambled y-values produced Q² values that were negative or very low. The five compounds in the external test set were predicted with good accuracy, RMSEP = 0.45 (Figure 2). This model was developed as a tool for in silico screening. 82% of the compounds in the test sets were correctly classified and 18% were misclassified (Figure 3); no low-permeability compounds were predicted to have high permeability and only one high-permeability compound was predicted to have low permeability. To our knowledge the BVT test set is the largest external test set with compounds measured in the same laboratory on the same Caco-2 cells as the training set compounds, a criterion for successful model implementation [36].

In the present model descriptors of different types, describing hydrogen bonding properties, lipophilicity, size, PSA and charge, may lead to a more general model and interpretation should therefore be made in wider aspects of


Table 2. List of compounds in the external test set, their observed Papp and predicted logPapp.

No.  Name                    Papp (a) (×10⁻⁶ cm/s)  Observed logPapp  Predicted logPapp
1    Famotidine              1.2 ± 0.1              -5.92             -5.58
2    Ketoconazole (d)        47 ± 9                 -4.33             -3.56
3    Mannitol                0.28 ± 0.39            -6.55             -6.47
4    Salicylic acid          17 ± 4                 -4.77             -5.21
5    Testosterone (d)        58 ± 5                 -4.24             -4.58

(a) Average calculated from Papp(A/B) and Papp(B/A).
(d) Partly measured at non-sink conditions.
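The RMSEP reported for the small external test set can be recomputed from the observed and predicted logPapp values in Table 2. The sketch below assumes the common definition of RMSEP (square root of the mean squared prediction error over the test compounds); the paper does not spell out the formula, and the function name is hypothetical.

```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction over paired observed/predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Observed and predicted logPapp for the five compounds in Table 2.
observed = [-5.92, -4.33, -6.55, -4.77, -4.24]
predicted = [-5.58, -3.56, -6.47, -5.21, -4.58]
print(round(rmsep(observed, predicted), 2))  # 0.45
```

Under this definition the value rounds to the 0.45 quoted in the text, with ketoconazole contributing the largest single error.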

Figure 2. Observed vs. calculated/predicted Caco-2 cell permeability of the training set and the external test set.


Table 3. Descriptors considered in the Caco-2 cell PLS model.

No.  Descriptor                    Type                                 Source
1    S dCH2                        Electrotopological state index [38]  Cerius2 (a)
2    S sssCH                       Electrotopological state index [38]  Cerius2 (a)
3    S tsC                         Electrotopological state index [38]  Cerius2 (a)
4    S ssssC                       Electrotopological state index [38]  Cerius2 (a)
5    S sNH2                        Electrotopological state index [38]  Cerius2 (a)
6    S ssNH                        Electrotopological state index [38]  Cerius2 (a)
7    S dsN                         Electrotopological state index [38]  Cerius2 (a)
8    S aaN                         Electrotopological state index [38]  Cerius2 (a)
9    S dO                          Electrotopological state index [38]  Cerius2 (a)
10   S ssO                         Electrotopological state index [38]  Cerius2 (a)
11   S aaO                         Electrotopological state index [38]  Cerius2 (a)
12   S sF                          Electrotopological state index [38]  Cerius2 (a)
13   Wiener                        Topological                          Cerius2 (a)
14   Kappa-1                       Topological                          Cerius2 (a)
15   Kappa-2                       Topological                          Cerius2 (a)
16   Kappa-1-AM                    Topological                          Cerius2 (a)
17   Kappa-2-AM                    Topological                          Cerius2 (a)
18   Density                       Topological                          Cerius2 (a)
19   PMI-mag                       Topological                          Cerius2 (a)
20   AlogP (c)                     Lipophilicity                        Cerius2 (a)
21   AlogP98                       Lipophilicity                        Cerius2 (a)
22   Jurs-PPSA-1                   Polar surface area [39]              Cerius2 (a)
23   Jurs-PPSA-2                   Polar surface area [39]              Cerius2 (a)
24   Jurs-PNSA-2                   Polar surface area [39]              Cerius2 (a)
25   Jurs-PPSA-3                   Polar surface area [39]              Cerius2 (a)
26   Jurs-PNSA-3                   Polar surface area [39]              Cerius2 (a)
27   Jurs-DPSA-3                   Polar surface area [39]              Cerius2 (a)
28   Jurs-FPSA-2                   Polar surface area [39]              Cerius2 (a)
29   Jurs-FNSA-2                   Polar surface area [39]              Cerius2 (a)
30   Jurs-FNSA-3                   Polar surface area [39]              Cerius2 (a)
31   Jurs-WNSA-2                   Polar surface area [39]              Cerius2 (a)
32   Jurs-WNSA-3                   Polar surface area [39]              Cerius2 (a)
33   Jurs-RNCG (c)                 Polar surface area [39]              Cerius2 (a)
34   Jurs-RPCS                     Polar surface area [39]              Cerius2 (a)
35   Jurs-TPSA (c)                 Polar surface area [39]              Cerius2 (a)
36   Jurs-TASA                     Polar surface area [39]              Cerius2 (a)
37   Jurs-RPSA (c)                 Polar surface area [39]              Cerius2 (a)
38   Jurs-RASA (c)                 Polar surface area [39]              Cerius2 (a)
39   Largest polar region (c)      Polar surface area                   MolSurf (b)
40   Polar surface (c)             Polar surface area                   MolSurf (b)
41   MR                            Size                                 Cerius2 (a)
42   Shadow-XZfrac (c)             Size                                 Cerius2 (a)
43   Shadow-nu (c)                 Size                                 Cerius2 (a)
44   Shadow-Zlength (c)            Size                                 Cerius2 (a)
45   Hbond donor (c)               Hydrogen bond donating               Cerius2 (a)
46   α(H) (c)                      Hydrogen bond donating               MolSurf (b)
47   Hbond acceptor (c)            Hydrogen bond accepting              Cerius2 (a)
48   β(H) (c)                      Hydrogen bond accepting              MolSurf (b)
49   HOMO                          Electronic                           Cerius2 (a)
50   Apol                          Electronic                           Cerius2 (a)
51   Polarity                      Electronic                           MolSurf (b)
52   Nucleophilicity olefins (c)   Electronic                           MolSurf (b)
53   Electrophilicity olefins      Electronic                           MolSurf (b)
54   # anions at pH7               Electronic                           MolSurf (b)
55   # cations at pH7              Electronic                           MolSurf (b)
56   Papp                          Apparent permeability                Determined in house

(a) Cerius2 (version 4.7, Accelrys Inc.)
(b) MolSurf (version 2003-01-07, Qemist AB)
(c) Squared terms also included


the PLS components rather than trying to simplify the relationship to include only a few descriptors. The majority of the descriptors related to hydrogen bonds and PSA are captured in the first component and are negatively correlated to logPapp (Figures 4 and 5). Hydrophobic surface area and logP are strongly positively correlated with logPapp and are accounted for in all components; components 1 and 2 are shown in Figure 5. The two calculated logP values, no. 20 and 21, are not perfectly correlated, indicating that they both contribute information that stabilizes the model. The E-State descriptors S sNH2 and S ssNH, no. 5 and 6 respectively, are connected to groups with hydrogen bond donor ability and are thus negatively correlated to permeability, whereas E-State descriptors related to lipophilic atom types such as –F, >CH– and =CH2 are positively correlated to logPapp. The topological descriptors related to shape and size, no. 13–19, are captured by the second component and are all positively correlated to logPapp. Two large negative contributions to permeability are the numbers of cations and anions, captured in


Figure 3. Model performance by assigning class limits (low, medium and high) to predicted logPapp, compared to the experimental class.

Figure 4. PLS coefficient plot for the descriptors included in the model. The descriptors HOMO, AlogP, Apol, Hbond donor, AlogP98, Jurs-TPSA, # cations at pH7, beta(H), e-phil. olefin and the squared terms of Shadow-XZfrac, Shadow-Zlength and alpha(H) were found significant.

Figure 5. Loading plot w*c[1] vs. w*c[2]. Variables numbered according to Table 3.


the second and third component (Figure 5); thus it is important to account for ionization in permeability modelling. These two descriptors, together with Hbond donors, squared Shadow-Zlength, AlogP98 and AlogP, make the single largest contributions (Figure 4). Preliminary results (unpublished) indicate that a wider range of descriptors gives more general applicability than when only a subset of the descriptors is used. Simple models derived with this dataset and only combinations of MW, TPSA, AlogP, Hbond donors and acceptors had a Q² below 0.5 (results not shown). General descriptors are also emphasized in a recent paper analyzing failures of ADME/Tox models [36].
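The classification results discussed in this section (fraction correctly classified, and the number of low/high mix-ups) are the kind of summary that a small confusion table provides. The Python sketch below shows one way to compute it; the function names are hypothetical and the class labels in the example are invented toy data, not the BVT test set results.

```python
from collections import Counter


def confusion_counts(experimental, predicted):
    """Count (experimental class, predicted class) pairs."""
    return Counter(zip(experimental, predicted))


def fraction_correct(experimental, predicted):
    """Share of compounds whose predicted class equals the experimental class."""
    hits = sum(e == p for e, p in zip(experimental, predicted))
    return hits / len(experimental)


# Toy labels for illustration only (not data from the paper).
experimental = ["low", "low", "medium", "high", "high"]
predicted = ["low", "medium", "medium", "high", "medium"]

counts = confusion_counts(experimental, predicted)
print(fraction_correct(experimental, predicted))
print("low predicted as high:", counts[("low", "high")])
print("high predicted as low:", counts[("high", "low")])
```

The two off-diagonal corners, low predicted as high and high predicted as low, are the errors the text singles out as most important for a screening filter.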

4 Conclusions

This new Caco-2 cell prediction model has good statistical properties (R² = 0.79, Q² = 0.65 and RMSEP = 0.45) and provides a general tool to predict Caco-2 cell permeability of large, structurally diverse virtual libraries. It has been tested on two external test sets and has proven its usefulness in classification of discovery compounds. 82% of the compounds in the test sets were correctly classified; no low-permeability compounds were classified as highly permeable and only one high-permeability compound was classified as low. By using many different descriptor types, this model deals with the multivariate nature of permeability and provides a general interpretation in terms of known properties affecting epithelial cell permeability, such as lipophilicity, hydrogen bonding, PSA, size and charge.

Acknowledgements

We would like to thank Professor Michael Sjöström at Umeå University for valuable discussions and input to this manuscript. We would also like to thank Ulf Martens, Maria Mastej and Helena Flatow for skilful experimental assistance.

References

[1] T. Kennedy, Drug Discov. Today 1997, 2, 436–444.
[2] A. P. Li, Drug Discov. Today 2001, 6, 357–366.
[3] C. A. Lipinski, F. Lombardo, B. W. Dominy, P. J. Feeney, Adv. Drug Deliv. Rev. 2001, 46, 3–26.
[4] P. Stenberg, K. Luthman, P. Artursson, J. Control. Release 2000, 65, 231–243.
[5] T. I. Oprea, J. Gottfries, J. Mol. Graph. 1999, 17, 261–274.
[6] A. Kulkarni, Y. Han, A. J. Hopfinger, J. Chem. Inf. Comput. Sci. 2002, 42, 331–342.
[7] U. Norinder, T. Österberg, J. Pharm. Sci. 2001, 90, 1076–1085.
[8] T. Österberg, U. Norinder, J. Chem. Inf. Comput. Sci. 2000, 40, 1408–1411.
[9] H. van de Waterbeemd, G. Camenisch, G. Folkers, O. A. Raevsky, Quant. Struct.-Act. Relat. 1996, 15, 480–490.
[10] M. Yazdanian, S. L. Glynn, J. L. Wright, A. Hawi, Pharm. Res. 1998, 15, 1490–1494.
[11] S. Ren, S. E. J. Lien, Prog. Drug Res. 2000, 54, 1–23.
[12] K. Palm, K. Luthman, A. L. Ungell, G. Strandlund, F. Beigi, P. Lundahl, P. Artursson, J. Med. Chem. 1998, 41, 5382–5392.
[13] K. Palm, K. Luthman, A. L. Ungell, G. Strandlund, P. Artursson, J. Pharm. Sci. 1996, 85, 32–39.
[14] L. H. Krarup, I. T. Christensen, L. Hovgaard, S. Frokjaer, Pharm. Res. 1998, 15, 972–978.
[15] C. A. Bergström, M. Strafford, L. Lazorova, A. Avdeef, K. Luthman, P. Artursson, J. Med. Chem. 2003, 46, 558–570.
[16] F. Yamashita, S. Wanchana, M. Hashida, J. Pharm. Sci. 2002, 91, 2230–2239.
[17] P. Stenberg, U. Norinder, K. Luthman, P. Artursson, J. Med. Chem. 2001, 44, 1927–1937.
[18] S. Fujiwara, F. Yamashita, M. Hashida, Int. J. Pharm. 2002, 237, 95–105.
[19] U. Norinder, T. Österberg, P. Artursson, Pharm. Res. 1997, 14, 1786–1791.
[20] G. Bravi, J. H. Wikel, Quant. Struct.-Act. Relat. 2000, 19, 39–49.
[21] G. Cruciani, P. Crivori, P. A. Carrupt, B. Testa, Theochem-J. Mol. Struct. 2000, 503, 17–30.
[22] G. Cruciani, M. Pastor, W. Guba, Eur. J. Pharm. Sci. 2000, 11, S29–S39.
[23] D. E. Clark, J. Pharm. Sci. 1999, 88, 807–814.
[24] V. Merino, J. Freixas, M. Delvalbermejo, T. M. Garrigues, J. Moreno, J. M. Pladelfina, J. Pharm. Sci. 1995, 84, 777–782.
[25] M. D. Wessel, P. C. Jurs, J. W. Tolan, S. M. Muskal, J. Chem. Inf. Comput. Sci. 1998, 38, 726–735.
[26] P. Artursson, J. Karlsson, Biochem. Biophys. Res. Commun. 1991, 175, 880–885.
[27] W. J. Egan, G. Lauri, Adv. Drug Deliv. Rev. 2002, 54, 273–289.
[28] P. Artursson, K. Palm, K. Luthman, Adv. Drug Deliv. Rev. 1996, 22, 67–84.
[29] P. Artursson, J. Karlsson, G. Ocklind, N. Schipper, Studying transport processes in absorptive epithelia, in: A. J. Shaw (Ed.), Epithelial Cell Culture. A Practical Approach, IRL Press, 1996, pp. 111–133.
[30] P. Garberg, P. Eriksson, N. Schipper, B. Sjöström, Pharm. Res. 1999, 16, 441–445.
[31] P. Lindroth, K. Mopper, Anal. Chem. 1979, 51, 1667–1674.
[32] K. Rose, L. H. Hall, L. B. Kier, J. Chem. Inf. Comput. Sci. 2002, 42, 651–666.
[33] L. H. Hall, L. B. Kier, J. Chem. Inf. Comput. Sci. 1995, 35, 1039–1045.
[34] M. Charton, B. I. Charton, J. Comput. Aided Mol. Des. 2003, 17, 211–221.
[35] K. Palm, P. Stenberg, K. Luthman, P. Artursson, Pharm. Res. 1997, 14, 568–571.
[36] T. R. Stouch, J. R. Kenyon, S. R. Johnson, X.-Q. Chen, A. Doweyko, Y. Li, J. Comput. Aided Mol. Des. 2003, 17, 83–92.
[37] P. Garberg, N. Borg, R. Cecchelli, R. D. Hurst, T. Lindmark, A. Mabondzo, J. E. Nilsson, T. J. Raub, D. Stanimirovic, T. Terasaki, J.-O. Öberg, T. Österberg, Toxicol. in Vitro, accepted.
[38] L. B. Kier, L. H. Hall, Molecular structure description, Academic Press, San Diego 1999, pp. 245.
[39] D. T. Stanton, P. C. Jurs, Anal. Chem. 1990, 62, 2323–2329.

Received on December 23, 2003; Accepted on May 19, 2004
