

Ligand-supported Homology Modelling of Protein Binding-sites using Knowledge-based Potentials

Andreas Evers1, Holger Gohlke1,2 and Gerhard Klebe1*

1 Institute of Pharmaceutical Chemistry, University of Marburg, Marbacher Weg 6, D-35032 Marburg, Germany

2 Department of Molecular Biology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037, USA

A new approach, MOBILE, is presented that models protein binding-sites including bound ligand molecules as restraints. Initially generated homology models of the target protein are refined iteratively by including information about bioactive ligands as spatial restraints and optimising the mutual interactions between the ligands and the binding-sites. Thus optimised models can be used for structure-based drug design and virtual screening.

In a first step, ligands are docked into an averaged ensemble of crude homology models of the target protein. In the next step, improved homology models are generated, considering explicitly the previously placed ligands by defining restraints between protein and ligand atoms. These restraints are expressed in terms of knowledge-based distance-dependent pair potentials, which were compiled from crystallographically determined protein–ligand complexes. Subsequently, the most favourable models are selected by ranking the interactions between the ligands and the generated pockets using these potentials. Final models are obtained by selecting the best-ranked side-chain conformers from various models, followed by an energy optimisation of the entire complex using a common force-field.

Application of the knowledge-based pair potentials proved efficient to restrain the homology modelling process and to score and optimise the modelled protein–ligand complexes. For a test set of 46 protein–ligand complexes, taken from the Protein Data Bank (PDB), the success rate of producing near-native binding-site geometries (rmsd < 2.0 Å) with MODELLER is 70% when the ligand restrains the homology modelling process in its native orientation. Scoring these complexes with the knowledge-based potentials, in 66% of the cases a pose with rmsd < 2.0 Å is found on rank 1. Finally, MOBILE has been applied to two case studies modelling factor Xa based on trypsin and aldose reductase based on aldehyde reductase.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: homology modelling; protein–ligand interaction; knowledge-based potentials; DrugScore; docking

*Corresponding author. E-mail address of the corresponding author: [email protected]

Abbreviations used: PDB, Protein Data Bank; rmsd, root-mean-square deviation; GPCR, G-protein-coupled receptor; QSAR, quantitative structure–activity relationship; MD, molecular dynamics; AR, aldose reductase.

doi:10.1016/j.jmb.2003.09.032 J. Mol. Biol. (2003) 334, 327–345

Introduction

The genome sequencing projects provide us with an increasing number of fully sequenced genomes, including those of humans and vertebrates (e.g. mouse) and important microbial pathogens.1–4 Structural genomics is expected to yield a large number of experimentally determined protein structures, in the long run hopefully resulting in a complete coverage of fold space.5–8 Thus, referring to suitable reference structures (so-called templates) as a basis (well spread in sequence and folding space), it will become increasingly possible to generate realistic models for any given protein sequence using comparative modelling techniques.9 This technique can be considered sufficiently mature, given that there has been only a marginal improvement in the comparative modelling results while going from CASP3 to CASP4.10 Its rate of success depends on the degree of sequence identity with the template structure (as a rule of thumb, >30%) and the reliability of the underlying sequence alignment.11,12 As an advantage, the regions predicted best are often the biologically important ones,13 because they are structurally most conserved by evolution. This provides the perspective that structural genomics will support biology and medicine through the annotation of protein function.14–17 Furthermore, the protein structural models can be used for virtual screening to discover potential new lead structures for drug therapy.18

Indeed, drug design is frequently faced with the situation that a ligand should be discovered for a target protein for which no experimentally determined structure is yet available. The most prominent examples are probably the G-protein coupled receptors, which play an important role in many physiological and pathophysiological processes. At present, 50% of all recently launched drugs are targeted against G-protein coupled receptors.19

As a prerequisite for the MOBILE approach presented here, we assume that (1) information is available about ligands that bind to the target protein and (2) the 3D structures of related proteins with significant sequence identity are known. In this situation, currently two options exist for the design process: on the one hand, it is possible to establish a QSAR model on the basis of a set of ligands to extract 3D features primarily explaining observed trends in binding affinity; on the other hand, a homology model of the target protein can be constructed and subsequently used for the search of novel ligands, e.g. by virtual screening. While the QSAR approach is based solely on features derived from the ligands, the second approach considers only information available from the related proteins. The latter procedure is, in fact, rather approximate, especially if in the query protein several amino acids in the active site are exchanged with respect to the template structure. Although the binding characteristics of ligands acting upon a particular protein provide implicit information about the complementary features required at the protein active site, to the best of our knowledge, so far there is no approach that considers ligand information explicitly during the comparative modelling step.

Several studies have been described in the literature that apply homology models of proteins to explain putative protein–ligand interactions.20–34 In some cases, the models were used subsequently for the design of new potent inhibitors.35–37 However, in none of these studies was ligand information considered explicitly during the modelling process. Usually, only one homology model based on one or more template structures is generated, and the ligand(s) are then placed into the modelled binding pocket. This is accomplished either manually or using an automatic docking tool. Another strategy pursues the spatial alignment of template and model, followed by the transfer of the binding orientation of a ligand as found in the template into the model. In most cases, the resulting complexes are optimised further, e.g. by using molecular dynamics simulations.

Finally, ligand information was used by Jansen et al. when modelling the serotonin 5-HT1A receptor. The optimisation was performed with the minireceptor modelling program Yak38 based on an extracted active site of a homology model using three high-affinity ligands.39

An important task in model-building protein–ligand complexes is the quality assessment of the models produced, in particular if different orientations of active-site residues have to be taken into consideration. This step is usually performed by visually analysing the interaction geometry between protein and ligand functional groups. This step-wise procedure appears rather inefficient and biased by the modeller’s intuition.

Only a few approaches assess the quality of the generated complexes in a more sophisticated way: Johnson et al. created a library of protein models that are subsequently screened by rigid ligand docking. The more relevant protein models achieved better-scored docking solutions, and the quality of the binding modes generated was thus used to select the most relevant modes.40 This approach has been applied to the modelling of Fv antibody fragments. However, it requires experimental data about the conformation of the docked ligand. Bissantz et al. evaluated generated homology models (agonist and antagonist binding models of three human G-protein-coupled receptors) by retrieving known agonists and antagonists via docking from a database, which additionally included randomly collected “drug-like” compounds.30 Jalaie et al. developed a homology model of spinach photosystem II. After docking inhibitors, a highly predictive CoMFA model was derived from the resulting alignment that helped to score the quality of the homology model.41 A similar approach was followed in our group by Schafferhans & Klebe. Structurally distinct thrombin inhibitors were docked into models of thrombin generated from a set of serine proteases with 28% to 40% sequence identity. With respect to the crystal structures of known thrombin complexes, ligand-binding modes were obtained with an average rmsd of 1.4 Å.42 Based on the generated alignment of 88 thrombin inhibitors, a significant 3D-QSAR model could be established.

The approach followed by Schafferhans & Klebe highlights another important aspect for the application of homology models in the context of structure-based drug design: instead of placing ligands into one particular homology model, the mean of several relevant homology models was considered for docking. As shown previously, by averaging structural details and, hence, smoothing the energy landscape, it is possible to circumvent local minima of an otherwise rugged energy surface. This results in a faster convergence of the docking problem.43,44 Furthermore, the treatment of protein structures as ensembles has the advantage that it compensates for structural deficiencies of single protein models. In addition, potential protein flexibility induced upon ligand binding can be accounted for implicitly.45–47

DragHome was developed especially for the purpose of ligand docking into approximate homology-modelled proteins.42 The binding-site models are analysed in terms of putative interaction sites, which are predicted by LUDI.48–54 They are then translated via Gaussian functions into arithmetically or geometrically averaged binding-site descriptions representing physico-chemical properties. The use of “soft” Gaussian functions to describe protein–ligand interactions smoothes the potential energy surface and thus takes into account the limited accuracy of the modelled structures for the purpose of docking. The ligands are expressed similarly by property densities based on Gaussian functions. The docking is performed by maximising the overlap between the functional descriptions for ligand and binding-site representations using the ligand alignment program SEAL.55–57

Several other approaches have been developed that succeed in docking ligands into ensembles of protein structures. Knegtel et al. used either simple or energy-weighted averaging for the description of interactions between a ligand and each receptor structure from the ensemble by generating composite grids. These were subsequently used for scoring within DOCK.58 Both averaging methods performed equally well in their test cases. Osterberg et al. extended this approach by testing four methods for merging multiple ensemble entries into a single grid-based lookup table of interaction energies using AutoDock.59 For their test set, mean and minimum averaging methods performed poorly, but two weighted averaging methods yielded consistent and accurate ligand docking.
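The idea of merging per-model interaction grids into one composite grid can be illustrated with a minimal Python sketch. It is not the DOCK or AutoDock implementation: the function name `combine_grids`, the flat-list grid layout, and the Boltzmann-type weighting with `kT` are assumptions made for illustration, and the weighting schemes actually used in the cited studies may differ.

```python
import math

def combine_grids(grids, method="mean", kT=0.593):
    """Merge per-model energy grids (flat lists of equal length) point by point.

    `method` and `kT` (kcal/mol near room temperature) are illustrative assumptions.
    """
    combined = []
    for values in zip(*grids):
        if method == "mean":
            combined.append(sum(values) / len(values))
        elif method == "min":
            combined.append(min(values))
        elif method == "boltzmann":
            # Energy-weighted average: low-energy models dominate the grid point.
            e_min = min(values)
            weights = [math.exp(-(v - e_min) / kT) for v in values]
            combined.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
        else:
            raise ValueError(f"unknown method: {method}")
    return combined

# Two hypothetical models, two grid points each.
grids = [[-1.0, 5.0], [-3.0, 1.0]]
mean_grid = combine_grids(grids, "mean")
min_grid = combine_grids(grids, "min")
weighted_grid = combine_grids(grids, "boltzmann")
```

In this sketch the weighted value at each grid point always lies between the minimum and the plain mean, which is one way to see why weighted schemes are less sensitive to a single poorly modelled structure than the mean and less optimistic than the minimum.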

Another approach to handle protein flexibility is realised in FlexE,60 a variant of the FlexX program. FlexE is based on a united protein description generated from the superimposed structures of an ensemble. For the structurally deviating parts of the protein, discrete alternative conformations are taken into account explicitly during the incremental construction of the ligand in the binding-site. These geometric alternatives are then joined in a combinatorial fashion to create new valid protein structures. Thus, conformations of the protein are not limited to those present explicitly in the ensemble.

Anticipating that several of the above-described methods can produce a realistic ligand orientation with respect to the target protein, as a next step of the approach presented here, improved models of the protein binding-site are generated by including this ligand information explicitly as an additional restraint in the homology modelling process.

In principle, there are several possibilities to consider ligand information in a homology-modelling tool such as MODELLER. For example, upper and lower boundaries for bond distances can be constrained between protein and ligand atoms, or interactions can be approximated by van der Waals, Coulomb, or H-bond potentials. The latter procedures, however, require explicit assignment of charges and protonation states in the active site. Alternatively, protein–ligand interactions can be described in terms of knowledge-based atom pair potentials. They have already been applied successfully in the field of protein-fold prediction.61–63 Using the concept of the “inverse Boltzmann law”,64 the frequency distributions of interatomic contacts, as found in protein crystal structures, are converted into “potentials of mean force” or “knowledge-based potentials”. These potentials have been proven successful for the prediction of protein–ligand interactions.65–69 The scoring function DrugScore was developed originally to differentiate near-native ligand poses from decoy binding modes of the same protein–ligand pair. Through appropriate scaling, quantitative estimates of binding affinities are possible.70 DrugScore pair potentials have been used successfully as an objective function in docking71 and were tailored for one particular protein by considering structural and energetic ligand information in a CoMFA-type approach.72
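The inverse Boltzmann conversion mentioned above can be sketched in a few lines of Python. This is the generic textbook form, W(r) = -ln(g_obs(r) / g_ref(r)) in units of kT, and not the DrugScore derivation itself; the reference state, the normalisation, and the small `pseudo_count` guarding against empty bins are assumptions of this sketch.

```python
import math

def pair_potential(observed_counts, reference_counts, pseudo_count=1e-6):
    """Convert contact-distance histograms into a potential of mean force.

    Returns W(r) = -ln(g_obs(r) / g_ref(r)) per distance bin, in units of kT.
    `pseudo_count` is an illustrative guard against log(0), not part of the source.
    """
    n_obs = sum(observed_counts)
    n_ref = sum(reference_counts)
    potential = []
    for obs, ref in zip(observed_counts, reference_counts):
        g_obs = obs / n_obs + pseudo_count   # normalised observed frequency
        g_ref = ref / n_ref + pseudo_count   # normalised reference frequency
        potential.append(-math.log(g_obs / g_ref))
    return potential

# Hypothetical histogram for one atom-type pair over four distance bins.
observed = [0, 10, 60, 30]
reference = [25, 25, 25, 25]
w = pair_potential(observed, reference)
```

Bins that are populated more often than the reference receive negative (favourable) values, while rarely or never observed distances become unfavourable.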

Motivated by the general applicability of DrugScore potentials to describe protein–ligand interactions, we decided to include those as additional restraints in MODELLER to consider interactions between fixed ligand and flexible protein atoms. Here, the implementation of these restraints will be described to assist the modelling of geometrically improved protein binding-sites.

Strategy and Computational Realisation

General overview

Below, we describe in detail how we complement data about related proteins with information about the binding modes of bioactive ligands to generate more realistic homology models of protein binding-sites. An overview of our strategy, which was initiated by the development of the DragHome concept and is followed by modelling binding-sites including ligand information explicitly (MOBILE), is given in Figure 1. Starting with the (crystal) structure of one or more template proteins, we generate several preliminary homology models of our target protein (step 1). After placing one or more ligands, known to bind to the target protein, into an averaged binding-site representation of the generated binding-site models (step 2), we generate new protein models, now considering explicitly the docked ligand(s) (step 3a). After scoring the generated complexes with DrugScore, a final model is obtained by selecting the model that explains best the observed ligand-binding (affinities) (step 3b). The modelled complexes can be refined further, considering the composite picture of the best side-chain conformers taken from different models and minimising the side-chain-to-ligand interactions using a common force-field (step 4).

Step 1. Generation of preliminary protein models

The program MODELLER9,73,74 is used to generate initial homology models in the first step of our approach (Figure 1). MODELLER generates protein 3D structures by satisfying spatial restraints imposed by the sequence alignment with the template structure and applying the terms of the CHARMM-22 force-field.75 A 3D protein model is obtained by optimising the molecular probability density function while simultaneously minimising input restraint violations. To guarantee sufficient conformational sampling of each active-site residue, several homology models are generated in this step. Preliminary tests showed that a number between 10 and 100 models provides a satisfactory sampling. To optimise the local interactions, all models obtained are subjected to a crude simulated annealing refinement protocol available in MODELLER.

Step 2. Placing the ligand(s) into the homology models

As a next step, proper ligand orientations need to be generated. Three scenarios are described, characterised by a decreasing amount of experimental information available.

1. One or more ligands are known to bind to the target protein, and crystal structures of their complexes with related template proteins are available. It can be assumed that the ligand-binding modes are similar in the target and the template protein. Accordingly, ligands are then transferred among these structures, keeping their orientation as a restraint for the subsequent modelling process.

2. One or more ligands are known to bind to the target; however, no complex crystal structure with the template is available. In this case, the ligand(s) can be placed into the template protein structure by docking, and the resulting orientation can then be used to restrain the following protein modelling process. Alternatively, the coordinates of a similar ligand, crystallised together with the template protein, serve as a reference to restrain the protein modelling process. The known ligand is then transferred into the modelled proteins as described in the following section.

3. If no structural information about ligands binding to the template protein is available, one or more ligands (known to bind to the target protein) are docked into the homology models of the target protein. Since a homology modelling program generates a set of different models with similar energies, ligand docking is attempted as a placement into ensembles of the modelled protein structures. Here, we combined two different approaches to place ligands into ensembles of model-built protein structures. Following Sotriffer et al.,71 DrugScore potential grids were calculated in the binding pocket of each homology model by evaluating protein–ligand interactions between a predefined probe atom, placed at each grid point, and the surrounding protein environment. At short interatomic distances, the pair potentials were supplemented by a Gaussian-type repulsive term, as described by Gohlke et al.72 Grids of identical size were used for each homology model. Their dimensions were adjusted to fully embed the ligand in its crystallographically determined binding mode with an additional margin of at least 4 Å. The ligands were then docked into the merged binding pockets using AutoDock 3.0 after averaging the grid maps representing the potential energy using the clamped grid method as described by Osterberg et al.59 The Lamarckian genetic algorithm was applied using the docking protocol as given by Sotriffer et al.71

Step 3a. Incorporating ligand information into the homology modelling process

Figure 1. An overview of the approach presented here: after generating preliminary models of the target protein (1), the ligand is docked into the superimposed ensemble of homology models (2). In the next step (3), new homology models of the target protein are generated with explicit consideration of the ligand in its docked orientation. Finally (4), the modelled complexes are optimised further by combining fragments from different models and subsequent energy minimisation of the entire complex.

Having placed the ligand(s) in a near-native orientation into the consensus binding-site of the modelled protein, new models are generated by additionally incorporating information about these ligand(s). During this modelling step, the ligands are kept fixed in space. The presence of the ligand(s) is included in the homology modelling process in terms of user-defined restraints. Scaled DrugScore pair potentials are added to the MODELLER force-field to provide information about the interactions experienced between fixed ligand(s) and flexible protein atoms. The scaling of DrugScore potentials with respect to the MODELLER force-field is described in detail in Materials and Methods. No further interactions between protein and ligand atoms are considered. To make the DrugScore potentials suitable for a minimisation procedure, we approximate them by cubic splines (assigning a range from 0 Å to 6 Å and a bin size of 0.1 Å). This can be realised through the MODELLER interface. To include the repulsive interactions at short distances, the above-mentioned Gaussian repulsion term has been added.72
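The spline approximation described above can be illustrated independently of MODELLER. The following Python sketch tabulates a potential on knots from 0 Å to 6 Å in 0.1 Å steps, adds a Gaussian repulsion, and interpolates with a natural cubic spline. The toy potential `toy_well`, the repulsion height and width, and the choice of natural boundary conditions are assumptions for illustration only; the actual DrugScore tables, the parameters of the repulsion term, and the MODELLER spline restraint interface are not reproduced here.

```python
import bisect
import math

def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through the knots (xs strictly increasing)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the second derivatives m, with m[0] = m[-1] = 0.
    a, b, c, d = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        a[i], b[i], c[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n):                      # forward elimination (Thomas algorithm)
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * n
    m[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        m[i] = (d[i] - c[i] * m[i + 1]) / b[i]

    def evaluate(x):
        x = min(max(x, xs[0]), xs[-1])         # clamp to the tabulated range
        i = min(bisect.bisect_right(xs, x) - 1, n - 2)
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t1)

    return evaluate

def toy_well(r):
    """Hypothetical attractive well near 3.5 A standing in for a pair potential."""
    return -math.exp(-((r - 3.5) ** 2) / 0.5)

def gaussian_repulsion(r, height=10.0, width=0.8):
    """Hypothetical short-range repulsion; height and width are illustrative."""
    return height * math.exp(-(r ** 2) / (2.0 * width ** 2))

knots = [i * 0.1 for i in range(61)]           # 0 A to 6 A, bin size 0.1 A
table = [toy_well(r) + gaussian_repulsion(r) for r in knots]
restraint = natural_cubic_spline(knots, table)
```

A spline of this kind reproduces the tabulated values at the knots and gives a smooth, differentiable function in between, which is the property a gradient-based minimiser needs.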

The protein modelling process is not necessarily restrained to one ligand. If several ligands are known to occupy distinct parts of the binding pocket, a combination to a composite “super-ligand” can be attempted.

Step 3b. Scoring the generated models

Having generated a set of ligand-supportedhomology models, the next objective is to identifythe best one(s). Quality assessment of homologymodels usually applies fold plausibility criteria ortries to assess local features considering proteinatom interactions only.76 – 85 For our purpose, weare interested primarily in obtaining near-nativemodels of protein binding-sites; accordingly,the standard protocols for evaluating proteinhomology models would be insensitive and non-conclusive. Also, the MODELLER objective func-tion would not provide a proper criterion, as itassesses matching with all requested inputrestraints. Assuming that the modelled protein–ligand geometry corresponds to a near-nativegeometry, we require a scoring function suitableto evaluate protein–ligand interactions. As Drug-Score shows good performance to identify near-native ligand poses from a set of decoy bindingmodes in rigid binding pockets, we decided touse this method in turn to identify near-nativebinding-site geometries with respect to residueside-chain orientations towards the ligand(s).
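Ranking models by a distance-dependent pair potential, as described above, amounts to summing the potential over all protein–ligand atom pairs and sorting. The Python sketch below shows this under stated assumptions: `toy_potential` is a hypothetical stand-in for tabulated pair potentials, the 6 Å cutoff mirrors the spline range quoted earlier, and lower scores are taken to be more favourable.

```python
import math

def complex_score(protein_atoms, ligand_atoms, pair_potential, cutoff=6.0):
    """Sum a pair potential over all protein-ligand atom pairs closer than `cutoff` (in A)."""
    total = 0.0
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            r = math.dist(p_xyz, l_xyz)
            if r < cutoff:
                total += pair_potential(p_type, l_type, r)
    return total

def rank_models(models, ligand_atoms, pair_potential):
    """Return (score, model name) tuples, best (lowest) score first."""
    scored = [(complex_score(atoms, ligand_atoms, pair_potential), name)
              for name, atoms in models.items()]
    return sorted(scored)

def toy_potential(protein_type, ligand_type, r):
    """Hypothetical potential: clash penalty below 2 A, shallow well near 3 A."""
    return 5.0 if r < 2.0 else -math.exp(-((r - 3.0) ** 2))

# Two hypothetical one-atom "binding-site models" around a fixed ligand atom.
ligand = [("N.3", (0.0, 0.0, 0.0))]
models = {
    "model_A": [("O.3", (3.0, 0.0, 0.0))],   # contact at a favourable distance
    "model_B": [("O.3", (1.5, 0.0, 0.0))],   # contact that clashes with the ligand
}
ranking = rank_models(models, ligand, toy_potential)
```

Because the ligand is held fixed, differences in the total score reflect only how well each modelled pocket complements the ligand.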

Step 4. Optimising and refining the homology models

To optimise the modelled binding-sites, we pursue a strategy of combining good solutions on a per-residue basis from different homology models. In the case of identical main-chain orientations, the most appropriate side-chain rotamers are assembled from the different models. As the ligand(s) have already been placed in the previous modelling step, the DrugScore rankings between ligand atoms and individual side-chain rotamers are used to select the most appropriate solution from the set of generated protein side-chain orientations. In this context, we reduce the number of side-chain conformers for each residue by performing a complete linkage clustering, merging two conformers within a user-defined threshold (by default 1.0 Å). We then select the conformer with the best DrugScore value as cluster representative and eliminate those with unfavourable rankings. Finally, all combinations between the remaining cluster representatives are generated. Solutions that produce intramolecular clashes are discarded. The total DrugScore scores of the combined pockets are obtained by summing the individual scores of the considered side-chain conformers. Finally, the model with the best total DrugScore value is chosen.
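The per-residue selection and recombination just described can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: conformers are plain dictionaries with hypothetical `xyz` and `score` fields, lower scores are treated as better, the clash test with a 2.0 Å limit is invented for the example, and the step that eliminates representatives with unfavourable rankings is omitted because its criterion is not given here.

```python
import itertools
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def complete_linkage(conformers, threshold=1.0):
    """Agglomerative clustering; merge while the largest pairwise rmsd stays <= threshold."""
    clusters = [[i] for i in range(len(conformers))]
    while len(clusters) > 1:
        best = None
        for x, y in itertools.combinations(range(len(clusters)), 2):
            d = max(rmsd(conformers[i]["xyz"], conformers[j]["xyz"])
                    for i in clusters[x] for j in clusters[y])
            if best is None or d < best[0]:
                best = (d, x, y)
        if best[0] > threshold:
            break
        merged = clusters.pop(best[2])          # best[2] > best[1], so index best[1] stays valid
        clusters[best[1]].extend(merged)
    return clusters

def clash(conf_a, conf_b, limit=2.0):
    """Hypothetical clash test: any atom pair closer than `limit` (in A)."""
    return any(math.dist(p, q) < limit for p in conf_a["xyz"] for q in conf_b["xyz"])

def best_pocket(residues, threshold=1.0):
    """Pick one representative per residue so that the summed score is lowest without clashes."""
    representatives = []
    for name, conformers in residues.items():
        reps = [min((conformers[i] for i in cluster), key=lambda c: c["score"])
                for cluster in complete_linkage(conformers, threshold)]
        representatives.append([(name, c) for c in reps])
    best = None
    for combo in itertools.product(*representatives):
        if any(clash(a[1], b[1]) for a, b in itertools.combinations(combo, 2)):
            continue
        total = sum(c["score"] for _, c in combo)
        if best is None or total < best[0]:
            best = (total, combo)
    return best

# Hypothetical single-atom conformers for two residues.
residues = {
    "Tyr70": [{"xyz": [(0.0, 0.0, 0.0)], "score": -3.0},
              {"xyz": [(0.5, 0.0, 0.0)], "score": -5.0},
              {"xyz": [(4.0, 0.0, 0.0)], "score": -1.0}],
    "Ile155": [{"xyz": [(0.5, 1.0, 0.0)], "score": -4.0},
               {"xyz": [(8.0, 0.0, 0.0)], "score": -2.0}],
}
total, chosen = best_pocket(residues)
```

In this toy case the two best-scoring representatives clash with each other, so the selected pocket pairs the best Tyr70 conformer with the second-best Ile155 conformer. The exhaustive product grows quickly with the number of residues, which is why the clustering step that reduces the conformers per residue matters.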

Since DrugScore pair-potentials implemented into MODELLER consider directionality of interactions only implicitly, a subsequent structural optimisation using the MAB force-field in MOLOC is performed. This force-field handles H-bonds using explicit angular dependencies.86,87 In addition, this step finally removes strained interactions within the binding-site residues.

Results and Discussion

In the following, we will demonstrate that the MOBILE approach produces more reliable homology modelled protein–ligand complexes if ligand information is included in this process. Subsequently, two “real life” applications will be given to assess the scope and demonstrate the power of our new method.

Analysis of the generated binding-site models

Comparison of binding-site models generated with and without ligand information

To assess the influence of ligand information on the protein modelling step, root-mean-square deviations (rmsd) of the modelled binding-site residues with respect to orientations found in reference crystal structures were evaluated for the test data set (46 protein–ligand complexes, see Materials and Methods, and Table 1). Models were generated (1) without ligand information and (2) considering ligands in terms of the DrugScore pair potentials. For each of the 46 test set proteins, ten models were generated with new side-chain and backbone orientations. Of these, the one with the lowest rmsd with respect to the crystal structure was selected, and an average rmsd considering all atoms of all binding-site residues was computed.

Table 1. PDB codes of the proteins in the test data set

121P  1ABE  1ABF  1ACJ  1AHA  1APT  1ATL
1AZM  1BBP  1BLH  1BUG  1BYB  1CBX  1CIL
1CPS  1CTR  1DID  1DIE  1ELA  1EPB  1F3E
1HDC  1HEF  1HFC  1HSL  1HYT  1ICN  1IMB
1LAH  1LMO  1LNA  1MLD  1MRG  1MRK  1POC
1PPL  1PSO  1RBP  1RDS  1RNT  1ROB  1SNC
1SRJ  1STP  1TLP  1TMN

The average rmsd amounts to 1.90 Å if ligand information has been considered (strategy 2) and increases to 2.08 Å if no ligand information has been used (strategy 1). A paired t-test88 indicates (with a significance level of 0.01) that these mean values are significantly different. With respect to the average rmsds of each of the 46 binding pockets, in 28 cases better models were obtained considering the ligand in terms of DrugScore potentials (see Figure 2). In 17 cases better models were generated when neglecting the ligand. In one case, a model of equal quality (with respect to rmsd) was obtained. According to the paired t-test, the respective 46 mean rmsd values show a significance level of 0.1 in favour of the ligand-supported models. Remarkably, when the ligand is considered in terms of van der Waals potentials only, the resulting binding-site models are even worse (in terms of rmsd values) than those generated when neglecting ligand information. The mean rmsd value over all atoms from all binding-sites amounts to 2.34 Å if the ligand is considered in terms of van der Waals potentials. A possible explanation for this different performance might be attributed to the significant difference in the steepness of the DrugScore versus van der Waals potentials. Soft potentials are more tolerant with respect to the slight structural deficiencies that generally occur with model-built structures.
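The paired t-test used above to compare rmsd values with and without ligand information can be written out in a few lines. The sketch below computes the test statistic t = mean(d) / (sd(d) / sqrt(n)) on the per-protein differences d; the rmsd values are hypothetical placeholders, not data from the test set, and the comparison against a critical value or the computation of a p-value is left to a statistics table or library.

```python
import math
import statistics

def paired_t(x, y):
    """Return the paired t statistic and the degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Hypothetical per-protein rmsd values (in A); NOT taken from the paper.
rmsd_with_ligand = [1.2, 1.9, 2.1, 0.8, 1.5]
rmsd_without_ligand = [1.4, 2.0, 2.6, 1.25, 1.6]
t_value, dof = paired_t(rmsd_with_ligand, rmsd_without_ligand)
```

A negative t value here means that the first sample (with ligand) has the lower rmsd on average. The pairing matters because both strategies are applied to the same proteins, so protein-to-protein variation cancels in the differences.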

Figure 3 illustrates the benefit of including ligand information into the protein modelling process. Here, the side-chain and backbone orientations of all 11 binding-site residues of glycosidase complexed with adenine (1aha) were predicted either neglecting (Figure 3(a) and (b)) or considering (Figure 3(c) and (d)) ligand information. Regarding all binding-site residues, the best of the ten generated models showed an overall rmsd value of 1.25 Å (neglecting ligand information) and 0.8 Å (considering ligand information), respectively.

Even for the model that was generated without regarding the ligand, the overall rmsd value is quite satisfactory. At 1.25 Å, it is even better than the average value found for all 46 test set complexes that were modelled including ligand information (1.90 Å). However, three modelled residues (Tyr70, Tyr111 and Ile155, see Figure 3) will clash with a bound ligand if it is inserted in its crystallographically determined orientation. Besides visual inspection, the obtained DrugScore rankings potentially indicate the quality of the generated binding-site models. While the complex generated considering ligand information scores only slightly worse than the native complex, the model generated neglecting ligand information exhibits a strongly unfavourable score.

Figure 2. The differences between rmsd values exhibited by binding-site models (including side-chain and main-chain atoms) generated with ligand information (+ ligand) and without ligand information (- ligand).

Figure 3. Binding-site of glycosidase complexed with adenine (1aha). The residue orientation taken from the crystal structure (colour-coding according to atom type) and those generated by homology modelling (cyan or yellow) are shown (a) neglecting ligand information and (c) including ligand information. (b) and (d) The molecular surfaces of the binding-site residues in their modelled orientation. The bound adenine base has been considered in its orientation as found in the crystal structure.

Assessing the side-chain prediction accuracy of MODELLER

To assess MODELLER’s power to correctly predict side-chain orientations in protein binding-sites, we again generated homology models for all 46 members of our test set. Deviating from the previous test, however, we now generated 100 models (to sample search space more exhaustively), modelling only the side-chain orientations of the binding-site residues (thus keeping their backbone coordinates fixed). The orientations of all binding-site side-chains were generated simultaneously. Ligand information was (1) included in terms of the DrugScore potentials, (2) included in terms of van der Waals potentials and (3) fully ignored. The quality of the protein models was validated in two ways: (1) the computed binding-site models were considered in total and (2) for each single residue, we considered only the conformer that had the lowest rmsd compared to the crystal structure.
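The two validation modes described above can be illustrated with a minimal sketch. It assumes that all models and the crystal structure share one coordinate frame (plausible here because the backbone was kept fixed), so no superposition is performed. The data layout (a dictionary mapping a residue label to a list of atom coordinates), the function names and the toy coordinates are hypothetical and are not part of MODELLER or DrugScore.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over matched atoms, without superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))

def flatten(pocket):
    """Concatenate the atoms of all residues in a fixed (sorted) residue order."""
    return [atom for residue in sorted(pocket) for atom in pocket[residue]]

def pocket_rmsd_of_best_model(models, crystal):
    """Mode (1): treat each model as an entity and return the lowest pocket rmsd."""
    reference = flatten(crystal)
    return min(rmsd(flatten(model), reference) for model in models)

def rmsd_of_best_conformers(models, crystal):
    """Mode (2): per residue, keep the conformer closest to the crystal structure."""
    chosen, reference = [], []
    for residue in sorted(crystal):
        best = min((model[residue] for model in models),
                   key=lambda conformer: rmsd(conformer, crystal[residue]))
        chosen.extend(best)
        reference.extend(crystal[residue])
    return rmsd(chosen, reference)

# Hypothetical toy pocket: two residues with one atom each (coordinates in Å).
crystal = {"A": [(0.0, 0.0, 0.0)], "B": [(1.0, 0.0, 0.0)]}
models = [
    {"A": [(0.0, 0.0, 0.0)], "B": [(3.0, 0.0, 0.0)]},  # residue A correct, B displaced by 2 Å
    {"A": [(2.0, 0.0, 0.0)], "B": [(1.0, 0.0, 0.0)]},  # residue B correct, A displaced by 2 Å
]
min_value = pocket_rmsd_of_best_model(models, crystal)
best_value = rmsd_of_best_conformers(models, crystal)
```

In this toy case neither model is correct as a whole, yet every residue is correct in at least one model, so the per-residue value is lower than the whole-pocket value. By construction, mode (2) can never yield a larger rmsd than mode (1).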

The results are summarised in Table 2. Regarding the modelled binding pockets in total, the best solutions are obtained when including ligand information in terms of DrugScore potentials (1.74 Å rmsd). Paired t-tests [88] indicate that this mean value differs significantly (with a significance level of 0.05) from the mean values that are obtained when neglecting ligand information (1.82 Å rmsd) or including it in terms of van der Waals potentials (1.88 Å rmsd). Considering the best conformer of each predicted side-chain, all three approaches (ligand included in terms of (1) DrugScore potentials, (2) van der Waals potentials, (3) ignored) seem to generate equally good results (mean rmsd values of 1.04, 1.06, and 1.03 Å). This is probably due to the fact that the conformational space of each residue is screened exhaustively by 100 probe conformers, irrespective of whether a ligand is present.

Comparing the rmsds for each of the 46 modelled binding pockets in turn, in 30 cases better models were obtained when considering the ligand in terms of the DrugScore potentials (see Figure 4). In 14 cases models with a lower rmsd were obtained when neglecting the ligand, and in two cases models with equal rmsds were generated. According to the paired t-test, the respective mean rmsds are significantly different (with a significance level of 0.1).
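The paired comparison used above can be sketched as follows. This is only an illustration of the statistic itself: the per-pocket rmsd values below are hypothetical placeholders (the individual values behind the reported counts are not listed in the text), and the function names are assumptions. The resulting t value would still have to be compared with a tabulated critical value for the chosen significance level.

```python
import math

def paired_t(with_ligand, without_ligand):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(with_ligand) != len(without_ligand) or len(with_ligand) < 2:
        raise ValueError("need two samples of equal length with at least two pairs")
    diffs = [a - b for a, b in zip(with_ligand, without_ligand)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if variance == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean / math.sqrt(variance / n), n - 1

def tally(with_ligand, without_ligand, tol=1e-9):
    """Count pockets where the ligand-supported model is better, worse or equal."""
    better = worse = equal = 0
    for a, b in zip(with_ligand, without_ligand):
        if abs(a - b) <= tol:
            equal += 1
        elif a < b:
            better += 1  # lower rmsd with ligand information
        else:
            worse += 1
    return better, worse, equal

# Hypothetical rmsd values (Å) for five pockets, NOT data from the paper.
rmsd_with = [1.2, 1.5, 2.0, 1.8, 2.2]
rmsd_without = [1.4, 1.6, 2.0, 1.7, 2.6]

t_value, dof = paired_t(rmsd_with, rmsd_without)
counts = tally(rmsd_with, rmsd_without)
```

A negative t value indicates that the ligand-supported models have the lower mean rmsd. The tally mirrors the better/worse/equal counts reported in the text.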

Table 3 gives a detailed list of the deviations of the multiple binding-site models generated in the presence of ligand information (in terms of DrugScore potentials). MODELLER computed geometries with rmsd <2.0 Å in 32 of 46 cases. Apparently, the prediction accuracy does not depend on the number of residues to be modelled but rather on the type of residues for which conformations have to be generated. If a binding pocket contains rigid, space-filling amino acid residues (in particular Phe, Tyr, Trp or His), the models deviate remarkably from the crystal structure: among the 32 satisfactorily modelled pockets, on average 2.8 Phe, Tyr, Trp or His residues are present, whereas the 14 cases with rmsd ≥2.0 Å comprise 5.8 residues of this type. This is probably due to the fact that an incorrect geometry of a bulky residue provokes incorrect geometries of adjacent residues. This influence increases with a growing number of bulky residues in an active site.
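Counting the rigid, space-filling residues of a pocket is straightforward once the binding-site residues are available in the one-letter code, as in Table 3. The sketch below is illustrative only: the helper name is an assumption, and the four sequences are examples transcribed from Table 3, not the full data set.

```python
# One-letter codes of the residues the text calls rigid and space-filling.
BULKY = frozenset("FYWH")  # Phe, Tyr, Trp, His

def count_bulky(pocket_sequence):
    """Number of Phe, Tyr, Trp or His residues in a binding-site residue string."""
    return sum(1 for residue in pocket_sequence.upper() if residue in BULKY)

# Example pockets (PDB code -> binding-site residues, transcribed from Table 3).
pockets = {
    "1ABF": "KQEWFCDDMTRMNN",
    "1AHA": "VYIFGNYIAER",
    "1CIL": "WNHQHHEHVFVLVSLTTPPW",
    "1ACJ": "GWGGYEFYWIHGY",
}
bulky_counts = {code: count_bulky(seq) for code, seq in pockets.items()}
```

For these four examples the two best-ranked pockets (1ABF, 1AHA) contain fewer such residues than the two worst-ranked ones (1CIL, 1ACJ), in line with the trend described above.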

Table 2. Results for predicting side-chains located in the active sites of the test set proteins

Active site predictions             MIN (a)    BEST (b)
With DrugScore potentials           1.74 Å     1.04 Å
With van der Waals potentials       1.88 Å     1.06 Å
Without ligand information          1.82 Å     1.03 Å

Side-chain predictions were performed for all binding-site residues of the test data set (Table 1) simultaneously, keeping the ligand and the remaining part of the protein fixed. In each case, the rmsd is calculated for all atoms in the given category (i.e. no averaging over residues or structures).

(a) The MIN values consider the binding pockets as entities. For each protein, the binding-pocket model with the lowest rmsd from the crystal structure was considered.

(b) The BEST values consider the best side-chain conformer for each single residue of the generated binding-site models compared to the residue in the crystal structure.

Figure 4. The differences between rmsd values observed for binding-site models (including only side-chain atoms) generated with ligand information (+ ligand) and without ligand information (- ligand).

Identification of the best binding-site models using DrugScore

The above-described homology models generated to assess the prediction accuracy of MODELLER were used subsequently to evaluate DrugScore’s ability to identify near-native complex geometries (“near-native” requires an rmsd <2.0 Å over all binding-site residues with respect to the corresponding crystal structure).

For each of the 46 test cases, DrugScore rankings were calculated for the crystal structure and the 100 model-built complexes. The crystal structures, assumed to represent the global optimum, should obtain the best score. In fact, DrugScore was able to retrieve the crystal structures on rank 1 in 32 out of 46 cases (70%). Data in Tables 3 and 4 indicate good correlation between DrugScore ranks and deviations of model-built versus crystallographically determined binding-sites. If a near-native geometry (<2.0 Å) was generated by MODELLER (32 cases), DrugScore was able to identify a pose with rmsd <2.0 Å on rank 1 in 21 cases (66%).

Table 3. Results for binding-site models of 46 protein–ligand complexes generated by MODELLER and scored with DrugScore

PDB code   rmsd of first DS (Å) (a)   Best rmsd value (Å) (b)   Residues in the binding-site (c)

<1.0 Å
1ABF       0.23                       0.13                      KQEWFCDDMTRMNN
1LAH       0.96                       0.96                      DYFSSLSRLSTQD

<1.5 Å
1LNA       1.02                       1.02                      NNAFLVHEILRH
1IMB       1.13                       0.90                      EDIDGTEGSGTAYEID
1HSL       1.15                       1.15                      YLSSLSRLGTTQD
1F3E       1.17                       0.98                      DYDCIQGGLAVMG
121P       1.18                       1.18                      AGGVGKSAFVDEDPTTAGNKDLSAK
1MLD       1.32                       1.32                      IRRINLRHGTVSAM
1BLH       1.41                       1.28                      ASKYSNEINGQAI

<2.0 Å
1LMO       1.50                       1.05                      DQINYWVDNAW
1PPL       1.52                       1.52                      EENDGSSYGDSQFLFIDGTTLLYLFI
1PSO       1.66                       1.42                      MEVDGSTYGTFFIIYDGTSLQMLI
1BUG       1.71                       1.21                      HHFHHIHMGNFAFH
1HDC       1.72                       1.63                      STGMSLLTYPGMTMTTW
1APT       1.75                       1.49                      ENDGYGDSFLFIDGTTLLYLFI
1POC       1.85                       1.69                      IYWCGHGCHDHTLFFVMYI
1CTR       1.87                       1.09                      EFILEMEAVMA
1AHA       1.89                       1.27                      VYIFGNYIAER
1HEF       1.89                       1.25                      RDGADDVIGGIPVI
1ABE       1.96                       1.96                      KQEWFDDMLTRMNN
1ROB       1.97                       1.39                      QHKVNTDRHFDAS

<2.5 Å
1MRG       2.05                       1.41                      IYIFGDYIAER
1BYB       2.21                       2.21                      MLDWIHNVDAERYQWFKSGHWTCMEALLR
1EPB       2.21                       1.81                      FIFWMVLAFKVVAIIKY
1ICN       2.21                       2.20                      YFMMIKFVFFYLADLWFLQQY
1STP       2.24                       1.57                      NLSYSVGNWASTWWLD
1TLP       2.25                       1.88                      YNNAFWFLVHEHYELRDH
1ATL       2.26                       1.32                      EETLGTHEHHCIRPGL
1RDS       2.27                       2.13                      YHEYHDYEEPGARHGDDF
1CPS       2.32                       1.79                      HRERNRHSYLIIYAGTEF
1ELA       2.34                       1.85                      HTVAWTGCQGDSTSFVSR
1TMN       2.34                       1.98                      YNNAFFLVHEHYEILRH
1HYT       2.36                       2.16                      NAFFLVHEHYEIGLRH
1RBP       2.42                       1.65                      LFLAFATAVLMVGMYLQHYFF
1RNT       2.43                       2.43                      NYHKYNNYEERHNNF
1AZM       2.46                       2.42                      FHHEHLLVSLTHW

>2.5 Å
1SRJ       2.54                       2.54                      NLSYSAVGNAYWASTWWLD
1BBP       2.55                       2.09                      ENVEGWANYHYFIHLYNFYKFWL
1DIE       2.62                       2.62                      WHTTFVWEEHDD
1DID       2.64                       2.48                      WHMTFVWENEHDHD
1SNC       2.66                       2.10                      DTRLLDEDKYRLYY
1CBX       2.76                       1.49                      HERNRHSLIIYAGTE
1HFC       2.89                       1.62                      GNLAHYVHEHHYPSY
1CIL       3.15                       2.94                      WNHQHHEHVFVLVSLTTPPW
1MRK       3.26                       2.22                      YIMFEGNYIEREW
1ACJ       3.31                       2.85                      GWGGYEFYWIHGY

The rmsd values are calculated for all atoms of each model (i.e. no averaging over structures). The models are treated as entities (i.e. no further optimisation by combining fragments from different models). The rigid, space-filling amino acid residues discussed in the text are Phe, Tyr, Trp and His (F, Y, W and H in the one-letter code).

(a) rmsd of the binding-site model found on rank 1 by DrugScore, with respect to the crystal structure.
(b) rmsd of the binding-site model with the least deviation from the crystal structure.
(c) Residues in the active site (given in the one-letter code) for which new geometries were computed.
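The counts quoted above can be reproduced directly from Table 3. The following sketch transcribes the two rmsd columns of that table (rank-1 model and best model, in Å) and applies the 2.0 Å near-native cutoff; the variable names are illustrative assumptions.

```python
# (PDB code, rmsd of the rank-1 model, best rmsd among all models), transcribed from Table 3.
TABLE3 = [
    ("1ABF", 0.23, 0.13), ("1LAH", 0.96, 0.96), ("1LNA", 1.02, 1.02),
    ("1IMB", 1.13, 0.90), ("1HSL", 1.15, 1.15), ("1F3E", 1.17, 0.98),
    ("121P", 1.18, 1.18), ("1MLD", 1.32, 1.32), ("1BLH", 1.41, 1.28),
    ("1LMO", 1.50, 1.05), ("1PPL", 1.52, 1.52), ("1PSO", 1.66, 1.42),
    ("1BUG", 1.71, 1.21), ("1HDC", 1.72, 1.63), ("1APT", 1.75, 1.49),
    ("1POC", 1.85, 1.69), ("1CTR", 1.87, 1.09), ("1AHA", 1.89, 1.27),
    ("1HEF", 1.89, 1.25), ("1ABE", 1.96, 1.96), ("1ROB", 1.97, 1.39),
    ("1MRG", 2.05, 1.41), ("1BYB", 2.21, 2.21), ("1EPB", 2.21, 1.81),
    ("1ICN", 2.21, 2.20), ("1STP", 2.24, 1.57), ("1TLP", 2.25, 1.88),
    ("1ATL", 2.26, 1.32), ("1RDS", 2.27, 2.13), ("1CPS", 2.32, 1.79),
    ("1ELA", 2.34, 1.85), ("1TMN", 2.34, 1.98), ("1HYT", 2.36, 2.16),
    ("1RBP", 2.42, 1.65), ("1RNT", 2.43, 2.43), ("1AZM", 2.46, 2.42),
    ("1SRJ", 2.54, 2.54), ("1BBP", 2.55, 2.09), ("1DIE", 2.62, 2.62),
    ("1DID", 2.64, 2.48), ("1SNC", 2.66, 2.10), ("1CBX", 2.76, 1.49),
    ("1HFC", 2.89, 1.62), ("1CIL", 3.15, 2.94), ("1MRK", 3.26, 2.22),
    ("1ACJ", 3.31, 2.85),
]
NEAR_NATIVE = 2.0  # Å, cutoff used in the text

# Complexes for which MODELLER produced at least one near-native model.
generated = [row for row in TABLE3 if row[2] < NEAR_NATIVE]
# Of those, complexes whose rank-1 model (by DrugScore) is itself near-native.
rank1_hits = [row for row in generated if row[1] < NEAR_NATIVE]

pct_generated = round(100 * len(generated) / len(TABLE3))
pct_rank1 = round(100 * len(rank1_hits) / len(generated))
```

With these data, 32 of 46 complexes (70%) have a near-native model somewhere in the ensemble, and for 21 of those 32 (66%) the rank-1 model is near-native.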

Table 4. Results for scoring multiple solutions of 46 protein–ligand complexes generated by MODELLER

% Complexes with solutions exhibiting rmsd from the crystal structure

                 <1.0 Å    <1.5 Å    <2.0 Å    ≥2.0 Å
All ranks (a)    9         43        70        30
1st rank (b)     44        47        66        34

(a) All solutions of each modelling experiment for the test data set (Table 1) are considered. The number expresses the portion of all complexes for which at least one solution with the given rmsd value was computed by MODELLER.

(b) Only the binding-site geometry scored to be on the first rank by DrugScore is considered. The numbers are related to those in the first line.

Combination of side-chain conformers in carboxypeptidase A

Even in the overall best model, MODELLER does not necessarily generate the best possible orientation for all binding-site residues. Thus, an improved model can be obtained by combining conformers from different models. Figure 5(a) shows the crystallographically determined binding-site of carboxypeptidase A complexed with L-benzylsuccinate together with 100 generated models (yellow). The three best-scored models (DrugScore) are shown in Figure 5(b). Each individual model contains at least one residue rotamer that differs significantly from the crystal structure. In contrast, a combination of rotamers considering only the individually best-scored ones retrieved from the entire ensemble matches the crystal structure to a greater extent (Figure 5(c)). To assess whether the thus generated binding pocket is capable of reproducing a correct ligand pose, we flexibly docked the ligand present in 1cbx into either the crystal structure or into our model. Prior to this, we minimised the modelled binding pocket with MOLOC in the presence of the ligand. Of course, this procedure does not correspond to a realistic scenario in real-life modelling case studies, because we restrained the modelling process with the ligand in its orientation known from the crystal structure, which will not usually be given. The Lamarckian genetic algorithm was applied in AutoDock using DrugScore grids to describe the protein binding-sites (Figure 5(d)) [71], and ten independent runs were performed. In both cases, docking produced two different ligand placements (AutoDock scoring energies for the crystal structure: -8.03 kcal/mol and -7.62 kcal/mol; for the model-built complex: -8.39 kcal/mol and -8.10 kcal/mol). In case of the experimentally resolved protein structure, the first solution has an rmsd of 1.21 Å with respect to the crystal coordinates. For the model, the second solution deviates from the crystal coordinates by 0.75 Å rmsd. The close energy ranks and the small rmsds indicate a high degree of similarity between the model and the original crystal structure. The ligand orientation, albeit found on rank 2, could be reproduced satisfactorily via docking into the generated model. This convincing result stimulated us to embark on some real-life modelling applications.

Figure 5. Binding-site residues of carboxypeptidase A complexed with L-benzylsuccinate (1cbx); (a) the crystal structure (colour-coding according to atom type) together with an ensemble of 100 models (yellow). The ligand in its orientation as found in the crystal structure is coloured in red; (b) the three models that obtained the best DrugScore values (green > cyan > yellow). (c) The model that results as the best combination of all binding-site residues retrieved individually from the 100 generated models is depicted in violet. (d) The best docking solution obtained for the ligand based on the combined and subsequently minimised model. (Orientation of docked ligand in yellow, crystallographically determined orientation in red.)
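The combination step used in this section, picking for every binding-site residue the individually best-scored rotamer from the whole ensemble, can be sketched as below. This is a simplified illustration under stated assumptions: the scoring callback stands in for a per-residue score against the fixed ligand (lower taken as more favourable), the residue labels, conformer labels and score values are hypothetical, and clashes between rotamers taken from different models are ignored here (in the text they are relieved afterwards by force-field minimisation).

```python
def combine_best_rotamers(models, score):
    """Build a composite pocket from the best-scored conformer of each residue.

    models: list of dicts mapping residue label -> conformer (any object)
    score:  callable (residue, conformer) -> float, lower is more favourable
    """
    if not models:
        raise ValueError("at least one model is required")
    composite = {}
    for residue in models[0]:
        candidates = [model[residue] for model in models]
        composite[residue] = min(candidates, key=lambda c: score(residue, c))
    return composite

# Hypothetical ensemble of three models with two binding-site residues each.
models = [
    {"res1": "r1_a", "res2": "r2_a"},
    {"res1": "r1_b", "res2": "r2_b"},
    {"res1": "r1_c", "res2": "r2_c"},
]
# Hypothetical per-residue scores against the ligand (placeholder numbers).
toy_scores = {
    ("res1", "r1_a"): -3.0, ("res1", "r1_b"): -5.0, ("res1", "r1_c"): -1.0,
    ("res2", "r2_a"): -4.0, ("res2", "r2_b"): -2.0, ("res2", "r2_c"): -3.5,
}
composite = combine_best_rotamers(models, lambda res, conf: toy_scores[(res, conf)])
```

Here the composite takes `res1` from the second model and `res2` from the first, so it is not identical to any single member of the ensemble.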

Modelling case studies

Modelling factor Xa based on trypsin

In the previous test examples, entire binding pockets have been modelled. Furthermore, the modelling process was restrained by the ligand in its orientation known from the crystal structure. In real-life applications, the protein to be modelled may differ by only several mutations with respect to the template protein(s) and the ligand geometry might not be known exactly beforehand. To test such a scenario, we generated homology models of factor Xa using bovine trypsin as template. Both proteins are members of the class of trypsin-like serine proteases and share 38% sequence identity (see Figure 6(a)). They are known to bind the ligand RPR128515 in a similar orientation [89]. A sequence alignment was produced by MALIGN3D [90]. In order to mimic a scenario where a ligand orientation is not given by crystallography, we generated ten preliminary homology models of factor Xa based on the crystal structure of trypsin (1f0u) excluding ligand information. We then docked RPR128515 flexibly into the merged binding pockets using AutoDock. The docking solutions, together with the ensemble of all modelled residues known to be crucial for the ligand binding, are depicted in Figure 6(b). Nine distinct docking solutions were obtained. The best solution (rmsd 0.97 Å with respect to the ligand orientation in the crystal structure 1ezq) is found on rank 6 (AutoDock energy score -14.35 kcal/mol). While the solution found on rank 1 (with an energy score of -14.80 kcal/mol) deviates by 3.33 Å rmsd, four additional solutions were generated with an rmsd <2.0 Å. The solution found on rank 2 differs by 1.64 Å from the crystal coordinates and has an energy score of -14.27 kcal/mol. As indicated in Figure 6(b), the orientations of some modelled binding-site residues (mainly those that are mutated compared to the original binding pocket of trypsin) are distributed over a large area. Nevertheless, the mapped configuration space for ligand orientations is rather restricted, since all generated solutions cluster about the native orientation.

Figure 6. (a) Superimposed crystal structures of trypsin (beige) and factor Xa (grey), both complexed with the ligand RPR128515. (b) The ensemble of relevant binding-site residues of factor Xa (modelled without considering ligand information using trypsin as template), together with the backbone of the crystal structure of factor Xa. The ligand is depicted in yellow (native orientation known from the crystal structure) and red (solutions from docking into the ensemble of the homology models). (c) The 100 new homology models of factor Xa that were generated based on trypsin regarding the ligand during the modelling process. (d) The finally optimised binding-site model (generated by combining side-chains from different models) is shown in beige, together with structures of factor Xa crystallised with RPR128515 (1ezq, cyan) or with other ligands (grey).

Following our strategy outlined above, we subsequently generated new factor Xa models explicitly considering ligand information. We included RPR128515 (see Figure 6(a)) [89] as an additional restraint in the protein modelling process. Taking the crystal structure of bovine trypsin (1f0u) [89] with the ligand in its co-crystallised orientation as template, 100 new factor Xa models were generated (Figure 6(c)). Again, side-chain conformational space is mapped considerably; however, solutions penetrating into the ligand hardly occur. A final factor Xa model was obtained by combining rotamers retrieved from different homology models (Figure 6(d), beige). For comparison, the crystal structure of factor Xa with bound RPR128515 (1ezq, cyan) and nine other crystal structures of factor Xa crystallised with different ligands (grey) are shown [89, 91-95]. Although the rmsd between the final model and crystal structure (1ezq) amounts to 1.66 Å, the features primarily responsible for binding are well reproduced, apart from Glu147, Gln192 and Glu97, which do not align perfectly with the crystal structure. In the case of Glu147, this is due to the fact that the backbones of factor Xa and trypsin do not align in this area. However, as there is no specific interaction between the ligand and Glu147, this deviation is of no relevance. The ester group of the ligand forms an H-bond to Gln192-NH. Since the backbone traces match well in the template and the model, deviations in side-chain orientation of Gln192 are not important for the ligand pose. The same holds for Glu97, which establishes a strong H-bond (2.5 Å) between its backbone carbonyl oxygen atom and an amino group of the ligand. The other residues, in particular Tyr99 and Phe174, that contribute to binding and determine the specificity of the S4 pocket in factor Xa [94], are modelled almost perfectly. Regarding the fact that some of the discussed binding-site residues in factor Xa exhibit considerable side-chain flexibility upon binding of different ligands, as indicated by multiple structure determinations (see Figure 6(d)), the generated model appears rather convincing.

To assess whether the generated binding-site model could be used successfully for virtual screening, we tried to reproduce the binding mode of ten ligands that have been co-crystallised with factor Xa [89, 91-95]. For reasons of comparison, we also docked these ligands into the binding pocket of a crystallographically determined factor Xa structure (1ezq).

The results with respect to rmsd and AutoDock energy score are given in Table 5. The overall success rate is slightly higher when docking into the factor Xa crystal structure. Considering the solutions with the lowest rmsd value with respect to the experimental structure, in six (out of ten) cases a better solution is obtained when docking into the crystal structure instead of our model. However, the differences expressed in terms of rmsd are not large, in particular, taking the grid approximation within AutoDock and positional uncertainties in the experimental structures into consideration. Also, the differences in energy scores are negligible. In only two cases, they amount to more than 0.5 kcal/mol. Remarkably, docking RPR128515 either back into our model (rmsd 0.33 Å; score -17.24 kcal/mol) or into the crystal structure (rmsd 0.78 Å; score -16.17 kcal/mol) reveals a better result for the model. This shows, not unexpectedly, that the model is tailored slightly towards the ligand used to restrain the modelling. Nevertheless, since convincing results are obtained for all ligands considered, the model generated appears to be well suited for structure-based drug design purposes.

Table 5. Statistics on the docking experiments on factor Xa

                       Crystal structure (a)             Model (b)
Ligand PDB code (c)    rmsd (Å)    Energy (kcal/mol)     rmsd (Å)    Energy (kcal/mol)
1EZQ                   0.78        -16.17                0.33        -17.24
1F0R                   1.81        -15.08                2.04        -15.35
1F0S                   1.15        -13.50                1.75        -13.51
1FAX                   1.98        -15.67                1.89        -15.20
1FJS                   1.57        -15.48                2.21        -16.35
1G2L                   1.96        -15.70                1.95        -15.76
1G2M                   1.98        -15.03                1.83        -15.40
1KSN                   1.03        -15.99                1.28        -16.32
1XKA                   1.88        -14.71                2.38        -15.12
1XKB                   1.90        -14.27                2.53        -14.30

All values refer to the least deviating solution with respect to the crystal structure.

(a) Results for docking the ligands into the crystal structure of factor Xa (1ezq).
(b) Results for docking the ligands into the homology model of factor Xa.
(c) Data set of ten factor Xa ligands.
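The two counts stated in the text (six of ten ligands dock better into the crystal structure; only two energy differences exceed 0.5 kcal/mol) can be checked against Table 5 with a few lines. The values below are transcribed from Table 5, with the energies taken as negative AutoDock scores; the variable names are illustrative assumptions.

```python
# (ligand PDB code, crystal rmsd, crystal energy, model rmsd, model energy)
# rmsd in Å, energy in kcal/mol, transcribed from Table 5.
TABLE5 = [
    ("1EZQ", 0.78, -16.17, 0.33, -17.24),
    ("1F0R", 1.81, -15.08, 2.04, -15.35),
    ("1F0S", 1.15, -13.50, 1.75, -13.51),
    ("1FAX", 1.98, -15.67, 1.89, -15.20),
    ("1FJS", 1.57, -15.48, 2.21, -16.35),
    ("1G2L", 1.96, -15.70, 1.95, -15.76),
    ("1G2M", 1.98, -15.03, 1.83, -15.40),
    ("1KSN", 1.03, -15.99, 1.28, -16.32),
    ("1XKA", 1.88, -14.71, 2.38, -15.12),
    ("1XKB", 1.90, -14.27, 2.53, -14.30),
]

# Ligands whose least-deviating pose is closer to experiment in the crystal structure.
crystal_better = [code for code, c_rmsd, _, m_rmsd, _ in TABLE5 if c_rmsd < m_rmsd]

# Ligands whose energy scores differ by more than 0.5 kcal/mol between the two receptors.
large_energy_gap = [code for code, _, c_energy, _, m_energy in TABLE5
                    if abs(c_energy - m_energy) > 0.5]
```

Both lists agree with the statements in the text; the larger of the two energy gaps belongs to 1EZQ, the complex containing the ligand that was used to restrain the modelling.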

Modelling aldose reductase based on aldehyde reductase

The previous case study demonstrated that our approach generates sufficiently accurate geometries of protein residues to establish specific interactions with ligands. In the following example, we will investigate how well binding modes can be reproduced for a protein known from crystal structure analysis to exhibit pronounced induced-fit adaptations upon ligand binding.

Aldose reductase (AR), an NADPH-dependent enzyme, catalyses the reduction of glucose along the sorbitol pathway and, therefore, represents a promising drug target for the therapy of secondary complications of diabetes.

AR shares 49.5% sequence identity with aldehyde reductase. In particular, the cofactor binding-sites and the regions where the hydride transfer from NADPH to the carbonyl carbon atom of the substrates occurs (anion-binding pocket) are structurally highly conserved (see Figure 7). However, aldehyde reductase exhibits an additional loop, comprising 11 residues, that is responsible for differences in substrate specificity. Interestingly, in AR, this segment is composed of only four residues (Ala299-Ser302). Here, it is part of the hydrophobic specificity pocket and shows the most striking adaptations upon ligand binding. An MD simulation performed on the ultra high-resolution crystal structure of human aldose reductase complexed with IDD594 [96] revealed the most pronounced flexibility in this region, with the largest side-chain mobility exhibited by Leu300. A very distinct binding-site conformation (compared to the IDD594 complex) is observed for tolrestat binding (1ah3) to the porcine enzyme. Superimposition with the IDD594 complex (Figure 8) reveals identical orientations of the ligands’ carboxylate groups in the anion-binding pocket, whereas tolrestat would clash into Leu300 in the IDD594 structure.

To examine whether these specific binding-site geometries could be modelled by the MOBILE approach, we generated two different sets of AR models including either tolrestat (1ah3) or IDD594 as ligand-derived restraints. According to our strategy, we initially generated 100 preliminary AR models based on the crystal structure of aldehyde reductase (1hqt) neglecting ligand information. The coordinates of the cofactor (being identical in AR and aldehyde reductase) were transferred from aldehyde reductase to the AR models. Next, we placed tolrestat and IDD594 into the ensemble of preliminary homology models using AutoDock. In the case of tolrestat, a good docking solution (2.05 Å rmsd) was found on rank 2. For the IDD594 complex, a solution with 2.53 Å rmsd was obtained on rank 3. To refine the modelled complexes further, we performed an additional iteration with our approach. Therefore, the ligand orientations of tolrestat and IDD594 obtained were used to restrain the subsequent homology modelling. Considering each of the docked inhibitors separately, two sets of 100 homology models based on aldehyde reductase (1hqt) as template were generated. Next, we docked the two ligands into the ensembles of the protein models produced. For tolrestat, a solution with 0.84 Å rmsd with respect to the orientation observed in the crystal structure was obtained (found on rank 3); the best solution for IDD594 exhibited 1.22 Å rmsd (also on rank 3). Compared to the respective rms deviations for the docking into the preliminary homology models (as shown above: 2.05 Å (tolrestat) and 2.53 Å (IDD594)), these improvements for the docking into the refined models are substantial.

Figure 7. Superimposed crystal structures of aldose reductase (AR, cyan) and aldehyde reductase (marine) with the NADP+ cofactor (shown in beige) in its orientation from aldehyde reductase. The loop regions composing the specificity pockets are coloured yellow (AR) and red (aldehyde reductase), respectively.

Figure 8. Conformational changes in the AR binding pocket in consequence of inhibitor binding. The nicotinamide ring of the cofactor is shown in red. In blue, the orientations of tolrestat and the side-chains of Leu300 are displayed (as observed in the corresponding crystal structure 1ah3); the latter residue is affected mainly by the conformational rearrangement of the binding pocket upon ligand binding. The ligand IDD594, together with the geometry of the corresponding binding-site residue, is depicted in orange.

Besides producing a near-native ligand geometry, our prime interest is focussed on prediction of a correct loop geometry, since the remaining parts of the binding pockets in AR and aldehyde reductase are rather similar. Accordingly, we used DrugScore to score only the interactions formed between each docked ligand and the residues in the sequence stretch Ala299-Ser302 of the generated models. For both cases, loop geometries closely approximating the crystal structures (a comparison is shown in Figure 9(a) and (b)) were found among the top-scored solutions. In the case of tolrestat, the most convincing loop orientation was found on rank 2 (rmsd considering the side-chain atoms of Leu300, 1.22 Å); for IDD594 the loop conformer on rank 2 deviates by 1.49 Å.

AR provides an example of ligand-induced protein adaptations affecting even the backbone conformation. This case study demonstrates that realistic protein–ligand geometries can be generated by applying the MOBILE approach to this rather complex system, where ligands reinforce different loop conformations upon binding. Furthermore, we have shown that the mutual orientations between the protein and a particular ligand can be adjusted in a stepwise fashion. Even though the initial starting protein–ligand geometries deviated considerably from the orientations found in the corresponding crystal structures, near-native geometries could be generated for both the tolrestat and the IDD594 complex after performing a second cycle of our approach.

Conclusion and Outlook

We present a novel strategy (MOBILE) to consider information about the binding mode of bioactive ligands during the homology modelling process. It starts with a combined set of homology models, and ligands are placed into a crude binding-site representation via docking onto averaged property fields derived from knowledge-based potentials. Once the ligands are placed, a new set of homology models is generated. However, in this step, ligand information is considered as an additional restraint in terms of knowledge-based pair potentials. Consulting a large ensemble of models exhibiting different side-chain rotamers for the binding-site residues, a composite picture is assembled considering the individually best-scored rotamers with respect to the ligand. After a local force-field optimisation, the binding-site models are used for flexible docking. As a result, protein binding-site models of greater accuracy and relevance are generated. The application of DrugScore pair potentials proved efficient to restrain the homology modelling process, and to score and optimise the modelled protein–ligand complexes. This was demonstrated by using a test data set of 46 complexes and further validated by applying the new strategy to relevant modelling scenarios.

Figure 9. Superimposition of the crystal structures (blue) and the modelled complexes (cyan) of AR with (a) tolrestat and (b) IDD594. The side-chain orientations of Leu300 are indicated. The nicotinamide ring of the cofactor is shown in red.


A commonly applied protocol for modelling binding-sites of unknown proteins usually starts with generating one preliminary homology model of the uncomplexed protein, occasionally optimised by molecular dynamics. After placing a ligand into the modelled active site, usually the entire complex is subjected to a further refinement. As pointed out by Schonbrun et al. [97], it is debatable whether a refinement by molecular dynamics actually improves the predicted structure. Indeed, none of the top comparative modelling groups involved in CASP4 [98] used such protocols, probably because previous experience did not suggest any advantage in predictive power. Most likely this is due to the limited sampling of configuration space using standard molecular dynamics, although this limitation could be overcome by generalized-ensemble simulations [99]. Accordingly, it is rather unlikely that the global minimum geometry of a protein–ligand complex can be obtained if the starting geometry is distant from the near-native one and separated by a high energy barrier. Supposedly, this limitation does not occur in our approach because the configuration space of the modelled binding pocket is sampled more exhaustively while simultaneously considering the ligand in a near-native orientation. The global minimum is approximated as a composite picture by identifying optimally scored rotamers from a large set of generated models. The individual rotamers are scored with respect to a given ligand pose using DrugScore. This function has been demonstrated to identify efficiently native and near-native protein–ligand configurations [65].

Similar to protein–ligand docking, the strategy to detect near-native complex geometries involves two equally important steps: (1) computing relevant geometries; and (2) identifying the pose closest to the experimentally given situation (scoring). The program MODELLER used in our approach produces relevant geometries of protein binding-sites even in the absence of a ligand, particularly if the search space for side-chain rotamers is small and, thus, can be sampled efficiently. However, our approach shows that the efficiency and accuracy of the modelling process are clearly enhanced by considering ligand information. The second goal, identifying complexes with near-native geometries, inevitably requires the presence of a ligand in a realistic orientation.

It has been shown that relevant binding modes can be produced by docking ligands into ensembles of protein structures [60]. Smoothing the potential energy surface results in an even faster convergence of the docking problem. Nevertheless, due to the approximate nature of the binding-site representations derived from an ensemble of modelled protein geometries, the use of conformationally restricted ligands is advisable. If a 3D superposition of ligands, e.g. in the context of a previously performed 3D-QSAR study, is available, these aligned ligands could be docked rigidly into the homology models. This will further reduce the search space of the docking problem. The mutual similarity of different ligands in their docked orientations can be used as an additional criterion to assess the quality of the docking solutions [42].

In our approach, ligand information is used only in structural terms. Additionally, affinity data for the ligands might be considered to assess the quality of the homology models generated. Such concepts will result in a “QSAR-refined homology modelling”. The first option would be to dock a given set of ligands into several homology models and to predict the affinity of all resulting complexes. The model that yields the best correlation between calculated and experimental affinities would then be preferred. A possible limitation of this strategy might be that the presently available scoring functions cannot predict affinities accurately enough. Interestingly, 3D-QSAR models based on superimposed ligands reveal surprisingly high predictive power in affinity estimation, provided a correct superimposition is given. In consequence, a second alternative to assess the quality of the models produced would be to generate multiple QSAR models based on distinct ligand alignments obtained from the docking into the various homology models. In analogy to the procedure followed by several authors [41, 42, 100], the statistical significance of the generated QSAR models is thus used to assess the relevance of the different protein models. A further possibility to reliably predict the affinities between homology models and ligands would be to establish an AFMoC model [72]. AFMoC tailors protein-specifically adapted DrugScore pair potentials to one particular protein by considering additional ligand-based information in a CoMFA-type approach. The statistical significance of an AFMoC model thus explicitly reflects the quality of the underlying protein model. A further advantage is that AFMoC allows the user to gradually move from general knowledge-based potentials to protein-specifically adapted ones, depending on the confidence in the generated protein model and the amount of ligand data available for training.
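The first option outlined above, preferring the homology model whose predicted affinities correlate best with experiment, can be sketched as follows. This is a minimal illustration and not a procedure carried out in this work: the Pearson correlation coefficient is used as one possible measure of correlation, affinities are assumed to be expressed so that larger values mean tighter binding (e.g. pKi), and all model names and numbers are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

def rank_models(predicted_by_model, experimental):
    """Return (model name, r) pairs sorted from best to worst correlation."""
    scored = [(name, pearson(pred, experimental))
              for name, pred in predicted_by_model.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical experimental affinities (pKi) for four ligands.
experimental = [6.0, 7.0, 8.0, 9.0]
# Hypothetical affinities predicted after docking into two homology models.
predicted_by_model = {
    "model_A": [6.1, 6.9, 8.2, 8.8],
    "model_B": [8.0, 6.5, 7.0, 7.5],
}
ranking = rank_models(predicted_by_model, experimental)
best_model, best_r = ranking[0]
```

With only a handful of ligands such a correlation is statistically weak, which is one reason why the text also considers 3D-QSAR and AFMoC models as alternatives.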

To assess the predictive power of protein homology modelling techniques, usually the rmsd between the model-built and the corresponding crystal structure is determined. Here, we follow the same procedure. However, one has to regard intrinsic accuracy limits. X-ray structures obtained for the same protein in different laboratories or determined in two different crystal forms can show deviations in main-chain atoms of about 0.5 Å rmsd. The solvent-exposed side-chains can differ by as much as 1.5 Å, while for more buried side-chains, the difference can amount to 1.0 Å.101

Exploring the theoretical prediction limit of commonly applied force-fields, Petrella et al. suggested a limit for side-chain prediction of 0.8 Å.102 Xiang et al. assumed accuracy limits of 0.7 Å for the side-chains of core residues.103 In light of these estimates, the accuracies achieved by our approach on the test set for binding-site residues (<1.0 Å) are quite convincing. This becomes even more pronounced when considering that in the two above-mentioned studies, all residues were kept fixed except the one being predicted, whereas in our approach the orientations of all protein side-chains in the active site were predicted simultaneously. Finally, as noted by Tramontano et al.,10 the rmsd criterion is accepted widely, but is not necessarily always a perfect figure-of-merit. Criteria that rank proper side-chain orientations with respect to neighbouring side-chains would be more conclusive. In particular, this is important while evaluating the side-chain orientations of a protein with respect to a preoriented ligand. In our approach, DrugScore convincingly supports this step in particular.
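For reference, the rmsd criterion discussed above can be written down in a few lines. The sketch below assumes that model and crystal structure already share one coordinate frame (as is the case when the non-binding-site part of the protein is kept as in the template), so no superposition is performed; the function names are illustrative, and the 2.0 Å cutoff is the near-native threshold quoted for the test set.

```python
from math import sqrt

NEAR_NATIVE_CUTOFF = 2.0  # Angstrom, threshold used for "near-native" geometries


def rmsd(coords_model, coords_crystal):
    """Root-mean-square deviation between two matched lists of (x, y, z)
    coordinates. Atoms must be given in the same order in both lists and
    both structures must already be in the same frame of reference."""
    if len(coords_model) != len(coords_crystal) or not coords_model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xm, ym, zm), (xc, yc, zc) in zip(coords_model, coords_crystal):
        squared_sum += (xm - xc) ** 2 + (ym - yc) ** 2 + (zm - zc) ** 2
    return sqrt(squared_sum / len(coords_model))


def is_near_native(coords_model, coords_crystal, cutoff=NEAR_NATIVE_CUTOFF):
    """True if the model deviates from the crystal structure by less than cutoff."""
    return rmsd(coords_model, coords_crystal) < cutoff
```

Because every atom pair enters with equal weight, a single badly placed side-chain can dominate the value, which is one reason why the rmsd is not always a perfect figure-of-merit.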

For proteins exhibiting pronounced induced-fit adaptations, homology modelling based on a single crystal structure of a related protein is difficult and results could be misleading. Even though crystal structures are our most reliable source for learning about protein geometry, they provide only a frozen snap-shot of a dynamically fluctuating system. Local effects such as the applied pH conditions or impacts imposed by crystal packing do influence binding modes.104 Through ligand binding, different local minima experienced by the uncomplexed protein under dynamic conditions can be stabilised and observed as favourable binding-site geometries in a crystal.105–107

Homology modelling using MODELLER is based on a reference template structure and the approach tries to carry over as much information as possible from the template into the model; in particular, in regions with high levels of sequence identity and structural conservation. To perform an exhaustive side-chain screening by our approach, such regions must be excluded from the direct homology matching step. As an alternative, structural variability can be introduced in the modelling process by considering multiple template structures exhibiting deviating conformations in the flexible regions.

Materials and Methods

Test data set

A test data set of 46 protein–ligand complexes was compiled to validate the performance of our approach (see Table 1). This set has been extracted from the 91 protein–ligand complexes used for DrugScore validation,65 considering the following criteria: (1) as MODELLER is intended primarily for homology modelling, we selected only structures that do not contain cofactors next to the active site. (2) The only hetero atoms allowed (besides those in the ligands) were metal ions. (3) We eliminated all water molecules, as their positions will generally not (yet) be predicted realistically in modelling scenarios. We used this test data set for scaling the DrugScore potentials to the MODELLER force-field, comparing homology models generated with and without ligand information, assessing the side-chain prediction accuracy of MODELLER, and evaluating DrugScore’s power to identify near-native complex geometries.

Generation of binding-site models of the test data set

For validation studies performed with the test data set (see Table 1), only the geometries of residues next to the active site (within a distance of 4.5 Å to the ligand) were modelled. Here, the crystal structure of the respective PDB entry served itself as template for the modelling process (Figure 10). The coordinates of all but the binding-site residues of the protein were kept unchanged with respect to the templates. New geometries for binding-site residues (including side-chain and main-chain atoms) were forced to be generated by MODELLER by keeping the binding-site residues unmatched in the sequence alignment (see Figure 10(C)). For the de novo prediction of side-chain geometries only, the corresponding residues in the template structures were mutated to Gly (see Figure 10(B)).

Figure 10. Generation of binding-site models for the proteins of the test set (Table 1). An aligned sequence stretch of a protein is shown in (A). Five residues, belonging to the active site, are shaded grey. The procedure for generating binding-site models works as follows: the available structure of the template protein serves as basis to model the structure of the model sequence. Generating a homology model with the identically aligned sequences as represented in (A) would result in a model structure that is identical with the template. To generate a model with new side-chain orientations of the binding-site residues (while keeping the remaining part of the protein as in the template structure), sequence alignment (B) is used. Here, the binding-site residues in the template structure are mutated to Gly. Thus, information about the respective side-chain coordinates cannot be inferred by homology and must be predicted de novo by MODELLER. For modelling the main-chain and side-chain orientations of the model, the binding-site residues are unaligned in the sequence alignment (C), leading to a complete neglect of information in MODELLER about these residues from the template structure.
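The selection of binding-site residues and the three alignment schemes of Figure 10 can be sketched as follows. This is an illustration under stated assumptions rather than the scripts used in this work: the input data structures and both function names are hypothetical, and representing scheme (C) by shifting each binding-site residue into its own gapped column is only one plausible way of leaving residues unaligned. No MODELLER file format or API is implied.

```python
from math import dist  # available from Python 3.8


def binding_site_residues(protein_atoms, ligand_coords, cutoff=4.5):
    """Return the sorted indices of residues with at least one atom within
    `cutoff` Angstrom of any ligand atom.

    protein_atoms: iterable of (residue_index, (x, y, z)) tuples.
    ligand_coords: list of (x, y, z) tuples.
    """
    site = set()
    for residue_index, xyz in protein_atoms:
        if any(dist(xyz, ligand_xyz) <= cutoff for ligand_xyz in ligand_coords):
            site.add(residue_index)
    return sorted(site)


def alignment_rows(sequence, site_positions, scheme):
    """Build (template_row, target_row) for one of the schemes of Figure 10.

    "A": identical alignment, the model would equal the template.
    "B": binding-site residues of the template replaced by Gly, so that
         side-chain coordinates cannot be inherited from the template.
    "C": binding-site residues left unaligned, so that neither main-chain
         nor side-chain information is taken from the template.
    """
    if scheme not in ("A", "B", "C"):
        raise ValueError("scheme must be 'A', 'B' or 'C'")
    site = set(site_positions)
    template, target = [], []
    for position, residue in enumerate(sequence):
        if position not in site or scheme == "A":
            template.append(residue)
            target.append(residue)
        elif scheme == "B":
            template.append("G")
            target.append(residue)
        else:  # scheme "C": template residue and target residue in separate columns
            template.extend([residue, "-"])
            target.extend(["-", residue])
    return "".join(template), "".join(target)
```

For a toy sequence `ACDEF` with binding-site positions 1 and 3, scheme (B) yields the template row `AGDGF` against the unchanged target row, while scheme (C) yields `AC-DE-F` against `A-CD-EF`.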

Scaling the DrugScore potentials with respect to the MODELLER force-field

In order to incorporate the DrugScore pair potentials as additional restraints into the MODELLER force-field, they were scaled empirically with respect to the remaining force-field terms. Apart from the terms adapted from CHARMM to constrain stereochemical properties, the MODELLER force-field consists of probability density functions of purely empirical origin. Therefore, it appears justified to weight the DrugScore potentials empirically. This adjustment was accomplished using our test data set of 46 complexes (see Table 1) by systematically varying the contribution of the DrugScore pair potentials to the MODELLER molecular probability density function. For this parameterisation study, the coordinates of the ligand atoms were adopted from the corresponding PDB entries. For each parameter setting (i.e. scaling factor), we generated ten homology models. In these “models”, all coordinates apart from residues within 4.5 Å distance to the ligand were kept identical with the crystal coordinates. The binding-site residues, however, were generated de novo by MODELLER, i.e. without considering information taken from the template structure. The models generated using the current parameter setting were assessed with respect to their spatial deviation from the corresponding crystal structures by computing the rmsd between the residues of the modelled and crystallographically determined binding-site. As a further criterion to consider the similarity between model and crystal structure, grids based on DrugScore potentials were calculated in the modelled and crystallographic binding-sites and their mutual similarity was assessed by evaluating the Hodgkin index.108 A scaling factor of 7.5 × 10⁻⁵ was finally found as best solution to adjust the DrugScore potentials to the MODELLER force-field. A similar scaling factor was obtained by Sotriffer et al. scaling DrugScore to the intramolecular force-field implemented into AutoDock.71
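The two assessment criteria of this parameterisation can be sketched in code. The Hodgkin index of two fields sampled on the same grid points is 2Σab / (Σa² + Σb²), which equals 1 for identical fields. How rmsd and Hodgkin index were combined to pick the final scaling factor is not specified here, so the selection function below simply takes the lowest mean rmsd; that rule, the function names and the example numbers are assumptions for illustration only.

```python
def hodgkin_index(grid_a, grid_b):
    """Hodgkin similarity index of two fields given as flat lists of values
    at identical grid points: 2*sum(a*b) / (sum(a*a) + sum(b*b))."""
    if len(grid_a) != len(grid_b):
        raise ValueError("grids must have the same number of points")
    numerator = 2.0 * sum(a * b for a, b in zip(grid_a, grid_b))
    denominator = sum(a * a for a in grid_a) + sum(b * b for b in grid_b)
    if denominator == 0:
        raise ValueError("index is undefined if both grids are zero everywhere")
    return numerator / denominator


def best_scaling_factor(rmsd_by_factor):
    """Pick the scaling factor whose generated models have the lowest mean
    rmsd to the crystal structure (assumed selection rule).

    rmsd_by_factor: dict mapping a scaling factor to the list of rmsd values
    of the models generated with that factor.
    """
    def mean_rmsd(factor):
        values = rmsd_by_factor[factor]
        return sum(values) / len(values)

    return min(rmsd_by_factor, key=mean_rmsd)


# Made-up placeholder numbers, not results from the paper.
example = {0.1: [1.4, 1.6], 1.0: [0.9, 1.0], 10.0: [1.2, 1.3]}
chosen = best_scaling_factor(example)
```

A combined criterion, for instance requiring both a low rmsd and a high Hodgkin index, could be substituted for `mean_rmsd` without changing the rest of the scan.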

Acknowledgements

The authors are grateful to Dr A. Schafferhans (Lion Biosciences, Heidelberg, Germany) for stimulating discussion, in particular in the beginning of this project. We thank Dr P. Sanschagrin for providing code for the complete-linkage clustering algorithm. We acknowledge helpful discussions with Dr C. Sotriffer. Finally, the authors thank Dr A. Podjarny (IGBMC, Illkirch, France) for providing the coordinates of the crystal structure of the IDD594 complex.

References

1. Marcotte, E. M., Pellegrini, M., Ng, H. L., Rice, D. W., Yeates, T. O. & Eisenberg, D. (1999). Detecting protein function and protein–protein interactions from genome sequences. Science, 285, 751–753.

2. Broder, S. & Venter, J. C. (2000). Sequencing the entire genomes of free-living organisms: the foundation of pharmacology in the new millennium. Annu. Rev. Pharmacol. Toxicol. 40, 97–132.

3. Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J. et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409, 860–921.

4. Rubin, G. M., Yandell, M. D., Wortman, J. R., Gabor Miklos, G. L., Nelson, C. R., Hariharan, I. K. et al. (2000). Comparative genomics of the eukaryotes. Science, 287, 2204–2215.

5. Sanchez, R., Pieper, U., Melo, F., Eswar, N., Marti-Renom, M. A., Madhusudhan, M. S. et al. (2000). Protein structure modeling for structural genomics. Nature Struct. Biol. 7, 986–990.

6. Brenner, S. E. (2000). Target selection for structural genomics. Nature Struct. Biol. 7, 967–969.

7. Brenner, S. E. & Levitt, M. (2000). Expectations from structural genomics. Protein Sci. 9, 197–200.

8. Holm, L. & Sander, C. (1996). Mapping the protein universe. Science, 273, 595–603.

9. Marti-Renom, M. A., Stuart, A. C., Fiser, A., Sanchez, R., Melo, F. & Sali, A. (2000). Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29, 291–325.

10. Tramontano, A., Leplae, R. & Morea, V. (2001). Analysis and assessment of comparative modeling predictions in CASP4. Proteins: Struct. Funct. Genet. Suppl. 5, 22–38.

11. Moult, J., Hubbard, T., Fidelis, K. & Pedersen, J. T. (1999). Critical assessment of methods of protein structure prediction (CASP): round III. Proteins: Struct. Funct. Genet. Suppl. 3, 2–6.

12. Venclovas, C., Zemla, A., Fidelis, K. & Moult, J. (2001). Comparison of performance in successive CASP experiments. Proteins: Struct. Funct. Genet. Suppl. 5, 163–170.

13. Bates, P. A., Kelley, L. A., MacCallum, R. M. & Sternberg, M. J. (2001). Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins: Struct. Funct. Genet. Suppl. 5, 39–46.

14. Thornton, J. M., Orengo, C. A., Todd, A. E. & Pearl, F. M. (1999). Protein folds, functions and evolution. J. Mol. Biol. 293, 333–342.

15. Thornton, J. M., Todd, A. E., Milburn, D., Borkakoti, N. & Orengo, C. A. (2000). From structure to function: approaches and limitations. Nature Struct. Biol. 7, 991–994.

16. Skolnick, J., Fetrow, J. S. & Kolinski, A. (2000). Structural genomics and its importance for gene function analysis. Nature Biotechnol. 18, 283–287.

17. Andrade, M. A., Brown, N. P., Leroy, C., Hoersch, S., de Daruvar, A., Reich, C. et al. (1999). Automated genome sequence analysis and annotation. Bioinformatics, 15, 391–412.

18. Russell, R. B. & Eggleston, D. S. (2000). New roles for structure in biology and drug discovery. Nature Struct. Biol. 7, 928–930.

19. Klabunde, T. & Hessler, G. (2002). Drug design strategies for targeting G-protein-coupled receptors. ChemBiochem, 3, 928–944.

20. Bourdon, H., Trumpp-Kallmeyer, S., Schreuder, H., Hoflack, J., Hibert, M. & Wermuth, C. G. (1997). Modelling of the binding site of the human m1 muscarinic receptor: experimental validation and refinement. J. Comput. Aided Mol. Des. 11, 317–332.

21. Lozano, J. J., Lopez-de-Brinas, E., Centeno, N. B., Guigo, R. & Sanz, F. (1997). Three-dimensional modelling of human cytochrome P450 1A2 and its interaction with caffeine and MeIQ. J. Comput. Aided Mol. Des. 11, 395–408.

22. Garcia-Nieto, R., Perez, C. & Gago, F. (2000). Automated docking and molecular dynamics simulations of nimesulide in the cyclooxygenase active site of human prostaglandin-endoperoxide synthase-2 (COX-2). J. Comput. Aided Mol. Des. 14, 147–160.

23. Zhang, X. P., Sjoling, S., Tanudji, M., Somogyi, L., Andreu, D., Eriksson, L. E. et al. (2001). Mutagenesis and computer modelling approach to study determinants for recognition of signal peptides by the mitochondrial processing peptidase. Plant J. 27, 427–438.

24. Marhefka, C. A., Moore, B. M., 2nd, Bishop, T. C., Kirkovsky, L., Mukherjee, A., Dalton, J. T. & Miller, D. D. (2001). Homology modeling using multiple molecular dynamics simulations and docking studies of the human androgen receptor ligand binding domain bound to testosterone and non-steroidal ligands. J. Med. Chem. 44, 1729–1740.

25. Le Novere, N., Grutter, T. & Changeux, J. P. (2002). Models of the extracellular domain of the nicotinic receptors and of agonist- and Ca2+-binding sites. Proc. Natl Acad. Sci. USA, 99, 3210–3215.

26. Gieldon, A., Kazmierkiewicz, R., Slusarz, R. & Ciarkowski, J. (2001). Molecular modeling of interactions of the non-peptide antagonist YM087 with the human vasopressin V1a, V2 receptors and with oxytocin receptors. J. Comput. Aided Mol. Des. 15, 1085–1104.

27. Escherich, A., Lutz, J., Escrieut, C., Fourmy, D., van Neuren, A. S., Muller, G. et al. (2000). Peptide/benzodiazepine hybrids as ligands of CCK(A) and CCK(B) receptors. Biopolymers, 56, 55–76.

28. Bathelt, C., Schmid, R. D. & Pleiss, J. (2002). Regioselectivity of CYP2B6: homology modeling, molecular dynamics simulation, docking. J. Mol. Model. (Online), 8, 327–335.

29. Lopez-Rodriguez, M. L., Murcia, M., Benhamu, B., Olivella, M., Campillo, M. & Pardo, L. (2001). Computational model of the complex between GR113808 and the 5-HT4 receptor guided by site-directed mutagenesis and the crystal structure of rhodopsin. J. Comput. Aided Mol. Des. 15, 1025–1033.

30. Bissantz, C., Bernard, P., Hibert, M. & Rognan, D. (2003). Protein-based virtual screening of chemical databases. II. Are homology models of G-protein coupled receptors suitable targets? Proteins: Struct. Funct. Genet. 50, 5–25.

31. Vaidehi, N., Floriano, W. B., Trabanino, R., Hall, S. E., Freddolino, P., Choi, E. J. et al. (2002). Prediction of structure and function of G protein-coupled receptors. Proc. Natl Acad. Sci. USA, 99, 12622–12627.

32. Gouldson, P. R., Snell, C. R. & Reynolds, C. A. (1997). A new approach to docking in the beta2-adrenergic receptor that exploits the domain structure of G-protein-coupled receptors. J. Med. Chem. 40, 3871–3886.

33. Moro, S., Li, A. H. & Jacobson, K. A. (1998). Molecular modeling studies of human A3 adenosine antagonists: structural homology and receptor docking. J. Chem. Inf. Comput. Sci. 38, 1239–1248.

34. Moro, S., Guo, D., Camaioni, E., Boyer, J. L., Harden, T. K. & Jacobson, K. A. (1998). Human P2Y1 receptor: molecular modeling and site-directed mutagenesis as tools to identify agonist and antagonist recognition sites. J. Med. Chem. 41, 1456–1466.

35. Tiraboschi, G., Jullian, N., Thery, V., Antonczak, S., Fournie-Zaluski, M. C. & Roques, B. P. (1999). A three-dimensional construction of the active site (region 507-749) of human neutral endopeptidase (EC.3.4.24.11). Protein Eng. 12, 141–149.

36. Rong, S. B., Zhang, J., Neale, J. H., Wroblewski, J. T., Wang, S. & Kozikowski, A. P. (2002). Molecular modeling of the interactions of glutamate carboxypeptidase II with its potent NAAG-based inhibitors. J. Med. Chem. 45, 4140–4152.

37. Kiyama, R., Tamura, Y., Watanabe, F., Tsuzuki, H., Ohtani, M. & Yodo, M. (1999). Homology modeling of gelatinase catalytic domains and docking simulations of novel sulfonamide inhibitors. J. Med. Chem. 42, 1723–1738.

38. Vedani, A., Zbinden, P. & Snyder, J. P. (1993). Pseudo-receptor modeling: a new concept for the three-dimensional construction of receptor binding sites. J. Recept. Res. 13, 163–177.

39. Jansen, J. M., Koehler, K. F., Hedberg, M. H., Johansson, A. M., Hacksell, U., Nordvall, G. & Snyder, J. P. (1997). Molecular design using the minireceptor concept. J. Chem. Inf. Comput. Sci. 37, 812–818.

40. Johnson, M. A., Hoog, C. & Pinto, B. M. (2003). A novel modeling protocol for protein receptors guided by bound-ligand conformation. Biochemistry, 42, 1842–1853.

41. Jalaie, M. & Erickson, J. A. (2000). Homology model directed alignment selection for comparative molecular field analysis: application to photosystem II inhibitors. J. Comput. Aided Mol. Des. 14, 181–197.

42. Schafferhans, A. & Klebe, G. (2001). Docking ligands onto binding site representations derived from proteins built by homology modelling. J. Mol. Biol. 307, 407–427.

43. Trosset, J. Y. & Scheraga, H. A. (1998). Reaching the global minimum in docking simulations: a Monte Carlo energy minimization approach using Bezier splines. Proc. Natl Acad. Sci. USA, 95, 8011–8015.

44. Vakser, I. A. (1996). Long-distance potentials: an approach to the multiple-minima problem in ligand–receptor interaction. Protein Eng. 9, 37–41.

45. Sotriffer, C. A., Klebe, G., Stahl, M. & Bohm, H. J. (2003). Docking and scoring functions/virtual screening. Burgers Handbook of Medicinal Chemistry, vol. 1, chapt. 7, pp. 281–333, Wiley, New York.

46. Halperin, I., Ma, B., Wolfson, H. & Nussinov, R. (2002). Principles of docking: an overview of search algorithms and a guide to scoring functions. Proteins: Struct. Funct. Genet. 47, 409–443.

47. Ma, B., Kumar, S., Tsai, C. J. & Nussinov, R. (1999). Folding funnels and binding mechanisms. Protein Eng. 12, 713–720.

48. Bohm, H. J. (1998). Prediction of binding constants of protein ligands: a fast method for the prioritization of hits obtained from de novo design or 3D database search programs. J. Comput. Aided Mol. Des. 12, 309–323.

49. Bohm, H. J. (1996). Towards the automatic design of synthetically accessible protein ligands: peptides, amides and peptidomimetics. J. Comput. Aided Mol. Des. 10, 265–272.


50. Bohm, H. J. (1994). On the use of LUDI to search the Fine Chemicals Directory for ligands of proteins of known three-dimensional structure. J. Comput. Aided Mol. Des. 8, 623–632.

51. Bohm, H. J. (1994). The development of a simple empirical scoring function to estimate the binding constant for a protein–ligand complex of known three-dimensional structure. J. Comput. Aided Mol. Des. 8, 243–256.

52. Bohm, H. J. (1993). A novel computational tool for automated structure-based drug design. J. Mol. Recogn. 6, 131–137.

53. Bohm, H. J. (1992). LUDI: rule-based automatic design of new substituents for enzyme inhibitor leads. J. Comput. Aided Mol. Des. 6, 593–606.

54. Bohm, H. J. (1992). The computer program LUDI: a new method for the de novo design of enzyme inhibitors. J. Comput. Aided Mol. Des. 6, 61–78.

55. Kearsley, S. K. & Smith, G. M. (1990). An alternative method for the alignment of molecular structures maximizing electrostatic overlap. Tetrahedron Comput. Methodol. 3, 615–633.

56. Klebe, G., Mietzner, T. & Weber, F. (1994). Different approaches toward an automatic structural alignment of drug molecules: applications to sterol mimics, thrombin and thermolysin inhibitors. J. Comput. Aided Mol. Des. 8, 751–778.

57. Klebe, G., Mietzner, T. & Weber, F. (1999). Methodological developments and strategies for a fast flexible superposition of drug-size molecules. J. Comput. Aided Mol. Des. 13, 35–49.

58. Knegtel, R. M., Kuntz, I. D. & Oshiro, C. M. (1997). Molecular docking to ensembles of protein structures. J. Mol. Biol. 266, 424–440.

59. Osterberg, F., Morris, G. M., Sanner, M. F., Olson, A. J. & Goodsell, D. S. (2002). Automated docking to multiple target structures: incorporation of protein mobility and structural water heterogeneity in AutoDock. Proteins: Struct. Funct. Genet. 46, 34–40.

60. Claussen, H., Buning, C., Rarey, M. & Lengauer, T. (2001). FlexE: efficient molecular docking considering protein structure variations. J. Mol. Biol. 308, 377–395.

61. Vajda, S., Sippl, M. & Novotny, J. (1997). Empirical potentials and functions for protein folding and binding. Curr. Opin. Struct. Biol. 7, 222–228.

62. Jernigan, R. L. & Bahar, I. (1996). Structure-derived potentials and protein simulations. Curr. Opin. Struct. Biol. 6, 195–209.

63. Torda, A. E. (1997). Perspectives in protein-fold recognition. Curr. Opin. Struct. Biol. 7, 200–205.

64. Sippl, M. J. (1995). Knowledge-based potentials for proteins. Curr. Opin. Struct. Biol. 5, 229–235.

65. Gohlke, H., Hendlich, M. & Klebe, G. (2000). Knowledge-based scoring function to predict protein–ligand interactions. J. Mol. Biol. 295, 337–356.

66. Mitchell, J. B., Laskowski, R. A., Alex, A. & Thornton, J. M. (1999). BLEEP—potential of mean force describing protein–ligand interactions. I. Generating potential. J. Comput. Chem. 20, 1165–1176.

67. Mitchell, J. B., Laskowski, R. A., Alex, A., Forster, M. J. & Thornton, J. M. (1999). BLEEP—potential of mean force describing protein–ligand interactions. II. Calculation of binding energies and comparison with experimental data. J. Comput. Chem. 20, 1177–1185.

68. Muegge, I. & Martin, Y. C. (1999). A general and fast scoring function for protein–ligand interactions: a simplified potential approach. J. Med. Chem. 42, 791–804.

69. DeWitte, R. S. & Shakhnovich, E. I. (1996). Smog: de novo design method based on simple, fast, accurate free energy estimates. 1. Methodology supporting evidence. J. Am. Chem. Soc. 118, 11733–11744.

70. Gohlke, H., Hendlich, M. & Klebe, G. (2000). Predicting binding modes, binding affinities and “hot spots” for protein–ligand complexes using a knowledge-based scoring function. Persp. Drug Discov. Des. 20, 115–144.

71. Sotriffer, C. A., Gohlke, H. & Klebe, G. (2002). Docking into knowledge-based potential fields: a comparative evaluation of DrugScore. J. Med. Chem. 45, 1967–1970.

72. Gohlke, H. & Klebe, G. (2002). DrugScore meets CoMFA: adaptation of fields for molecular comparison (AFMoC) or how to tailor knowledge-based pair-potentials to a particular protein. J. Med. Chem. 45, 4153–4170.

73. Sali, A. & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815.

74. Fiser, A., Do, R. K. & Sali, A. (2000). Modeling of loops in protein structures. Protein Sci. 9, 1753–1773.

75. Brooks, B. R. (1983). A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4, 187–217.

76. Bowie, J. U., Luthy, R. & Eisenberg, D. (1991). A method to identify protein sequences that fold into a known three-dimensional structure. Science, 253, 164–170.

77. Colovos, C. & Yeates, T. O. (1993). Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci. 2, 1511–1519.

78. Dominy, B. N. & Brooks, C. L. (2002). Identifying native-like protein structures using physics-based potentials. J. Comput. Chem. 23, 147–160.

79. Eisenberg, D., Bowie, J. U., Luthy, R. & Choe, S. (1992). Three-dimensional profiles for analysing protein sequence-structure relationships. Faraday Discuss. 93, 25–34.

80. Laskowski, R. A., Moss, D. S. & Thornton, J. M. (1993). Main-chain bond lengths and bond angles in protein structures. J. Mol. Biol. 231, 1049–1067.

81. Luthy, R., Bowie, J. U. & Eisenberg, D. (1992). Assessment of protein models with three-dimensional profiles. Nature, 356, 83–85.

82. Melo, F. & Feytmans, E. (1998). Assessing protein structures with a non-local atomic interaction energy. J. Mol. Biol. 277, 1141–1152.

83. Simons, K. T., Ruczinski, I., Kooperberg, C., Fox, B. A., Bystroff, C. & Baker, D. (1999). Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins. Proteins: Struct. Funct. Genet. 34, 82–95.

84. Wang, Y., Zhang, H. & Scott, R. A. (1995). A new computational model for protein folding based on atomic solvation. Protein Sci. 4, 1402–1411.

85. Wang, Y., Zhang, H., Li, W. & Scott, R. A. (1995). Discriminating compact nonnative structures from the native structure of globular proteins. Proc. Natl Acad. Sci. USA, 92, 709–713.

86. Gerber, P. R. (1998). Charge distribution from a simple molecular orbital type calculation and non-bonding interaction terms in the force field MAB. J. Comput. Aided Mol. Des. 12, 37–51.

87. Gerber, P. R. & Muller, K. (1995). MAB, a generally applicable molecular force field for structure modelling in medicinal chemistry. J. Comput. Aided Mol. Des. 9, 251–268.

88. Zar, J. H. (1999). Biostatistical Analysis, 4th edit., Prentice Hall Inc., Upper Saddle River, NJ.

89. Maignan, S., Guilloteau, J. P., Pouzieux, S., Choi-Sledeski, Y. M., Becker, M. R., Klein, S. I. et al. (2000). Crystal structures of human factor Xa complexed with potent inhibitors. J. Med. Chem. 43, 3226–3232.

90. Sali, A. & Blundell, T. L. (1990). Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J. Mol. Biol. 212, 403–428.

91. Nar, H., Bauer, M., Schmid, A., Stassen, J. M., Wienen, W., Priepke, H. W. et al. (2001). Structural basis for inhibition promiscuity of dual specific thrombin and factor Xa blood coagulation inhibitors. Structure (Camb.), 9, 29–37.

92. Guertin, K. R., Gardner, C. J., Klein, S. I., Zulli, A. L., Czekaj, M., Gong, Y. et al. (2002). Optimization of the beta-aminoester class of factor Xa inhibitors. Part 2: identification of FXV673 as a potent and selective inhibitor with excellent in vivo anticoagulant activity. Bioorg. Med. Chem. Letters, 12, 1671–1674.

93. Kamata, K., Kawamoto, H., Honma, T., Iwama, T. & Kim, S. H. (1998). Structural basis for chemical inhibition of human blood coagulation factor Xa. Proc. Natl Acad. Sci. USA, 95, 6630–6635.

94. Adler, M., Davey, D. D., Phillips, G. B., Kim, S. H., Jancarik, J., Rumennik, G. et al. (2000). Preparation, characterization, and the crystal structure of the inhibitor ZK-807834 (CI-1031) complexed with factor Xa. Biochemistry, 39, 12534–12542.

95. Brandstetter, H., Kuhne, A., Bode, W., Huber, R., von der Saal, W., Wirthensohn, K. & Engh, R. A. (1996). X-ray structure of active site-inhibited clotting factor Xa. Implications for drug design and substrate recognition. J. Biol. Chem. 271, 29988–29992.

96. Howard, E., Sanishvili, R., Cachau, R. E., Mitschler, A., Chevrier, B., Barth, P. et al. (2003). Human aldose reductase—inhibitor complex at 0.66 Å: experimentally observed protonation states and atomic interactions have implications for the inhibition mechanism. Proteins: Struct. Funct. Genet. 66. In the press.

97. Schonbrun, J., Wedemeyer, W. J. & Baker, D. (2002). Protein structure prediction in 2002. Curr. Opin. Struct. Biol. 12, 348–354.

98. Moult, J., Fidelis, K., Zemla, A. & Hubbard, T. (2001). Critical assessment of methods of protein structure prediction (CASP): round IV. Proteins: Struct. Funct. Genet. Suppl. 5, 2–7.

99. Mitsutake, A., Sugita, Y. & Okamoto, Y. (2001). Generalized-ensemble algorithms for molecular simulations of biopolymers. Biopolymers, 60, 96–123.

100. Kim, K. H. (1998). 3D QSAR. In Drug Design: Recent Advances (Richardson, C. C., ed), vol. 3, pp. 233–255, Kluwer Academic, Dordrecht, The Netherlands.

101. Levitt, M., Gerstein, M., Huang, E., Subbiah, S. & Tsai, J. (1997). Protein folding: the endgame. Annu. Rev. Biochem. 66, 549–579.

102. Petrella, R. J., Lazaridis, T. & Karplus, M. (1998). Protein sidechain conformer prediction: a test of the energy function. Fold. Des. 3, 353–377.

103. Xiang, Z. & Honig, B. (2001). Extending the accuracy limits of prediction for side-chain conformations. J. Mol. Biol. 311, 421–430.

104. Stubbs, M. T., Reyda, S., Dullweber, F., Moller, M., Klebe, G., Dorsch, D. et al. (2002). pH-dependent binding modes observed in trypsin crystals: lessons for structure-based drug design. ChemBiochem, 3, 246–249.

105. Freire, E. (1998). Statistical thermodynamic linkage between conformational and binding equilibria. Advan. Protein Chem. 51, 255–279.

106. Ma, B., Shatsky, M., Wolfson, H. J. & Nussinov, R. (2002). Multiple diverse ligands binding at a single protein site: a matter of pre-existing populations. Protein Sci. 11, 184–197.

107. Tsai, C.-J., Ma, B. & Nussinov, R. (1999). Folding and binding cascades: shifts in energy landscapes. Proc. Natl Acad. Sci. USA, 96, 9970–9992.

108. Hodgkin, E. E. & Richards, W. G. (1987). Molecular similarity based on electrostatic potential and electric field. Int. J. Quantum Chem.: Quant. Biol. Symp. 14, 105–110.

Edited by J. Thornton

(Received 1 July 2003; received in revised form 12 September 2003; accepted 12 September 2003)
