
Advanced Review

Outstanding challenges in protein–ligand docking and structure-based virtual screening

Bohdan Waszkowycz, David E. Clark∗ and Emanuela Gancia

With an ever-increasing number of protein structures being solved by X-ray crystallography, the use of protein–ligand docking algorithms to assess candidate ligands for a binding site has become commonplace. In particular, over the last decade, high-throughput docking has been widely applied to the virtual screening of large chemical databases for supporting hit-finding programs in drug discovery. However, the techniques and practice of protein–ligand docking in general, and of structure-based virtual screening in particular, are still evolving, and significant limitations remain to be addressed. In this review, we seek to highlight some of the active areas of research and debate in this promising, but challenging, field. © 2011 John Wiley & Sons, Ltd.

WIREs Comput Mol Sci 2011 1 229–259 DOI: 10.1002/wcms.18

INTRODUCTION

Protein–ligand docking has become one of the most widely used tools in modern computer-aided drug design. This reflects the successful application of protein structural data, as generated by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, to support the rational design and optimization of novel drug candidates, an approach commonly known as structure-based drug design.1 Although these techniques are not readily applicable to every protein, for many targets they provide invaluable information on novel and unexpected protein–ligand binding modes and highlight the essential interactions shared by chemically diverse ligands (Figure 1).

Given the technical challenges of protein crystallography, in silico docking methods are routinely employed in the drug design process to predict the most likely binding mode of a small molecule at a particular receptor, to explore the specific interactions that may be formed, and to estimate the ligand binding affinity. Docking has become a mainstay in virtual screening (VS) of large chemical databases and, in recent years, has also been applied to much broader problems, such as predicting the potential protein targets of a particular molecule (‘inverse docking’2).

∗Correspondence to: [email protected]

Argenta, 8/9 Spire Green Centre, Flex Meadow, Harlow CM19 5TR, UK


A large number of protein–ligand docking methods are available from academic groups or commercial software vendors, and it is not possible in this article to describe all of these in detail (many comprehensive reviews have been published recently3–7). In this review, we offer a broad survey of the methods employed and their strengths and failings, and we discuss practical considerations in the application of these techniques, particularly in the field of virtual screening. Subsequent sections focus on selected topics that are the subject of intense research, such as improvements to scoring and analysis procedures, handling of protein flexibility, and validation of docking performance. The final section presents some recent examples from the literature of successful applications of docking to lead identification and optimization.

GENERAL APPROACHES AND PRACTICAL CONSIDERATIONS

Basic Concepts and Assumptions

Protein–ligand docking methods have two main components: a searching algorithm to generate plausible three-dimensional (3D) configurations of a ligand bound to a protein, and a scoring (or fitness) function to evaluate the quality of the protein–ligand interaction.4,8 By way of terminology, the combination of a specific ligand conformation and its orientation with respect to the protein is generally referred to as a pose. Most modern docking programs aim to dock a single ligand to a single protein on a timescale of minutes.

[Figure 1 image: ligands from PDB entries 1KE5 (blue), 1E1V (purple), 1DI8 (green), and 1OIQ (yellow)]

FIGURE 1 | Crystal structures of a selection of cyclin-dependent kinase 2 (CDK2) inhibitors from the Protein Data Bank (PDB), exemplifying how a diverse range of chemotypes bind to the ATP-binding site. For clarity, only a single receptor structure is shown (cream carbon atoms), with a solvent accessible surface colored by electrostatic potential.

In virtual screening applications, where very large databases of the order of 10⁴–10⁶ molecules are to be docked, it is desirable to reduce the timescale for docking each ligand to the order of seconds. As it is not feasible to perform a fine-grained systematic search of all degrees of freedom, it is essential to have an efficient optimization method coupled with a fast scoring function in order to ensure that adequate exploration of the search space has been achieved.
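A quick calculation shows why seconds per ligand matter for virtual screening. The sketch below assumes a fixed cost per ligand on a single CPU core and ignores parallel and I/O overheads. The per-ligand times are illustrative only and are not benchmarks of any docking program.

```python
# Back-of-the-envelope docking throughput (illustrative numbers only).
SECONDS_PER_DAY = 24 * 60 * 60

def cpu_days(n_ligands: int, seconds_per_ligand: float) -> float:
    """Total single-core CPU time, in days, to dock n_ligands once each."""
    return n_ligands * seconds_per_ligand / SECONDS_PER_DAY

for n in (10**4, 10**6):
    for t in (60.0, 1.0):  # "minutes" versus "seconds" per ligand
        print(f"{n:>9,} ligands at {t:>4.0f} s each: {cpu_days(n, t):8.1f} CPU-days")
```

At 60 s per ligand, 10⁶ compounds need roughly 694 CPU-days. At 1 s per ligand the same screen needs under 12 CPU-days, which a cluster of around a hundred cores can absorb in a few hours.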

As a practical limit on the search space, the target binding site is generally predefined by the user, rather than having the docking algorithm search the entire protein surface.7,9 In most drug design applications this is a valid assumption, as the binding site is broadly known from structural considerations, although there are instances where the modeler may need to search more widely for novel binding sites. The other major assumption to facilitate searching is to treat the receptor model as conformationally rigid during docking, although progress is now being made toward introducing a limited degree of flexibility (see ‘Incorporation of Protein Flexibility During Docking’ section). As some degree of protein flexibility and induced fit is known to be a common feature of ligand binding, the use of a rigid receptor model may be seen as one of the most severe approximations in current docking methods. It is also usual to remove solvent molecules from the active site, although there are many instances where water-mediated interactions are critical to ligand binding (see ‘Treatment of Active Site Water Molecules’ section). Although the majority of docking methods now search ligand conformations on-the-fly, some programs favor rigid docking of a set of pregenerated conformers for each ligand (see ‘Ligand Preparation’ section).

Although the scoring function serves primarily as a fitness function for the searching algorithm (i.e., to identify the most favored pose generated for an individual ligand), it also provides an estimate of the binding affinity, and as such is used to rank multiple ligands in virtual screening. It is common for different scoring functions to be applied to these two tasks, in recognition of the difficulty in achieving accuracy in the prediction of binding affinity.10–12 Scoring functions cover a wide range of methods, from classical molecular mechanics through to empirically derived energy functions, typically describing hydrogen bonding, hydrophobic and entropic contributions to ligand binding (see ‘Scoring Functions’). Improved parameterization of scoring functions is an active area of research, as is the development of more computationally intensive rescoring procedures (see ‘Postprocessing and Analysis of Results’).

Application of Docking Within a Drug Discovery Environment

With the introduction of fast protein–ligand docking programs and relatively cheap workstations and clusters in the 1990s, the modeling community was quick to apply these methods to the screening of large chemical databases. Today, structure-based virtual screening has become a commonplace task in industry and academia to support compound acquisition and library synthesis programs across a broad range of biological targets. However, it is worth bearing in mind that docking is widely used in a variety of other drug discovery scenarios. For example, during lead optimization, docking may be applied to prioritize small sets of designs in terms of binding modes and binding affinities to the target receptor, or to assess off-target binding to structurally related proteins or to drug-metabolizing enzymes.1,13

[Figure 2 boxes: Receptor model preparation (selection of X-ray/homology model; structural checking/refinement); Ligand database preparation (drug-like property profiling; 3D model generation); Ligand–receptor docking (multiple docking strategies; multiple receptor models); Post processing (rescoring, clustering, binding mode classification); Analysis (visualization, sorting, filtering; selection of favored compounds)]

FIGURE 2 | Schematic of a typical workflow in docking and virtual screening.

These various project applications place different demands on the performance of the docking program. In virtual screening, the emphasis is typically on speed rather than accuracy in order to dock large, chemically diverse databases, and thus identify interesting compounds for experimental screening. Under these circumstances, the objective is usually to find some, rather than all, possible hits and to identify a diverse set of chemotypes rather than many examples of a few chemotypes. To achieve this, the modeler may choose the fastest available docking protocol, a single model of the protein, and a simple scoring and selection strategy. The quality of pose prediction and binding affinity prediction may be far from perfect as long as a subset of compounds can be chosen that returns a satisfactory number of experimental hits.
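A simple selection strategy that favors diverse chemotypes can be sketched as follows. Walk down the score-ranked list and cap how many compounds each chemotype may contribute. The sketch makes two assumptions. First, each compound already carries a chemotype label, such as a scaffold or cluster ID computed upstream. Second, lower docking scores are better. The names `Hit` and `select_diverse` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    compound_id: str
    chemotype: str   # scaffold or cluster label assigned upstream (assumed)
    score: float     # docking score; lower is assumed to be better here

def select_diverse(hits, max_per_chemotype=1, n_select=100):
    """Pick up to n_select compounds in score order, at most max_per_chemotype per chemotype."""
    picked, per_type = [], {}
    for hit in sorted(hits, key=lambda h: h.score):
        if per_type.get(hit.chemotype, 0) < max_per_chemotype:
            picked.append(hit)
            per_type[hit.chemotype] = per_type.get(hit.chemotype, 0) + 1
            if len(picked) == n_select:
                break
    return picked
```

Raising `max_per_chemotype` trades diversity for a few close analogues per series. Analogues can be useful as an early check on SAR.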

By contrast, in a lead optimization scenario, a modeler is more interested in obtaining as accurate a docking as possible for a small number of homologous compounds. This may necessitate comparing several docking programs or scoring functions, analyzing many poses per ligand, and docking to several models of the receptor. It is important that consistent dockings are obtained across a homologous series, so that deviations in pose are a reliable reflection of structure–activity relationships (SAR) and are not merely artefactual. In particular, far more elaborate scoring procedures may be applied in order to achieve a more reliable estimate of relative binding affinities.

In all practical applications, docking is closely integrated with a wide range of other molecular modeling and cheminformatics methods. This is exemplified by the flow chart in Figure 2. For example, docking methods require at least one representative model of the protein receptor, which has been appropriately prepared, typically from an X-ray crystal structure. Likewise, ligand structures are often extracted from a two-dimensional (2D) electronic database and require several stages of cheminformatic processing and filtering to yield a drug-like set of 3D models for docking. Following docking, high-scoring poses may be analyzed and rescored using a variety of simulation and data mining methods. These topics are reviewed in depth in subsequent sections.

It is also common practice to run structure-based virtual screens alongside ligand-based virtual screens, e.g., by assessing similarity to known ligands using substructural fingerprints, pharmacophores, or shape similarity, as these methods have been reported to have higher enrichments and to retrieve different chemotypes compared with docking.14,15 The integration of these various modeling and virtual screening procedures has been facilitated in recent years by the application of data pipelining software.16 These products facilitate the rapid assembly and prototyping of complex workflows and support integration of web services and distributed computing resources. As such, they may become more widely used for extending the scope of docking experiments and enabling more routine use of complex analysis procedures.

OVERVIEW OF COMMONLY USED DOCKING METHODS

A large number of docking programs have been described in the literature: Moitessier et al.5 reference more than 60 docking programs and over 30 scoring functions. A very wide range of algorithms has been implemented for the core tasks of generating and scoring ligand poses within the receptor binding site. As docking is essentially an optimization problem, many different searching algorithms can be applied to the task of identifying the optimum (highest scoring) pose as efficiently as possible. These range from general stochastic optimization methods, such as simulated annealing and evolutionary algorithms, through to more systematic or directed searching procedures.4–6

The initial stages of docking may be driven solely by the scoring function or by some form of feature matching within the binding site, and may be followed by local geometry refinement of the protein–ligand complex. In many programs, the extent of the binding site is defined by the user. For example, in Glide,17 a 3D box encompassing the binding site is typically generated from a representative co-crystal structure of the protein complexed with a ligand. The box defines the size of the energy grids, which are used to precalculate components of the docking score; an inner grid box is also used to limit the placement of the center of mass of the docked ligand. The box dimensions can be inspected and readily adjusted via a graphical interface.
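Precomputed energy grids are what make scoring fast. Each term is evaluated once at regular points inside the box, and during docking it is read back at each ligand atom by interpolation. The sketch below assumes a single scalar term and trilinear interpolation. It does not reproduce Glide's grids, whose contents and interpolation details are not described here. `EnergyGrid` is a hypothetical helper, and each axis needs at least two grid points.

```python
import math

class EnergyGrid:
    """One score term precomputed on a regular 3D grid (hypothetical helper)."""

    def __init__(self, origin, spacing, shape, term):
        self.origin, self.spacing, self.shape = origin, spacing, shape
        nx, ny, nz = shape
        # Evaluate the term at every grid point once, before any ligand is docked.
        self.values = [[[term(origin[0] + i * spacing,
                              origin[1] + j * spacing,
                              origin[2] + k * spacing)
                         for k in range(nz)] for j in range(ny)] for i in range(nx)]

    def lookup(self, x, y, z):
        """Trilinear interpolation of the precomputed term at a point inside the box."""
        idx, frac = [], []
        for c, o, n in zip((x, y, z), self.origin, self.shape):
            u = (c - o) / self.spacing
            if not 0.0 <= u <= n - 1:
                raise ValueError("point lies outside the grid box")
            i = min(int(math.floor(u)), n - 2)  # keep i + 1 inside the grid
            idx.append(i)
            frac.append(u - i)
        (i, j, k), (fx, fy, fz) = idx, frac
        total = 0.0
        for di, wx in ((0, 1 - fx), (1, fx)):
            for dj, wy in ((0, 1 - fy), (1, fy)):
                for dk, wz in ((0, 1 - fz), (1, fz)):
                    total += wx * wy * wz * self.values[i + di][j + dj][k + dk]
        return total
```

The score of a pose would then be the sum of `lookup` over its atoms, with one grid per atom type or term. This is much cheaper than looping over every protein atom for every pose.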

Ligand Placement

Many docking methods apply a feature-matching algorithm to drive the initial placement of the ligand into the binding site. This approach is conceptually attractive because it promotes the efficient localization of the search space toward structural features in the binding site that are likely to favor ligand binding. Thus, DOCK18,19 fills the binding site with spheres, which are subsequently clustered to define the shape and extent of the pocket. Ligand placement is primarily shape driven, achieved by matching distances between sphere centroids with interatomic distances within the ligand.
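Distance matching can be illustrated by brute force. A set of k sphere centers and k ligand atoms is compatible when every sphere–sphere distance agrees with the corresponding atom–atom distance within a tolerance. This is a minimal sketch only. DOCK itself uses far more efficient matching schemes, and the 0.5 Å default tolerance is a hypothetical choice.

```python
import itertools
import math

def compatible_matches(sphere_centers, ligand_atoms, tol=0.5, k=3):
    """Yield (sphere indices, atom indices) of size k whose internal distances agree within tol (Å).

    Brute force over all sphere subsets and atom orderings, for illustration only.
    """
    for s_idx in itertools.combinations(range(len(sphere_centers)), k):
        for a_idx in itertools.permutations(range(len(ligand_atoms)), k):
            if all(
                abs(math.dist(sphere_centers[s_idx[p]], sphere_centers[s_idx[q]])
                    - math.dist(ligand_atoms[a_idx[p]], ligand_atoms[a_idx[q]])) <= tol
                for p, q in itertools.combinations(range(k), 2)
            ):
                yield s_idx, a_idx
```

Each match pairs ligand atoms with sphere centers. Those pairs could then define a rigid-body superposition that places the ligand in the pocket, ready for scoring.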

Other programs characterize the binding site in terms of accessible chemical features in the protein active site, e.g., hydrogen bonding, hydrophobic, aromatic, and metal sites. In FlexX,20,21 these features are constructed geometrically around receptor atoms and are then used to guide the placement of a ligand fragment, e.g., by matching the geometry of at least three complementary features in the ligand. In a similar approach, Surflex22 constructs an idealized ligand (a ‘protomol’) by mapping out favorable binding sites for small chemical probes (e.g., methyl, amino, and carbonyl groups), and this is subsequently used to evaluate overall similarity to the docked ligand. GOLD23,24 and AutoDock25,26 are based on genetic algorithms (GAs) in which the chromosomes encode both the orientational and conformational degrees of freedom of the ligand, which are optimized by the standard GA operators of crossover and mutation. The positioning of the ligand is encoded as a match between a particular hydrogen bonding or hydrophobic site in the ligand and a complementary site in the protein.
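The GA idea can be sketched generically. Each chromosome is a flat list of real-valued genes. In a docking setting these genes would stand for translation, orientation, and torsion angles. Chromosomes evolve by selection, one-point crossover, and Gaussian mutation. The sketch below is not GOLD's or AutoDock's actual encoding or operators, and the parameter defaults are hypothetical. It minimizes whatever `fitness` callable it is given and needs at least two genes.

```python
import random

def genetic_search(fitness, n_genes, pop_size=30, generations=60,
                   mutation_rate=0.1, mutation_scale=0.5, seed=0):
    """Minimise `fitness` over real-valued chromosomes with a simple elitist GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        next_pop = pop[:2]  # elitism: carry the two best chromosomes forward unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: each parent is the fittest of 3 drawn from the better half.
            a, b = (min(rng.sample(pop[: pop_size // 2], 3), key=fitness) for _ in range(2))
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0, mutation_scale) if rng.random() < mutation_rate else g
                     for g in child]         # Gaussian mutation of individual genes
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=fitness)
```

Because the best chromosome is always carried forward, the final result can never score worse than the best member of the initial population.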

By contrast, in other docking programs, the placement of the ligand is driven solely by the scoring function, rather than directed toward specific interaction sites. Thus, Glide17 is an example of a systematic placement strategy: the ligand is positioned systematically on each point of a grid box encompassing the binding site, and a hierarchical series of shape filters and coarse-grained scoring functions is applied to reject unpromising poses quickly. Surviving poses are then refined and scored more comprehensively. ICM27 and LigandFit28 apply variants of Monte Carlo simulated annealing procedures to achieve efficient sampling of conformational space, followed by local geometry refinement.
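Monte Carlo simulated annealing can be summarized in a few lines. Perturb the current state at random. Always accept downhill moves, and accept uphill moves with Boltzmann probability exp(−ΔE/T) while T is gradually lowered. The sketch below is a generic version and not the ICM or LigandFit procedure. The geometric cooling schedule, step size, and temperatures are assumed values. `energy` stands in for a docking score over pose coordinates.

```python
import math
import random

def simulated_annealing(energy, x0, step=0.5, t_start=2.0, t_end=0.01,
                        n_steps=2000, seed=0):
    """Metropolis Monte Carlo with geometric cooling; returns the best state seen and its energy."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    cooling = (t_end / t_start) ** (1.0 / max(n_steps - 1, 1))
    t = t_start
    for _ in range(n_steps):
        trial = [xi + rng.uniform(-step, step) for xi in x]  # perturb every coordinate
        e_trial = energy(trial)
        # Metropolis criterion: downhill always, uphill with probability exp(-dE / T).
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
        t *= cooling
    return best_x, best_e
```

In a docking program the best state would normally be passed on to local geometry refinement, as described above.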

In practical applications, it can be beneficial to exploit any prior knowledge of preferred binding interactions or privileged ligand chemotypes to direct the docking, provided this is not so restrictive that, e.g., no novel structures are found during virtual screening. Thus, specific receptor hydrogen-bonding sites or pharmacophore features can be defined as constraints to be matched by the ligand during docking. One common example is to constrain one or more of the highly conserved hinge hydrogen bonding interactions of kinase inhibitors.29 When applied to a virtual screen, this can significantly reduce the number of hits by ensuring that all ligands achieve a plausible contact to the hinge region (although at the potential risk of missing ligands that achieve a more unusual binding mode). More generalized methods for substructure placement (typically encoded via SMARTS pattern matching) can be particularly useful for docking combinatorial libraries, where it is reasonable to constrain the scaffold to bind in a consistent pose, so that the conformational space of the various substituents can be explored more efficiently.
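A hinge-type constraint can be applied as a post-docking filter. A pose is kept only if some suitable ligand atom lies close enough to the constraint point. This sketch is deliberately crude. It uses distance only (no angle check), treats any N or O atom as a possible partner in place of proper donor/acceptor typing, and uses a hypothetical 3.2 Å cutoff.

```python
import math

def satisfies_hbond_constraint(pose, constraint_point, max_dist=3.2, atom_filter=None):
    """True if any qualifying ligand atom lies within max_dist (Å) of the constraint point.

    `pose` is a list of (element, (x, y, z)) tuples; by default only N and O atoms count.
    """
    allowed = atom_filter or {"N", "O"}
    return any(el in allowed and math.dist(xyz, constraint_point) <= max_dist
               for el, xyz in pose)

def apply_constraint(poses, constraint_point, **kw):
    """Keep only poses that make the required contact, e.g. to a kinase hinge atom."""
    return [p for p in poses if satisfies_hbond_constraint(p, constraint_point, **kw)]
```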

Ligand Conformational Sampling

From the ligand-based viewpoint, there are two broad classes of approach to conformational sampling, which might be termed ‘off-line’ and ‘on-the-fly’. In the former, conformational searching is carried out as a separate exercise prior to docking, and the resulting conformers are then docked ‘rigidly’ (i.e., without further conformational exploration) into the protein binding site. This class of docking algorithm is exemplified by FRED30 and FLOG.31 The advantage of this approach is that rigid docking is very fast, despite having to dock multiple conformations for each ligand. In addition, the modeler is not constrained by the conformational sampling algorithm implemented within the docking package, but can construct a conformer database in a way that is most appropriate for the ligands in question. A disadvantage is that a suitably large number of conformations is required to ensure that the likely binding conformation is present, as no fine tuning of the conformation is possible during docking.

The second class of docking algorithms explores the ligand’s conformational space as an integral part of the docking process and represents the most common type of docking program encountered. ‘On-the-fly’ methods can be further subdivided into two groups. In the first, the conformational exploration is carried out using an incremental build-up approach (as in, e.g., FlexX20,21 and later versions of DOCK18,19), in which the ligand is assembled piecewise in the active site starting from a ‘seed’ or ‘base’ fragment that is docked first. Alternatively, the ligand is treated as a whole and the conformational search is carried out using either a stochastic searching method, such as a GA (e.g., GOLD23,24) or simulated annealing (e.g., ICM27), or a more systematic rotamer sampling procedure (e.g., Glide17).

To the authors’ knowledge, there has been only limited comparison of on-the-fly and off-line conformational exploration in the context of structure-based virtual screening.32 It seems that, as with 3D database searching, in which a similar dichotomy prevails (compare, for instance, Tripos’ Unity,33 which samples conformations on-the-fly, with Accelrys’ Catalyst,34 which uses precomputed conformers), both are used with success. One advantage of on-the-fly exploration is that it can provide a means by which a program can be ‘tuned’ to different tasks. For instance, when docking just one ligand, or a few ligands, it may be appropriate to spend more time in the conformational search. However, when dealing with a large set of compounds, such as in virtual screening, a more limited exploration is required. Schrödinger’s Glide program17 represents one example of this kind of approach, with the XP (Extra Precision) version of the program carrying out a more thorough, and time-consuming, sampling than the SP (Standard Precision) or HTVS (High-Throughput Virtual Screening) versions.

It is generally reckoned that ligand conformational sampling is handled fairly well (at least for molecules with up to about eight rotatable bonds32) and the blame for docking failures is often laid at the door of the scoring function. An analysis carried out at GlaxoSmithKline (GSK) demonstrated that, in many cases, the docking algorithm generates good poses (i.e., ones that resemble the presumed bioactive conformation), but that the scoring function fails to assign the highest rank to the best pose for each compound.4,35

SCORING FUNCTIONS

If the purpose of the docking algorithm is to generate a ligand pose that resembles the bioactive binding mode, then the job of the scoring function is to recognize such a pose from among the many others created during the search (the ‘decoys’) and to assign a high score to it, so that it is likely to be examined by the user. The scoring function should also provide a measure of the binding affinity for the ligand in question.12 The scoring function should be able to distinguish between known ligands for a target and nonbinders and, ideally, produce a correct ranking of known ligands in terms of binding affinity.11 Historically, two broad classes of scoring function can be identified.

Force Field-based Scoring Functions

The earliest docking programs used simple geometric checks to assess shape complementarity of the ligand and binding site (as in DOCK’s Contact Score). These were supplemented by the nonbonded terms of molecular mechanics force fields to provide a more complete description of the electrostatics and van der Waals attractive/repulsive interactions between the ligand and the receptor (e.g., DOCK’s Energy Score). These ‘physics-based’ approaches are conceptually attractive because they are based on widely used potential energy functions, which have been extensively parameterized and tested in the context of protein–ligand simulations. As such, they are unlikely to be biased toward any particular protein family or ligand chemotype. Recently, more sophisticated treatments of electrostatics, desolvation, and conformational entropy have been included. An example of a modern force field-based scoring function (‘MedusaScore’) has been described by Yin et al.36 MedusaScore couples traditional van der Waals interaction terms with an explicit hydrogen-bonding model and a pairwise implicit solvation model, and showed superior performance when compared with 11 other scoring functions. However, the approach performs less well with larger ligands, perhaps due to the accumulation of force field errors and the failure to account for the phenomenon of enthalpy–entropy compensation.
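The nonbonded terms referred to above are, in their simplest form, a Lennard-Jones 12-6 term plus a Coulomb term, summed over ligand–protein atom pairs. The sketch below makes several assumptions. It uses Lorentz–Berthelot combining rules and a distance-dependent dielectric of 4r, a common simplification. Per-atom parameters are supplied by the caller. It is not the functional form of DOCK’s Energy Score or MedusaScore.

```python
import math

COULOMB_CONST = 332.06  # approx. e^2/(4*pi*eps0) in kcal*Å/(mol*e^2)

def pair_energy(r, epsilon, sigma, qi, qj, dielectric=lambda r: 4.0 * r):
    """Lennard-Jones 12-6 plus Coulomb energy (kcal/mol) for one ligand-protein atom pair."""
    sr6 = (sigma / r) ** 6
    e_vdw = 4.0 * epsilon * (sr6 * sr6 - sr6)
    e_elec = COULOMB_CONST * qi * qj / (dielectric(r) * r)
    return e_vdw + e_elec

def interaction_energy(ligand, protein):
    """Sum pair energies; atoms are (x, y, z, epsilon, sigma, charge) tuples."""
    total = 0.0
    for xl, yl, zl, el, sl, ql in ligand:
        for xp, yp, zp, ep, sp, qp in protein:
            r = math.dist((xl, yl, zl), (xp, yp, zp))
            eps = math.sqrt(el * ep)   # Lorentz-Berthelot combining rules
            sig = 0.5 * (sl + sp)
            total += pair_energy(r, eps, sig, ql, qp)
    return total
```

The steep r⁻¹² wall illustrates why such functions are ‘hard’. A small clash caused by a rigid receptor can dominate the score, which is one motivation for the softer empirical functions described next.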

Empirical Scoring Functions

Empirical scoring functions were developed as an alternative to the force field-based approaches in the hope of exploiting the growing number of protein–ligand complexes solved by X-ray crystallography. They typically represent ‘softer’ energy functions than those used in molecular mechanics and therefore may be more tolerant of small clashes or suboptimal interaction geometries in the receptor–ligand complex. Two types of empirical scoring function can be distinguished.

Regression-based Scoring Functions

Regression-based scoring functions are created by relating experimental ligand binding affinities to descriptors derived from the appropriate protein–ligand complex (e.g., relating to hydrogen bonding and nonpolar surface burial). The earliest example of this type of scoring function was published by Böhm,37 and perhaps the most popular has been the ChemScore function,38 which has been implemented in various docking programs. In ChemScore, the binding energy is evaluated as a simple function of the hydrogen-bonding geometry and lipophilic contact area between ligand and receptor, with the addition of a penalty term representing the loss of conformational entropy on binding. Other scoring functions use additional terms to represent effects such as desolvation and inappropriate burial of polar atom types.11 If a regression-based function is to be used to guide a docking algorithm (as well as to score a docked pose), it is necessary that an ad hoc clash term is added, so that unfavorable steric interactions are penalized.

Regression-based methods are fast to compute, but the parameterization of the energy terms is limited by the availability of appropriate ligand binding data, as well as by issues concerning data consistency when activity data are compiled from different sources. An additional limitation is that the protein–ligand complexes that underpin regression-based methods do not typically contain many unfavorable intermolecular contacts and so, by default, scoring functions derived from them have no way of penalizing such contacts when they appear in docked poses. Finally, regression-based methods require the specification of the molecular interaction terms to be used, which can only be as complete as our understanding of molecular recognition.
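The regression step amounts to fitting one weight per descriptor, ΔG ≈ w₀ + Σ wₖ·dₖ, to measured affinities by least squares. The sketch assumes hypothetical descriptors, such as an H-bond term, a lipophilic contact term, and a rotatable-bond count, already computed for each complex. It solves the normal equations with plain Gaussian elimination. It does not reproduce ChemScore’s piecewise geometric terms or its published weights.

```python
def fit_linear_scoring(descriptors, affinities):
    """Ordinary least-squares fit of affinity ~ w0 + sum_k w_k * descriptor_k.

    `descriptors` has one row per complex; returns [w0, w1, ..., wk].
    """
    X = [[1.0, *row] for row in descriptors]  # prepend an intercept column
    n = len(X[0])
    # Normal equations: (X^T X) w = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(X, affinities)) for i in range(n)]
    for col in range(n):  # forward elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def predict(w, row):
    """Apply fitted weights to one descriptor row."""
    return w[0] + sum(wk * xk for wk, xk in zip(w[1:], row))
```

The limitations noted above show up directly here. If the training complexes never contain clashes, no descriptor column varies with clashes, so no weight can learn to penalize them.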

Knowledge-based Scoring Functions

The knowledge-based approach is also based on experimental protein–ligand complexes but avoids the use of ligand binding affinities. Instead, the distributions of distances between pairs of ligand and protein atom types (e.g., aromatic carbon in ligand to aliphatic carbon in protein) are collated and then inverted to generate a potential of mean force (an approach pioneered in the field of atom–atom potentials for guiding protein folding simulations). The basic supposition is that if a particular ligand atom is found at a certain distance from a particular protein atom very frequently, this is likely to represent a favorable interaction. The two best known instances of knowledge-based scoring functions are PMF39 and DrugScore.40

In principle, knowledge-based scoring functions are very attractive because they do not rely on binding affinity data or the prescription of any particular functional form. Nonetheless, they have not proven as successful or as popular as the regression-based functions, suggesting that, as yet, the available dataset of protein–ligand complexes does not capture all the information about binding required for an accurate scoring function.
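The ‘inversion’ step is an inverse-Boltzmann relation, W(r) = −kT ln[g_obs(r)/g_ref(r)]. Here g_obs is the normalized distance histogram for one ligand/protein atom-type pair and g_ref is a reference distribution. The sketch assumes the histograms are already binned. It adds a pseudocount for empty bins and takes T = 300 K. PMF and DrugScore each define their own reference states and smoothing, which this generic version does not reproduce.

```python
import math

KB_KCAL = 0.0019872  # Boltzmann constant, kcal/(mol*K)

def pmf_from_counts(observed, reference, temperature=300.0, pseudocount=1.0):
    """Inverse-Boltzmann potential per distance bin: W = -kT ln(g_obs / g_ref)."""
    kt = KB_KCAL * temperature
    obs = [c + pseudocount for c in observed]
    ref = [c + pseudocount for c in reference]
    n_obs, n_ref = sum(obs), sum(ref)
    return [-kt * math.log((o / n_obs) / (r / n_ref)) for o, r in zip(obs, ref)]
```

Bins that are over-populated relative to the reference come out negative (favorable). Under-populated bins come out positive, which is the supposition described in the text.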

Limitations of Current Scoring Functions

The present generation of scoring functions generally performs well in identifying plausible poses of an individual ligand: in retrospective studies across a diverse set of protein targets, for example, the best docking packages can typically reproduce 70–80% of the crystallographic binding modes to within 2 Å root mean square deviation (RMSD).41 The more fundamental problem is that, despite intense research (e.g., the work of Friesner et al.42), scoring functions perform poorly in ranking compounds correctly in order of activity for the target in question, let alone in predicting their absolute free energies of binding.10,35 It is true that it is possible to obtain good enrichment factors, which may be sufficient for virtual screening, but for docking in later phases of drug discovery, the inability of most functions to differentiate reliably between nanomolar and micromolar ligands is severely limiting. Even within a virtual screening scenario, compounds of interest (particularly those with low molecular weight) are often placed a long way down the ranking and are easily missed within the large number of ‘false positives’ (Figure 3). It is for this reason that there has been great interest in recent years in more computationally expensive methods for rescoring the docked poses (see later section).

FIGURE 3 | Illustration of the distribution of docking scores in a virtual screening experiment. Left: the ideal scenario, where the small number of actives (blue) are well separated from the much larger number of inactives (red); ranking on docking score readily separates the majority of the actives with little contamination from inactives. Right: the more usual situation encountered in VS applications: the actives significantly overlap with the highest-scoring inactives. In this case, ranking gives an enrichment of actives but not complete separation.
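The two yardsticks used in this paragraph can be stated precisely. Pose RMSD is computed in the receptor frame without refitting, because a docked pose is already placed in the binding site. The sketch assumes identical atom ordering and ignores symmetry-equivalent atoms, which real benchmarking tools usually handle. The enrichment factor is the hit rate among the top fraction of the ranked list divided by the hit rate of the whole list.

```python
import math

def pose_rmsd(docked, reference):
    """RMSD (Å) between two poses with identical atom ordering, in the receptor frame."""
    if len(docked) != len(reference):
        raise ValueError("poses must have the same number of atoms")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(docked, reference))
    return math.sqrt(sq / len(docked))

def enrichment_factor(ranked_is_active, fraction=0.01):
    """EF = (actives in top fraction / size of top fraction) / (all actives / list size)."""
    n = len(ranked_is_active)
    n_top = max(1, int(round(n * fraction)))
    actives_top = sum(ranked_is_active[:n_top])
    return (actives_top / n_top) / (sum(ranked_is_active) / n)
```

A pose counts as "reproduced" when `pose_rmsd` is at most 2.0. An EF above 1 means the ranking concentrates actives near the top even when, as in the right-hand panel of Figure 3, it does not separate them completely.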

There are many reasons for the limited success of scoring functions in predicting binding energies. Empirical functions are intentionally simple in form (for speed of computation) and relatively ‘soft’ (to compensate for the lack of receptor flexibility). Many penalty terms (e.g., steric and electrostatic clash, internal ligand strain) are difficult to parameterize accurately. As a result of these various approximations, scoring functions have a tendency to be too generous in rewarding implausible ligands, leading to many false positives. In addition, it is common practice in virtual screening to use a single rigid receptor model to score the ligand poses, thereby neglecting the role of induced fit in optimizing hydrogen bonding and hydrophobic contacts. Finally, it must be recognized that our understanding of the complex processes of molecular recognition is far from complete.43 In particular, entropy and desolvation are difficult to treat accurately even within a rigorous molecular mechanics formalism. As more X-ray structures of protein–ligand complexes are solved, awareness is growing of previously unappreciated kinds of molecular interactions that can contribute to binding affinity (e.g., halogen bonds44 and aromatic C–H hydrogen-bond donors45). In addition, the potential for structural artefacts in protein–ligand complexes to influence scoring function behavior is beginning to be appreciated.46

Current Avenues of Research

Given the current perception that scoring functions remain the ‘Achilles’ Heel’ of structure-based virtual screening, there is clearly a need for continued momentum in research in this area.5 The desiderata for a next-generation scoring function have been delineated by Pearlman et al.47 In their view, the characteristics of an ideal scoring function are that it:

• be simple and intuitive in form;

• be applicable (without reparameterization) to multiple target systems;

• be able to be evaluated rapidly for any potential ligand;

• take account of binding site flexibility.

These are clearly demanding goals, and Pearlman et al.47 describe their attempts to meet them in the development of the FURSMASA scoring function, based on an MD-averaged potential energy grid incorporating a desolvation term. The search for greater simplicity in scoring functions is a theme that can also be found in other work.48 The NScore (Naïve Score) function is purely physics-based and uses no experimental ligand–protein binding or structural data in its derivation. Surprisingly, in tests, it performed at least as well as several more complex empirical functions. Because it is not based on a particular training set, the NScore function should be more transferable between different targets than typical empirical functions.



Given the ‘target-family’ approach to drug discovery adopted by many companies in recent years, a practical alternative to developing ‘universal’ scoring functions is to develop scoring functions that are specific for a given target class.49,50 Another pragmatic approach, pioneered by Charifson et al.,51 is the use of ‘consensus’ scoring functions, seeking to exploit the complementary strengths of several different scoring methods. This approach has some potential for improving scoring function performance,52 but cannot be viewed as a panacea or the ultimate solution to the problem.53 It is still the case that some scoring functions perform better on some targets than others and, with the wide array of functions currently available, selecting the right combination is a nontrivial task. Another potential hazard is that scoring functions may be correlated, so that rather than canceling out each other’s weaknesses, the combination actually accentuates errors.54
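One simple way to build a consensus is rank-sum. Each scoring function ranks the compounds, and the compounds are reordered by the sum of their ranks. The sketch is not necessarily the scheme used by Charifson et al. It assumes every function scores the same compounds and that each table is oriented so lower is better. Ties in total rank are broken by compound ID.

```python
def rank_sum_consensus(score_tables):
    """Combine scoring functions by summing each compound's rank across them (lower = better).

    `score_tables` is a list of dicts {compound_id: score}, each oriented so lower is better.
    """
    totals = {}
    for table in score_tables:
        for rank, cid in enumerate(sorted(table, key=table.get), start=1):
            totals[cid] = totals.get(cid, 0) + rank
    return sorted(totals, key=lambda cid: (totals[cid], cid))
```

If two of the input functions are strongly correlated, they effectively vote twice. This is the accentuation of errors that the text warns about.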

MODEL PREPARATION: TREATMENT OF LIGANDS AND PROTEINS

The preparation of both the receptor model and the ligand database can have a major impact on the outcome of a virtual screening campaign. The following section describes the typical tasks to be performed and important considerations to be borne in mind.

Ligand Preparation

The ligand structures to be used in a virtual screening campaign may originate from various sources, such as a corporate database or vendors’ electronic catalogues. In most cases, only 2D coordinates will be available, possibly augmented by some stereochemical information. Given that the molecular recognition of a ligand by a protein crucially depends on shape and electrostatic complementarity, it is vital that the ligand structures are prepared correctly prior to their use in VS.

• Filtering on drug-like properties—In most, although not all, cases, ligand databases are filtered using drug-like physicochemical property profiles prior to docking to eliminate molecules unlikely to be successful as drug candidates. Many calculated properties can be applied, e.g., limits on the range of molecular weight, log P, hydrogen-bond donor/acceptor counts, rotatable bond counts, and the number of chiral centers.55 In addition, potentially reactive and toxic chemotypes can be removed. Filtering can be tuned to the properties of the target protein, e.g., guided by property profiles of known ligands or 3D pharmacophores generated from the target receptor.56

• 3D structure generation—During the late 1980s and early 1990s, a great deal of research was invested in fast, rule-based methods for 2D to 3D structure conversion.57 Programs such as Concord58 and Corina59 were developed with the aim of generating a single, low-energy conformer for use in 3D database searching. More recently, other methods such as Omega60 and LigPrep61 have been developed. As docking algorithms do not vary bond angles or bond lengths during the sampling process, these are presumed to have been set correctly by the structure-generation code.

• Enumeration of stereoisomers—Although nearly half of marketed drugs62 and, typically, over 70% of compounds in vendor catalogues63 are achiral, the remainder require their stereochemistry to be taken into account during ligand preparation. At first sight, this might appear to be a relatively straightforward task: simply enumerate the unspecified stereocenters. However, this naïve treatment, as well as leading to unnecessarily large database sizes, also hides some rather complex issues, particularly regarding the specification of relative, rather than absolute, stereochemistry in the 2D input file.62

• Enumeration of tautomers—Numerous chemical groups may undergo prototropic tautomerism, and different tautomers may differ in their shape and hydrogen-bonding pattern.64,65 It is thus important in preparation for virtual screening that the possible tautomeric forms of each ligand are enumerated. This should minimize the risk of false negatives, i.e., missing a hit; however, it can also lead to false positives. Thus, the problem extends beyond simply enumerating the possibilities to an evaluation of the stability of each tautomer, and thus its likelihood of existence at the specified pH value.64 Methods that begin to address this level of complexity, such as MoKa,66 are starting to emerge.

• Enumeration of protomers—Finally, ligands with ionizable centers need to be correctly (de)protonated depending on the specified pH. Whilst this might usually be presumed to be the physiological pH (7.4), in some cases target-dependent values may be adopted. For instance, in a study with beta-secretase 1 (BACE1), protomers were enumerated at pH values of 6.6 (at which the crystals of the protein were grown) and 4.5 (at which optimal BACE1 activity in the assay can be obtained).67

• Conformational sampling—As discussed earlier, most docking algorithms explore the ligand’s conformational degrees of freedom on-the-fly, whereas others, such as FRED30 and FLOG31, require pregenerated conformer libraries, which are docked rigidly. It should be noted that even docking programs that incorporate ligand flexibility do not necessarily handle all cases of conformational expansion, e.g., puckering of saturated rings, cis/trans rotation of vinyl or amide bonds, or inversion of chiral centers and sp3-hybridized nitrogen atoms. Therefore, these variants may need to be generated explicitly during database construction. For the interested reader, a review of conformational searching in general has been published recently.68
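
The Henderson–Hasselbalch relation for a single ionizable site makes the effect of the chosen pH concrete: f_protonated = 1 / (1 + 10^(pH − pKa)). The Python sketch below keeps any protonation state whose predicted population exceeds a cutoff. The 10% cutoff and the function names are assumptions made for illustration. The single-site treatment is also a simplification: real molecules with several coupled sites need dedicated tools such as MoKa.66

```python
def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of one ionizable site in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def protomers_to_enumerate(pka, ph, threshold=0.1):
    """Return the protonation states whose predicted population is at least `threshold`.
    The 0.1 default is an illustrative assumption, not a recommended value."""
    f = fraction_protonated(pka, ph)
    states = []
    if f >= threshold:
        states.append("protonated")
    if 1.0 - f >= threshold:
        states.append("deprotonated")
    return states


# A hypothetical basic group with pKa 5.8, evaluated at three assay/crystallization pH values.
for ph in (7.4, 6.6, 4.5):
    print(ph, round(fraction_protonated(5.8, ph), 3), protomers_to_enumerate(5.8, ph))
```

At pH 7.4, only the neutral form passes the cutoff (about 2% protonated). At pH 6.6, both forms are kept (about 14% protonated). At pH 4.5, only the protonated form is kept (about 95%). The two pH values used in the BACE1 study would therefore yield different protomer sets for such a compound.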

Impact of Ligand Preparation on Docking Success

Taking all of the above factors into consideration, it is clear that unless a database compound is achiral and has no ionizable centers and no tautomeric capabilities, it may give rise to several structures, each of which requires docking. In our hands, it is not unusual for a database at least to double in size during ligand preparation. Although it intuitively makes sense to prepare databases taking account of the issues above, relatively few studies have investigated these matters systematically or quantitatively. In studies using the oestrogen receptor, Knox et al. showed that the incorporation of stereochemical, protonation, and tautomeric information actually led to a ‘seeded’ compound having a lower rank when screened in a set of 10,000 drug-like compounds. Conformational expansion was much more important in this case.69 On the basis of a larger set of test cases, ten Brink and Exner70 concluded that the use of all stereoisomers should be recommended, while conceding that inclusion of different protomers produced slightly lower success rates in docking and enrichments in virtual screening. More recently, a study using AutoDock with 19 targets concluded that comparable enrichment can be obtained by the use of a single predicted form, instead of a fully enumerated set of all possible tautomers and protomers.71 In this case, both the single predicted form and the full enumeration were calculated by the MoKa program.66
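
The arithmetic behind this growth is simple if the variants are assumed to be independent. Each unspecified stereocenter doubles the count, which is then multiplied by the numbers of tautomers and protomers retained. The Python sketch below applies that naive rule. It ignores meso forms and the relative-stereochemistry subtleties noted earlier, and it treats tautomer and protomer counts as independent, which they are not in general. The keyword names are hypothetical.

```python
def n_prepared_structures(unspecified_stereocenters=0, n_tautomers=1, n_protomers=1):
    """Structures generated for one compound under a naive, independent enumeration."""
    return (2 ** unspecified_stereocenters) * n_tautomers * n_protomers


def expanded_library_size(compounds):
    """compounds: iterable of dicts containing any of the keyword arguments above."""
    return sum(n_prepared_structures(**c) for c in compounds)


library = [
    {},                                                  # achiral, no ionizable or tautomeric groups
    {"unspecified_stereocenters": 1, "n_protomers": 2},  # one undefined center plus a basic amine
    {"n_tautomers": 2},                                  # e.g. a 2-hydroxypyridine/2-pyridone pair
    {},
]
print(len(library), "->", expanded_library_size(library))
```

For this four-compound toy library, the prepared set contains 8 structures, the kind of doubling described above.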

The need for improvement in conformational sampling methods during docking has been highlighted by a recent publication, which showed that docking programs are strongly dependent on the input conformation of the ligands under study.72 This would not be the case if sampling were perfect. The authors of the paper suggest that the effects of this phenomenon can be mitigated to some extent by using multiple (more than five) input conformations. Nonetheless, this remains a worrying result for practitioners of docking, and one that will hopefully provide an impetus to further research into conformational sampling methods. Thus, as with many aspects of virtual screening, the preparation of ligands is not a completely solved problem. Further discussion of this topic can be found in a number of reviews.4,73,74
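
The mitigation suggested in that study (docking from several input conformations and keeping the best result) amounts to a simple wrapper around any docking engine. In the Python sketch below, `dock` is a hypothetical placeholder for a call to a real program. It takes a starting conformer and returns a pose identifier with a score, where lower is better. The spread of scores across starting conformers gives a rough indication of how input-dependent the sampling is for that ligand.

```python
from typing import Callable, Sequence, Tuple

Pose = Tuple[str, float]  # (pose identifier, score); lower score = better (assumption)


def dock_from_multiple_inputs(
    input_conformers: Sequence[str],
    dock: Callable[[str], Pose],
) -> Pose:
    """Dock each starting conformer independently and keep the best-scoring pose."""
    if not input_conformers:
        raise ValueError("need at least one input conformer")
    results = [dock(conf) for conf in input_conformers]
    return min(results, key=lambda pose: pose[1])


def input_sensitivity(input_conformers: Sequence[str], dock: Callable[[str], Pose]) -> float:
    """Difference between worst and best score: 0 would mean perfectly input-independent sampling."""
    scores = [dock(conf)[1] for conf in input_conformers]
    return max(scores) - min(scores)
```

A large spread for a given ligand is a signal to add more starting conformers before trusting its rank.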

Protein Preparation

The preparation of the protein structure into which the potential ligands are to be docked is just as important as the correct treatment of the ligands themselves. Typically, the modeler begins with one or more structures of the target protein obtained by X-ray crystallography. It is important to be aware of the limitations of models derived by X-ray crystallography, and these have been helpfully detailed in recent years by Davis et al.75,76 In particular, the following points should be checked, perhaps with the aid of one of the software tools that have emerged for this purpose, e.g., WHAT_CHECK77 or the Protein Preparation Wizard from Schrödinger.78

• Structural integrity—Is the structure complete, or are there missing residues or segments, particularly in loop regions? Often, these will not be important, but if a loop forms part of the binding site region, then some careful modeling to predict possible conformations will be required. As hydrogen atoms are not observed in the majority of X-ray experiments, they may need to be added, and the correct bond orders assigned to protein residues and any ligands or cofactors.

• Protonation and tautomeric states of ionizable residues—Usually, it is fairly straightforward to make correct decisions about the likely protonation states of Asp, Glu, Arg, and Lys residues, with acids typically being deprotonated and bases protonated. Histidines, being weaker bases, often present more of a challenge and, in the active site region, need to be considered on a case-by-case basis, taking into account their immediate environment. The possibility of different histidine tautomers adds to the complexity.

• Orientation of Asn and Gln residues—Usually, X-ray crystallography cannot unambiguously assign an orientation to the side-chain amides of Asn and Gln residues. Quite often, the orientation observed in the ‘raw’ X-ray structure is suboptimal, and ‘flipping’ a residue (to switch the positions of the amide O and N atoms) can lead to a more complete hydrogen-bonding network in the protein and/or change the ligand-binding possibilities.

• Active site water molecules—These need to be assessed one-by-one to try to determine whether they are real or just an artefact of crystallographic refinement.75 Consideration of the B factors and occupancies is helpful here, as is an inspection of the number of hydrogen bonds made by a particular water molecule. If a water molecule makes no interactions with other waters or protein atoms, it is very unlikely to be an integral part of the active site and may well be an artefact. Similar conclusions may be drawn for water molecules with low occupancies or with B factors that are significantly greater than those of the surrounding protein residues. The treatment of water molecules is a significant challenge in protein–ligand docking and is considered more fully in a later section.

• Geometry refinement of the protein–ligand complex—It is common to include some degree of protein–ligand geometry refinement using molecular mechanics methods. This is particularly useful for refining the position of hydrogen atoms that are not observed in the X-ray structure, especially on hydroxyl groups and water molecules. It may also be required to correct small artefacts in the protein or ligand geometry, e.g., strained bond lengths or angles or intermolecular steric clashes. However, there is a risk that minimization introduces a bias toward the complexed ligand, e.g., by improving the hydrogen-bonding contacts and steric complementarity, and therefore some practitioners prefer to retain the original X-ray coordinates.
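
The checks listed above for active site waters can be expressed as a small triage function. The Python sketch below flags a water that has no hydrogen bonds, low occupancy, or a B factor well above that of the surrounding protein atoms. The `Water` record, its fields, and the thresholds (occupancy below 0.5; B factor more than 1.5 times the local protein average) are illustrative assumptions rather than recommended values. A flagged water still deserves visual inspection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Water:
    label: str
    b_factor: float
    occupancy: float
    n_hbonds: int            # hydrogen bonds to protein atoms or other waters
    local_protein_b: float   # mean B factor of nearby protein atoms


def doubtful_reasons(w: Water, min_occupancy: float = 0.5, max_b_ratio: float = 1.5) -> List[str]:
    """Return the reasons (possibly none) why a crystallographic water looks doubtful."""
    reasons = []
    if w.n_hbonds == 0:
        reasons.append("no hydrogen bonds")
    if w.occupancy < min_occupancy:
        reasons.append("low occupancy")
    if w.b_factor > max_b_ratio * w.local_protein_b:
        reasons.append("B factor high relative to surroundings")
    return reasons


# Hypothetical waters: one well-ordered and bridging, one poorly defined.
for w in [Water("HOH 301", 18.0, 1.0, 3, 20.0), Water("HOH 412", 55.0, 0.4, 0, 22.0)]:
    print(w.label, doubtful_reasons(w) or "no flags raised")
```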

INCORPORATION OF PROTEIN FLEXIBILITY DURING DOCKING

Ligands often induce a conformational change in the protein upon binding (i.e., ligand-induced fit). Comparisons between apo and holo structures across a variety of proteins in the PDB show that side-chain and backbone movements upon ligand binding are quite common; in addition, different ligands can induce different changes in the same protein (Figure 4). A review of the conformational changes caused to protein structures by ligand binding, and of their effects on binding affinities, has been published recently.79 Ignoring protein flexibility is clearly one of the major approximations of standard docking programs and is responsible for poor performance in cross-docking experiments: it has been estimated that using a single rigid receptor could lead to incorrect binding pose prediction for about 50–70% of all ligands.80 Over the years, several approaches have been proposed to address the issue of protein flexibility, and in-depth reviews of the subject have been published.81,82 The available approaches can be broadly classified into the following categories.

Docking Using ‘Soft’ Receptors

The simplest way of considering protein flexibility is by decreasing the energy penalties for close contacts (steric clashes) between the ligand atoms and the receptor, thus allowing some degree of overlap between the protein and the ligand. Since the first implementation was reported by Jiang and Kim,83 this method has been made available for various widely used docking programs. Glide17 allows scaling of the van der Waals radii of nonpolar receptor atoms during receptor grid generation. Similarly, users of GOLD23,24 can apply a softer ‘Split van der Waals Potential’ to certain selected residues.
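
One way to see how softening works is to scale the contact distance in a 12-6 Lennard-Jones term, E(r) = ε[(r_min/r)^12 − 2(r_min/r)^6]. The Python sketch below multiplies r_min by a scaling factor below 1, which turns a severe clash into a mild contact. It is a generic illustration of radius scaling with hypothetical parameter values, not the functional form actually used by Glide, GOLD, or any other program.

```python
def lj_energy(r, r_min, epsilon, radius_scale=1.0):
    """12-6 Lennard-Jones energy written in terms of the minimum-energy distance r_min.

    radius_scale < 1 shrinks the effective contact distance, mimicking a 'soft' receptor.
    At r == radius_scale * r_min the energy equals -epsilon (the well depth).
    """
    x = (radius_scale * r_min) / r
    return epsilon * (x ** 12 - 2.0 * x ** 6)


hard = lj_energy(3.0, 4.0, 0.2)                    # unscaled: strongly repulsive clash
soft = lj_energy(3.0, 4.0, 0.2, radius_scale=0.8)  # scaled: nearly neutral contact
print(f"hard: {hard:+.2f} kcal/mol, soft: {soft:+.2f} kcal/mol")
```

For a pair with r_min = 4.0 Å and ε = 0.2 kcal/mol at a separation of 3.0 Å, the unscaled term is a repulsion of about +4.1 kcal/mol. With a scale factor of 0.8, it becomes slightly attractive (about −0.16 kcal/mol), which is why an oversized soft region can admit incorrect binding modes.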

The ease of implementation and interpretation, together with the low computational cost, make this approach suitable for virtual screening applications. However, soft docking can only model small changes in the binding site and can sometimes produce incorrect binding modes if the soft region is too large. It has been suggested that soft docking can be appropriate when the protein structure is generated by homology modeling, where robustness may be more important than accuracy.84 Alternatively, a soft-docking model may be used for prescreening a large database to remove compounds that, due to their size or shape, have little chance to dock well. More sophisticated docking approaches and more accurate scoring functions can then be applied to the smaller set to yield the final selection.85

FIGURE 4 | Examples of backbone and side-chain flexibility in protein kinases. Left: In the Protein Kinase B X-ray structure 2JDR (green carbons), Phe163 (ball-and-stick) folds into the binding site to gain lipophilic contacts with the ligand, whereas in 2JDO (pink carbons), Phe163 is directed out of the binding site, leaving a lipophilic pocket under the P-loop, which is occupied by a chlorophenyl group on the ligand. Right: In p38 MAP kinase, movement of the DFG motif away from its usual binding site (pink) to a ‘DFG-out’ conformation (orange) reveals a lipophilic pocket occupied by the ligand BIRB-796 (green).

Docking Using Side-chain Flexibility

Another way to account for protein flexibility is to allow some rotation of the protein side chains in the binding site. Rotamer libraries can be used to define the allowed torsional values and reduce the search space. An early implementation of ligand docking to proteins with discrete side-chain flexibility was reported by Leach.86 Although early versions of GOLD23,24 restricted flexibility to small terminal groups such as the hydroxyl groups on serine and tyrosine side chains, the latest release allows side-chain and backbone flexibility for up to 10 user-defined residues.87 Rotation of hydroxyl groups during docking has recently been implemented in Glide. A more complete treatment of side-chain flexibility is available with Version 4.2 of AutoDock25,26 and with Slide.88 The latter, rather than shifting from one rotamer to another, adjusts the side chains in the binding site by applying a minimum rotation model (i.e., only small adjustments of the dihedral angles).
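
Discrete side-chain flexibility with a rotamer library can be illustrated by an exhaustive search over rotamer combinations. In the Python sketch below, the library (χ1 values only), the residue labels, and the `energy` callable are hypothetical stand-ins for a real rotamer library and a ligand–receptor scoring function. Exhaustive enumeration grows as the product of the rotamer counts per residue, which is why practical implementations need more efficient search strategies.

```python
from itertools import product


def best_rotamer_combination(rotamer_library, energy):
    """Score every combination of discrete rotamers and return the lowest-energy one.

    rotamer_library: {residue_label: [allowed chi1 values in degrees]}
    energy: callable taking {residue_label: chi1} and returning an energy (lower = better)
    """
    residues = list(rotamer_library)
    best_assignment, best_energy = None, float("inf")
    for combo in product(*(rotamer_library[r] for r in residues)):
        assignment = dict(zip(residues, combo))
        e = energy(assignment)
        if e < best_energy:
            best_assignment, best_energy = assignment, e
    return best_assignment, best_energy


# Toy example: two hypothetical binding-site residues with three staggered chi1 rotamers each.
library = {"resA": [60, 180, -60], "resB": [60, 180, -60]}


def toy_energy(assignment):
    e = 0.0 if assignment["resA"] == -60 else 1.0   # resA prefers chi1 = -60
    e += 0.0 if assignment["resB"] == 180 else 0.5  # resB prefers chi1 = 180
    return e


print(best_rotamer_combination(library, toy_energy))  # ({'resA': -60, 'resB': 180}, 0.0)
```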

Various docking studies describe the successful incorporation of side-chain flexibility. Frimurer et al.89 used crystallographic information to build a rotamer library for protein tyrosine phosphatase 1B and reported better poses and better binding energy predictions for docking three inhibitors with FlexX. Using ICM, Cavasotto et al.90 docked retinal into bovine rhodopsin, incorporating randomized rotamers for 21 residues in the binding site, and obtained good results in docking the native ligand and in a VS study. Alberts et al.,91 applying the program Skelgen to acetylcholinesterase and matrix metalloprotease-1, reported that optimizing the side-chain torsional angles gave better results than using rotamer libraries.

These methods have a moderate computational cost and are most successfully applied when the flexibility of a binding site is well described by movements of the side chains, because they generally do not take account of backbone flexibility. A recent study92 of 206 pairs of binding sites (same protein, similar but not identical ligands) highlighted side-chain movements in 50% of the pairs and backbone movements in only 7% of the pairs. Given this, using only flexible side chains may be an acceptable strategy in virtual screening applications when at least one crystallographic structure of the target protein bound to a ligand is available.



Docking Using an Ensemble of Receptor Structures

In its simplest implementation, this approach consists of running multiple docking experiments using a set of discrete protein structures. The protein structures can be either generated computationally or determined experimentally by crystallography or NMR. This approach allows the inclusion of any conformational change (backbone and/or side chain) that can be represented within the set of input structures. In particular, it may be beneficial to use an ensemble when docking to low-resolution X-ray structures or to homology models (especially if there are mutations or regions of low homology to the template near the binding site). The computational cost of the method scales with the number of protein conformations, so it can be applied to virtual screening only when a small number of receptor structures is used. Furthermore, the analysis of the results is more complicated than for a single docking run, because multiple docking poses are produced for each ligand and a higher number of false positives is likely to occur.

To increase docking efficiency and simplify the analysis of the results, some docking methods have been modified to deal automatically with multiple receptor structures. One approach that works well with grid-based methods is to combine the structural information from multiple protein conformations in a single interaction grid, where for each grid point the average value, the minimum value, or an energy-weighted value across all the structures is used. Another approach consists of modifying the docking algorithm so that it docks the ligands into a set of predefined protein structures and selects the best pose automatically. AutoDock,25,26 ICM,27 Flex-Ensemble,93 the Glide virtual screening workflow, and, more recently, GOLD94 are examples of widely used programs supporting docking to an ensemble. Flex-Ensemble is also able to recombine flexible regions from the ensemble structures to generate new conformations, thus further expanding the protein conformational space (although there is the danger that some of the recombined conformations may not exist experimentally). The program IFREDA (ICM-Flexible Receptor Docking Algorithm95) was specifically developed to account for protein flexibility in virtual screening: the protein conformations are generated by docking a set of known ligands to a flexible receptor (both side-chain and backbone movements are allowed). The ensemble can then be used to perform multiple docking and scoring runs on a VS dataset. The VS scores are finally merged, and only the best rank is kept for each compound.
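
The single-grid strategy described above can be sketched point by point. In the Python sketch below, the grids are flattened to equal-length lists of interaction energies. The 'mean' and 'min' modes follow the text directly. The 'boltzmann' mode is one possible interpretation of an "energy-weighted value" (a Boltzmann-weighted average with kT ≈ 0.593 kcal/mol at about 298 K). This interpretation is an assumption, not the scheme of any particular program.

```python
import math


def combine_grids(grids, mode="min", kT=0.593):
    """Merge per-structure interaction grids (equal-length lists) point by point.

    mode: 'mean', 'min', or 'boltzmann' (Boltzmann-weighted average; an assumed
    reading of 'energy-weighted'). Energies in kcal/mol, lower = more favorable.
    """
    combined = []
    for values in zip(*grids):
        if mode == "mean":
            combined.append(sum(values) / len(values))
        elif mode == "min":
            combined.append(min(values))
        elif mode == "boltzmann":
            v_min = min(values)
            # shift by the minimum before exponentiating to avoid overflow
            weights = [math.exp(-(v - v_min) / kT) for v in values]
            combined.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return combined


# Two hypothetical receptor conformations, three grid points each.
grids = [[-1.0, 2.0, 0.5], [-0.2, -0.8, 3.0]]
for mode in ("mean", "min", "boltzmann"):
    print(mode, [round(v, 3) for v in combine_grids(grids, mode)])
```

The minimum grid is the most permissive, because each point takes the most favorable value found in any structure. The mean is the most conservative. The Boltzmann-weighted value always lies between the two.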

Hybrid Methods

Hybrid methods combine various iterations of docking and protein modeling to model receptor flexibility. Well-known examples include IFD (Induced Fit Docking96) and SCARE (SCan Alanines and REfine97). The IFD protocol consists of three steps: an initial run of ‘soft’ docking using Glide SP (specific binding site residues can be mutated to alanine or their vdW radii can be scaled), followed by refinement of the active site side chains using the Prime homology modeling package, and then a final step of redocking and scoring. IFD is probably too computationally intensive to be used in large virtual screening campaigns, but it is certainly an interesting methodology. It is implemented as an easy-to-use script and has been applied with success to a number of very diverse targets (e.g., nuclear receptors,98,99 kinases,100–102 G-protein coupled receptors (GPCRs),103,104 the hERG ion channel,105 and cytochrome P450s106). The SCARE algorithm automatically generates multiple variants of the receptor by systematically mutating adjacent residues in the binding site to alanine, docking a ligand into each model, and finally refining and scoring each complex by optimizing the receptor (with full side chains) around the ligand. SCARE was tested in a cross-docking experiment on a benchmark set of 30 ligand–protein pairs and showed significant improvements in pose prediction compared with single-grid docking.
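
The control flow of a SCARE-like protocol, as described above, can be outlined in a few lines. The Python sketch below is loosely modeled on that description and is not the published SCARE code. It reads "adjacent" as "consecutive in a supplied list of binding-site residues", which is an assumption; the original method may define adjacency differently. Both callables are hypothetical placeholders for the docking and refinement programs.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Mutation = Dict[str, str]  # residue label -> residue type it is truncated to


def alanine_pair_variants(site: Sequence[str]) -> List[Mutation]:
    """One receptor variant per pair of consecutive binding-site residues, both set to Ala."""
    return [{a: "ALA", b: "ALA"} for a, b in zip(site, site[1:])]


def scare_like_protocol(
    site: Sequence[str],
    dock_in_variant: Callable[[Mutation], object],
    refine_and_score: Callable[[object], float],
) -> Optional[Tuple[object, float]]:
    """Dock into each truncated variant, then refine with full side chains and score.
    Lower scores are taken as better; returns (best pose, best score) or None."""
    best = None
    for mutation in alanine_pair_variants(site):
        pose = dock_in_variant(mutation)   # dock into the alanine-truncated receptor
        score = refine_and_score(pose)     # restore side chains, optimize, score
        if best is None or score < best[1]:
            best = (pose, score)
    return best
```

For a binding site listed as four residues, this generates three receptor variants, each of which is docked and refined independently; the cost therefore grows linearly with the size of the site.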

Molecular Dynamics Simulations

Molecular Dynamics (MD) simulations can be used to generate the ensemble of protein structures or to refine the results of a docking run. The latter is a computationally intensive approach and is still not exhaustive (the results will be biased by the initial docking pose and the setup parameters of the dynamics run). It is therefore only applicable at the lead optimization stage and is well beyond the current resources and capabilities of virtual screening. A perspective on the use of MD simulations to generate multiple receptor structures has been published by Cozzini and coworkers.107

Comparative Performance of Flexible Docking Methods

Although the method of choice will mostly be dictated by the timescale of a project and the available computational resources, it is interesting to compare the first four types of approaches in terms of enrichment in virtual screening. Incorporating some form of receptor flexibility into the docking process generally improves the binding mode prediction of non-native ligands, although soft docking can sometimes yield less accurate docked geometries. This is probably because soft energy potentials can favor van der Waals interactions over hydrogen bonds and other polar interactions. As the following examples illustrate, there is still an open debate on whether using flexible side chains or multiple receptor structures consistently improves the hit rates of virtual screening experiments.

Ferrari et al.108 compared the performance of rigid docking, soft docking, and multiple-receptor docking in VS experiments on two well-studied proteins: T4 lysozyme (characterized by small conformational changes upon ligand binding) and aldose reductase (characterized by large conformational changes upon ligand binding). In both cases, the enrichments were highest when multiple receptor conformations were used; soft docking provided the next best enrichments, and the use of a rigid receptor gave the lowest enrichments. When analyzing the binding poses of known ligands, they found that using multiple receptor conformations produced geometries closer to the crystal structures than soft docking. A recent study by Kokh and Wenzel85 on 12 receptors reported significant improvements in enrichment rates for flexible receptors compared with rigid docking in 11 of the 12 targets studied, whereas soft docking failed to significantly increase the enrichment rates. Barril and Morley109 studied two proteins for which a large number of X-ray structures are available: CDK2 and Hsp90. They reported better performance of flexible-receptor docking over rigid docking in binding mode predictions. Disappointingly, however, they found that the enrichment factors (particularly in the top 1% of the database) did not improve when using flexible-receptor docking, owing to an increase in the number of false positives.
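
Enrichment factors, the main yardstick in these comparisons, are straightforward to compute from a ranked list. The Python sketch below uses the usual definition EF_x% = (actives in the top x% / compounds in the top x%) / (total actives / total compounds). Rounding the size of the top slice to the nearest integer is a choice made here; other conventions (floor or ceiling) change the results slightly for small libraries.

```python
def enrichment_factor(ranked_is_active, fraction=0.01):
    """Enrichment factor at the top `fraction` of a ranked list (best-scored first).

    ranked_is_active: sequence of booleans, True where the compound is a known active.
    EF = (actives in top n / n) / (all actives / N)
    """
    N = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if N == 0 or total_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    n = max(1, round(fraction * N))
    found = sum(ranked_is_active[:n])
    return (found / n) / (total_actives / N)


# Hypothetical screen: 1,000 compounds, 10 actives, 4 of them ranked in the top 10.
ranked = [True] * 4 + [False] * 6 + [True] * 6 + [False] * 984
print(enrichment_factor(ranked, 0.01))
```

Here EF1% = (4/10)/(10/1000) = 40, whereas a random ranking would give an EF of about 1. An increase in false positives near the top of the list, as reported by Barril and Morley, lowers this number directly.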

Although in many cases incorporating protein flexibility leads to an improvement in VS results, this study raises the point that the number and quality of receptor structures can significantly affect the results, and that flexible docking is no substitute for a careful analysis of the binding site. A judicious choice of a few carefully selected conformations may lead to better results than the inclusion of every available structure. A simple rule proposed by Rueda et al.110 consists of selecting the protein conformations bound to the largest ligands. These data suggest that the results of a virtual screening experiment are likely to depend on the target, on the amount of crystallographic information available, and on the way receptor flexibility is modeled. Although for some targets multiple crystal structures co-crystallized with a variety of ligands are available, for the majority of virtual screening projects one is likely to have to rely on calculated conformations. Recently, a suggestion was made to submit to the PDB an ensemble of models for macromolecular crystal structures rather than a single model (following the system in place for NMR-derived structures).111 Incorporating experimentally based protein flexibility into docking may enable more robust virtual screening. While highlighting some technical issues, the PDB has welcomed the proposal and encouraged software developers to work together to arrive at a consensus approach for handling multimodel data.
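
The selection rule of Rueda et al.110 is easy to apply once each candidate structure is annotated with the size of its bound ligand. The Python sketch below measures ligand size by heavy-atom count, which is an assumption (other size measures could be used). The structure labels are hypothetical.

```python
def select_by_ligand_size(structures, k=3):
    """Return labels of the k receptor structures whose co-crystallized ligands are largest.

    structures: list of (structure_label, heavy_atom_count_of_bound_ligand)
    """
    ranked = sorted(structures, key=lambda s: s[1], reverse=True)
    return [label for label, _ in ranked[:k]]


candidates = [("X1", 22), ("X2", 35), ("X3", 18), ("X4", 41)]
print(select_by_ligand_size(candidates, k=2))  # ['X4', 'X2']
```

The intuition is that a pocket opened around a large ligand is more likely to accommodate smaller ligands than the reverse, so a few such structures can stand in for the full ensemble.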

TREATMENT OF ACTIVE SITE WATER MOLECULES

Water molecules play an important role in the interaction between proteins and ligands. Water molecules retained during the formation of the complex may mediate hydrogen bonding between the ligand and the protein, or alter the size, shape, and polarity of the binding site. Alternatively, waters can be displaced by the ligand, and this may result in a gain in binding affinity. Targeting displaceable water molecules has indeed proven to be a successful strategy in various medicinal chemistry programs. For example, in the rational design of inhibitors of HIV-1 protease, a series of cyclic ureas was developed in which the carbonyl oxygen mimics the hydrogen-bonding features of a key structural water molecule.112 The use of a nitrile functionality to displace and mimic a crystallographic water molecule was successful in two different programs: scytalone dehydratase113 (Figure 5) and p38 MAP kinase114 (Figure 6). In all three studies, X-ray crystallography confirmed the displacement of the water and the proposed binding mode of the designed ligands.

Incorporation of Water Molecules During Docking

In an analysis of 392 crystal structures of protein–ligand complexes,115 a total of 1829 water molecules were found, with each structure having an average of 4.6 ligand-bound water molecules. Water molecules are abundant in protein–ligand complexes, significantly affect the energetics of the ligand-binding process, and can be exploited in drug design; together, these factors have inspired various attempts to include solvation in docking and scoring. Most docking programs assume that the positions of water molecules are known from crystal structures, and the user is only required to decide which waters to retain during docking. Solvation can then be considered as another aspect of protein flexibility and included by simply creating multiple receptor models incorporating one or more fixed water molecules. For example, AutoDock25 can use multiple energy grids representing various hydration states of the protein.

FIGURE 5 | Crystal structure of scytalone dehydratase in complex with two different ligands. (Left): A water-mediated interaction of the ligand with the two tyrosine residues, Tyr30 and Tyr50 (PDB code: 1STD). (Right): The water has been replaced by a nitrile group that interacts with both tyrosines (PDB code: 3STD). (Hydrogen atoms added using Maestro from Schrödinger, Inc.)

In FlexX,116 water molecules can be modeled explicitly (with water hydrogens predefined by the user) or as ‘particles’. Particles have the advantage that they are freely rotatable and can be defined as displaceable or not displaceable. Displaceable waters can form interactions with the ligand and therefore become active, or the ligand can overlap with them. Similarly, in GOLD, water molecules are represented by an all-atom model and can be switched on or off and rotated around their three principal axes.117 The main difference from previous approaches is that displacement of a water molecule by the ligand is rewarded by adding a fixed penalty to the waters that are switched on. FITTED 1.0118 uses a GA to model the flexibility of both the protein and the ligand and to place bridging water molecules in the site. In addition, a switching function allows the user to retain or displace key waters.
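
The effect of a fixed penalty on switched-on waters can be seen in a toy model in which each toggleable water contributes an independent interaction energy. The Python sketch below illustrates the principle and is not GOLD's implementation. A water is kept only if it does not overlap the ligand pose and its interaction energy outweighs the penalty. The energies, water labels, and penalty value are hypothetical.

```python
def water_states(water_terms, overlapping, penalty=1.0):
    """Decide which toggleable waters stay switched on for one ligand pose.

    water_terms: {water_label: interaction energy if kept on (lower = better)}
    overlapping: set of waters the ligand pose overlaps; these must be switched off
    penalty: fixed cost charged for every water left on
    Because the terms are independent in this toy model, each water is decided separately.
    """
    state = {w: (w not in overlapping and term + penalty < 0.0) for w, term in water_terms.items()}
    energy = sum(water_terms[w] + penalty for w, on in state.items() if on)
    return state, energy


state, energy = water_states({"W1": -2.5, "W2": -0.4, "W3": -3.0}, overlapping={"W3"})
print(state, energy)
```

With interaction terms of −2.5, −0.4, and −3.0 for waters W1 to W3, a penalty of 1.0, and a pose that overlaps W3, only W1 stays on (net −1.5). W2 is weakly bound and is therefore displaced at no cost. In this way, a penalty on retained waters indirectly rewards displacement.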

A study published by Roberts and Mancera119 using the CCDC/Astex dataset found that docking accuracy was generally improved by including water molecules. Huang and Shoichet120 analyzed 24 different protein targets in virtual screening applications. For 12 targets, better enrichments were found when including waters, whereas for 11 targets the results were unchanged. On the basis of their results, both studies were in favor of including water molecules in the docking process, although they highlighted some pitfalls. First, when including water molecules, there may be a need to reparameterize the scoring functions (to avoid counting water-mediated interactions twice, if they were implicitly accounted for in the scoring function). Second, when two ligands are docked with a different number of waters, it is not necessarily clear how to compare their scores. Finally, it is not always easy to make the initial decision about which waters should be treated as displaceable and which should be treated as fixed. When setting up a docking study, it should be kept in mind that there is potential for a strong bias if too many water molecules are included. Therefore, it is generally advisable to take a cautious approach regarding the inclusion of crystallographic water molecules.

FIGURE 6 | Crystal structure of p38 MAP kinase in complex with a naphthalene-urea (PDB code: 2PUU). The contact to the hinge (gray ribbon, backbone only) is unusually achieved through mediation of a water molecule. Residues labeled in the figure: Glu71 and Asp168. (Hydrogen atoms added using Maestro from Schrödinger, Inc.)

Scoring of Waters in Docked and Crystallographic Complexes

Although a full treatment of water molecules during docking remains an unsolved problem, progress has been made in some areas. Scoring functions are being reparameterized to represent water interactions more realistically, and specific desolvation energy terms have been introduced.42,121 Methods based on a statistical analysis of parameters derived from crystal structures (such as B factors, solvent-contact surface areas, number of protein–water contacts, etc.), which can help distinguish between bound and displaceable waters, have been available for a while (e.g., Consolv,122 WaterScore123). More recently, Amadasi et al.124 have proposed a computational protocol to identify ‘relevant’ water molecules using a combination of two terms. The HINT score gives an estimate of the strength of the interaction between the water and the protein, whereas the Rank descriptor calculates the number and geometric quality of potential hydrogen bonds between the water and the protein. WaterMap,125 from Schrödinger, runs an MD simulation of water molecules in a binding site to provide an estimate of their entropy and enthalpy. Water molecules classified as entropically and enthalpically unstable may be targeted for displacement in lead optimization.

Despite these advances in the field, the correct placement of water molecules prior to docking experiments can still be problematic. There is increasing evidence that the number and position of water molecules in a binding site can vary depending on the protein conformation and the structure of the ligands, particularly for highly flexible and promiscuous proteins (e.g., cytochrome P450s). In a study of the complex between cytochrome P450 2D6 and a substrate,126 MD simulations were used to identify stable water molecules to be included in subsequent docking experiments with multiple protein conformations and different substrates. The addition of those waters led to an improvement when docking the same substrate or closely related analogues. However, little improvement, or even worse results, were obtained when docking different substrates or using different protein conformations.

POSTPROCESSING AND ANALYSIS OF RESULTS

Given the inability of scoring functions to rank ligands reliably, there has been considerable interest in alternative methods to support the analysis of the large datasets generated by structure-based virtual screening. These include simulation methods aimed at a more accurate estimation of binding affinity, as well as procedures for the classification of binding modes and the detection of misdocked poses.

Physics-based Rescoring

One of the most active areas of research has been the application of ‘physics-based’ methods for rescoring the docked poses selected by the primary scoring function. These physics-based approaches employ full molecular mechanics force field simulations coupled with sophisticated methods for treating solvation effects.127 The physics-based methods can be split broadly into two groups, depending on the method used for modeling solvation.

Molecular Mechanics-Poisson–Boltzmann Surface Area Approaches

Among the leading proponents of the Molecular Mechanics-Poisson–Boltzmann Surface Area (MM-PBSA) approach to rescoring is the group at Abbott.128,129 They have developed a high-throughput version of MM-PBSA that uses an implicit solvent model rather than explicit waters, reducing the computational demand by two orders of magnitude. The resulting protocol still requires approximately 100 CPU minutes per complex, but when run on a large cluster or ‘grid’, it brings rescoring of appreciable numbers of docked poses within reach. When tested on a set of 308 ligands binding to three proteins (urokinase, protein-tyrosine phosphatase 1B (PTP-1B), and Chk-1), the method produced correlation coefficients of 0.72–0.83 between actual and predicted binding affinity. A different high-throughput MM-PBSA protocol, eschewing MD simulations in favor of simple energy minimization, was investigated by Thompson et al.,130 but failed to show significant improvement over simpler non-PBSA approaches. It thus appears that there is a limit to how far rescoring procedures can be simplified and sped up.
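
For reference, the quantity estimated in MM-PBSA rescoring is usually written as a difference of free energies averaged over simulation snapshots. Each term is decomposed into a molecular mechanics energy, a Poisson–Boltzmann polar solvation term, a surface-area nonpolar term, and a solute entropy term. The decomposition below is the standard textbook form rather than the exact protocol of the Abbott group. In high-throughput variants, the entropy term is often omitted or approximated, and MM-GBSA replaces G_PB with a Generalized Born estimate G_GB.

```latex
\begin{aligned}
\Delta G_{\mathrm{bind}} &\approx \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{receptor}} \rangle - \langle G_{\mathrm{ligand}} \rangle \\
G_{X} &= E_{\mathrm{MM}} + G_{\mathrm{PB}} + G_{\mathrm{SA}} - T S_{\mathrm{solute}},
  \qquad X \in \{\mathrm{complex}, \mathrm{receptor}, \mathrm{ligand}\} \\
E_{\mathrm{MM}} &= E_{\mathrm{bonded}} + E_{\mathrm{vdW}} + E_{\mathrm{elec}},
  \qquad G_{\mathrm{SA}} = \gamma \, \mathrm{SASA} + b
\end{aligned}
```

Replacing the ensemble averages with a single energy-minimized structure gives the cheaper minimization-only variant mentioned above.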

Molecular Mechanics-Generalized Born Surface Area Approaches

Other groups have pursued an alternative formalism for physics-based rescoring using the Molecular Mechanics-Generalized Born Surface Area (MM-GBSA) method, which is regarded as computationally less intensive than, and almost as accurate as, MM-PBSA approaches.130 The ability of an MM-GBSA approach to improve early enrichment in virtual screening has been demonstrated using sets of known binders for nine enzyme targets in conjunction with a collection of almost 100,000 drug-like decoy compounds.131 After initial docking and scoring with the DOCK program, the top 25% of the docked compounds were rescored using an MM-GBSA method. The results showed that rescoring could increase enrichment, particularly early enrichment (i.e., in the top 1% of the database), although in some cases it degraded the results. Several reasons for the less than perfect performance were suggested, including the use of a rigid receptor model and the inability to rescue incorrectly docked poses generated by the docking algorithm. Other workers have reported successful rescoring of docked poses, leading to good correlations with binding affinities within congeneric ligand series. Lyne et al.132 explored the application of MM-GBSA rescoring to four kinase targets and obtained correlation coefficients of 0.71–0.84 between predicted and experimental binding affinities. Guimaraes and Cardozo133 used a more diverse set of pharmaceutically relevant targets and obtained excellent correlations (r values between 0.8 and 0.87).

In summary, physics-based rescoring methods do seem to be of some value, particularly in rescoring virtual screening datasets, where enhancements in enrichment and scaffold diversity can be obtained. The problem of accurately predicting binding affinities remains; some successes have been reported but, anecdotally, such outcomes seem to be the exception rather than the rule, and so further work will be required in the development and testing of the methods before they can be universally applied with confidence.

Interaction Fingerprints

An alternative to the use of docking scores to navigate through a set of poses is to examine the pattern of interactions that each compound makes within the binding site. This is the motivation behind the development of various analysis methods based on ‘interaction fingerprints’. At the heart of these methods is the reduction of the 3D structural binding information present in a protein–ligand complex to a one-dimensional binary string.134 Once all the poses are represented in this manner, it becomes simple to use existing cheminformatics techniques to identify all compounds making a particular set of interactions with the protein (akin to substructure searching) or to find compounds making similar sets of interactions (akin to similarity searching or clustering).



The archetypal SIFt (Structural Interaction Fingerprint) method134 creates a bitstring that encodes all the interactions between the ligand and protein, including whether the ligand is in contact with a backbone or side-chain atom, whether the interaction is polar or nonpolar in nature, and whether the protein residue provides hydrogen-bond donors or acceptors. Each protein residue is thus described by seven bits, and concatenating these residue-based bitstrings for all active site residues yields the SIFt for a given pose. SIFts proved very effective at retrieving 16 known p38 inhibitors from a set of 1000 random molecules, giving better enrichment than some well-known scoring functions.134
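To make the bit layout concrete, a minimal Python sketch of a SIFt-style fingerprint follows. It assumes that the per-residue interaction flags have already been derived from the 3D pose (that geometric step is not shown); the bit order, residue labels, and helper names are hypothetical. The Tanimoto coefficient is used here as one common way to compare two such fingerprints.

```python
from typing import Dict, List, Set

# Per-residue bit order assumed for this sketch; it follows the seven
# features listed in the text, but the original SIFt bit order may differ.
BIT_ORDER = [
    "any_contact",     # ligand touches any atom of the residue
    "backbone",        # contact with a backbone atom
    "side_chain",      # contact with a side-chain atom
    "polar",           # polar interaction
    "nonpolar",        # nonpolar interaction
    "hbond_acceptor",  # residue provides an H-bond acceptor
    "hbond_donor",     # residue provides an H-bond donor
]


def residue_bits(interactions: Set[str]) -> List[int]:
    """Encode the interaction flags of one residue as seven 0/1 bits."""
    return [1 if name in interactions else 0 for name in BIT_ORDER]


def sift(pose: Dict[str, Set[str]], site_residues: List[str]) -> List[int]:
    """Concatenate per-residue bits over a fixed, ordered list of site residues.

    Residues without recorded interactions contribute seven zero bits, so every
    pose yields a fingerprint of the same length (7 * number of residues).
    """
    bits: List[int] = []
    for residue in site_residues:
        bits.extend(residue_bits(pose.get(residue, set())))
    return bits


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto similarity of two equal-length bit lists."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


if __name__ == "__main__":
    site = ["MET109", "LYS53", "THR106"]  # hypothetical residue labels
    pose_a = {"MET109": {"any_contact", "backbone", "polar", "hbond_donor"},
              "LYS53": {"any_contact", "side_chain", "polar", "hbond_donor"}}
    pose_b = {"MET109": {"any_contact", "backbone", "polar", "hbond_donor"},
              "THR106": {"any_contact", "side_chain", "nonpolar"}}
    fp_a, fp_b = sift(pose_a, site), sift(pose_b, site)
    print(len(fp_a), round(tanimoto(fp_a, fp_b), 3))  # 21 0.364
```

Because every pose is encoded against the same ordered residue list, fingerprints from different compounds line up bit for bit, which is what allows substructure-like queries and similarity searching over the docked set.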

Various other similarly motivated interaction fingerprints have been described in the literature in recent years, and these have been reviewed by Brewerton.135 More recently, a development of the original SIFt approach, weighted-SIFt, has been published.136 As the name suggests, weighted-SIFt allocates weights to the bits describing the protein–ligand interactions in an attempt to capture the relative importance of each bit to the binding affinity. This does, however, require access to a reasonably large body of binding data to set the weights. Another variation on the theme has been described by Perez-Nueno et al.,137 who have developed an atom pairs-based interaction fingerprint (APIF). In APIFs, protein–ligand contacts are distinguished by the distance between the interacting atoms rather than being described in a purely binary (present/absent) manner.
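The idea of recording contacts by distance rather than as present/absent can be sketched as follows. This is a simplified illustration based only on the one-sentence description above, not the published APIF definition: the atom types, distance bins, and function names are assumptions, and the result is a count vector keyed by (ligand atom type, protein atom type, distance bin).

```python
import math
from collections import Counter
from typing import List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (assumed atom type, xyz in Å)

# Hypothetical distance bins in ångströms (lower bound inclusive).
BINS = [(2.5, 4.0), (4.0, 6.0), (6.0, 8.0)]


def distance_bin(d: float) -> int:
    """Index of the bin containing distance d, or -1 if it lies outside all bins."""
    for i, (lo, hi) in enumerate(BINS):
        if lo <= d < hi:
            return i
    return -1


def pair_fingerprint(ligand: List[Atom], protein: List[Atom]) -> Counter:
    """Count ligand-protein atom pairs keyed by (ligand type, protein type, bin)."""
    counts: Counter = Counter()
    for l_type, l_xyz in ligand:
        for p_type, p_xyz in protein:
            b = distance_bin(math.dist(l_xyz, p_xyz))
            if b >= 0:
                counts[(l_type, p_type, b)] += 1
    return counts
```

Two poses that make the same contacts at different distances now give different fingerprints, which is the extra resolution a purely binary encoding lacks.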

From the published examples, the various interaction fingerprint methods seem to have much to offer as a complement to other rescoring strategies.

Machine Learning

Machine learning methods have been evaluated in a wide range of virtual screening applications, both to incorporate new types of descriptor and to provide a more reliable ranking from existing scoring functions, e.g., by boosting the signal-to-noise ratio.138 PostDOCK139 is a postprocessing filter designed to discriminate between true binders and decoy poses generated by DOCK. The underlying question (generally applicable to any docking program) is whether decoy poses can be identified in terms of 3D structural features that may not necessarily be reflected in the overall docking score. PostDOCK used a set of 32 structural descriptors derived from components of the DOCK and ChemScore scoring functions and components of the buried solvent accessible surface area. Random forest classifier and logistic regression methods were trained against a dataset of 152 crystallographic complexes matched with a set of decoy poses. On a separate test set of 44 proteins, the best model achieved a 19-fold enrichment in discriminating correctly docked complexes from decoys. This type of approach is attractive in that it attempts to address one of the main deficiencies of scoring functions: a particular decoy may score well overall, yet visualization reveals that the binding mode is implausible, e.g., due to poor intermolecular geometries, mismatched polar–lipophilic contacts, or inadequate filling of a key binding pocket.

An alternative machine learning approach for postprocessing virtual screening results is based upon a naïve Bayes (NB) classifier.140–143 In its initial form,140 the approach takes the output from a high-throughput docking experiment and, by labeling the top-scoring ligands as ‘active’ and the rest as ‘inactive’, trains an NB classifier to distinguish between the two classes based on the presence and absence of substructural features encoded in the compounds’ fingerprints. Typically, ‘active’ compounds are defined as those with a docking score more than three standard deviations better than the mean score. This usually results in the ‘actives’ being selected from the top 1% of the docked compounds. The docked set of compounds is then reranked by the classifier, which usually gives an improved ranking. Klon et al.140 describe the application of this methodology to two test cases using three docking programs. For PTP-1B, the NB classifier was able to improve the docking results. However, for protein kinase B (PKB), the reranking procedure degraded performance. On analysis, this was found to be because the highly ranked compounds in this case were all false positives. Thus, there is a caveat to the application of the method, viz., that the initial docking should produce significant enrichment. A second publication from the group presents a refinement of the approach incorporating consensus scoring of docked poses, with the aim of producing a better ranking of the compounds after the docking step.141 The revised approach was shown to overcome the problem previously observed in the PKB example without degrading the PTP-1B results relative to the original method. Thus, this would seem to be a more robust and generally applicable method. An attractive feature of this kind of machine learning approach is that the computations it requires are very fast, so the reranking can be accomplished in a matter of minutes.
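The label-train-rerank loop of the initial method can be sketched as below. This is a hypothetical pure-Python Bernoulli naïve Bayes with Laplace smoothing, not the authors’ implementation. Compound features are represented as sets of substructure keys, and lower docking scores are assumed to be better.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple

Model = Tuple[Set[str], Dict[str, float], Dict[str, float], float]


def label_by_score(scores: Dict[str, float], n_sd: float = 3.0) -> Dict[str, bool]:
    """Mark compounds 'active' if they score n_sd SDs better than the mean.

    Assumes lower (more negative) docking scores are better.
    """
    values = list(scores.values())
    cutoff = mean(values) - n_sd * pstdev(values)
    return {cid: s <= cutoff for cid, s in scores.items()}


def train_bernoulli_nb(features: Dict[str, Set[str]], labels: Dict[str, bool],
                       alpha: float = 1.0) -> Model:
    """Estimate per-feature presence probabilities with Laplace smoothing."""
    vocab: Set[str] = set().union(*features.values())
    actives = [c for c in features if labels[c]]
    inactives = [c for c in features if not labels[c]]

    def presence(group: List[str]) -> Dict[str, float]:
        n = len(group)
        return {f: (sum(f in features[c] for c in group) + alpha) / (n + 2 * alpha)
                for f in vocab}

    prior = math.log((len(actives) + alpha) / (len(inactives) + alpha))
    return vocab, presence(actives), presence(inactives), prior


def log_odds(model: Model, feats: Set[str]) -> float:
    """Log posterior odds of 'active' versus 'inactive' for one compound."""
    vocab, p_act, p_inact, total = model
    for f in vocab:
        if f in feats:
            total += math.log(p_act[f] / p_inact[f])
        else:
            total += math.log((1 - p_act[f]) / (1 - p_inact[f]))
    return total


def rerank(scores: Dict[str, float], features: Dict[str, Set[str]],
           n_sd: float = 3.0) -> List[str]:
    """Label by docking score, train the classifier, and rerank all compounds."""
    model = train_bernoulli_nb(features, label_by_score(scores, n_sd))
    return sorted(features, key=lambda c: log_odds(model, features[c]), reverse=True)
```

With toy datasets of only a handful of compounds, no score can lie three standard deviations from the mean, so a smaller `n_sd` is needed to try the sketch; on a full library, the default cutoff picks out only the extreme top-scoring tail.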



FIGURE 7 | Example of clustering a virtual screening dataset in terms of the volume overlap of the docked poses. The figure shows three representative clusters, which contain ligands of diverse structure but with a similar 3D shape in the binding site. The example is based on a virtual screen to identify hinge binders to a protein kinase (hinge region shown in brown).

Other Approaches

There are numerous other approaches that can aid in navigating the large sets of docked poses that typically result from a virtual screening campaign. It is common for some consideration of ligand efficiency144 to be included, because some scoring functions have a molecular weight bias, which can cause smaller, more efficient ligands to be overlooked.145
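Ligand efficiency is commonly defined as the binding free energy per heavy (non-hydrogen) atom. The sketch below computes it from a Ki value, taking ΔG = RT ln Ki, and also shows a simple size-normalized docking score. Dividing a docking score by the heavy-atom count is an assumed heuristic for illustration only, and the compound data are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ligand_efficiency(ki_molar: float, heavy_atoms: int, temp_k: float = 298.15) -> float:
    """LE in kcal/mol per heavy atom: LE = -dG / N_heavy, with dG = RT ln(Ki)."""
    delta_g = R_KCAL * temp_k * math.log(ki_molar)  # negative for Ki < 1 M
    return -delta_g / heavy_atoms


def score_per_heavy_atom(docking_score: float, heavy_atoms: int) -> float:
    """Size-normalized docking score (hypothetical heuristic; lower = better)."""
    return docking_score / heavy_atoms


# Hypothetical docked compounds: (id, docking score, heavy-atom count).
docked = [("large", -10.0, 40), ("small", -8.0, 20)]
by_raw = sorted(docked, key=lambda c: c[1])
by_size = sorted(docked, key=lambda c: score_per_heavy_atom(c[1], c[2]))
print([c[0] for c in by_raw], [c[0] for c in by_size])
# ['large', 'small'] ['small', 'large']
```

A 1 μM ligand with 20 heavy atoms comes out at about 0.41 kcal/mol per heavy atom, and the toy ranking shows how normalizing by size can promote a smaller compound that the raw score places second.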

The computation of the overlap volume between the individual docked poses and a co-crystallized inhibitor can also be a useful way of reranking a set of docked compounds (Figure 7). More traditional 2D cheminformatics tools can also provide a rapid means of grouping the docked poses by structural families, which can be particularly useful if the database being docked is rich in analogue series that bind similarly. The combination of a variety of tools, including a well-designed graphical interface, is likely to be the key to successful rescoring and analysis of the large datasets generated in virtual screening. Integration of these often disparate methods is nowadays facilitated by workflow tools such as Pipeline Pilot or KNIME.16

VALIDATION OF METHODS

Perhaps no subject has sparked so much controversy in recent years as the issue of validation of docking methods, particularly when applied to virtual screening. At first sight, it might appear that the objective assessment of a docking program would be straightforward. The desired end points are clear: to reproduce experimentally determined binding modes and binding affinities, and to demonstrate success in a virtual screening scenario by retrieving known ligands in preference to decoy compounds. Yet the definition of appropriate datasets, metrics, and statistical analyses continues to provoke much debate in the literature.146–149 Fortunately, progress is now being made toward a set of recommended procedures for ensuring a high quality of experimental design and sufficient statistical rigor.

Objectives and Experimental Design

It is worthwhile to consider the purpose of validation studies. For the practicing modeler, a typical objective is to define the most suitable protocol for routine use in virtual screening campaigns or for a specific project. However, in real-life applications, success depends not only on the accuracy of the docking program but also on many other factors, including the composition and preparation of the ligand database, the nature of the receptor, postdocking analysis and rescoring protocols, and even access to appropriate hardware and software.149 In practice, available SAR data may be used to bias the docking and analysis stages heavily, whereas targets that are less well characterized will need to be addressed more objectively.

Docking algorithms and scoring functions differ enormously in terms of their methodologies and parameterization. Therefore, a more general objective of validation is to quantify the merits and failings of particular algorithms and thereby propose directions for further optimization. This topic seems somewhat neglected in the literature: large-scale validation studies rarely address the issue of why a particular program is underperforming. In particular, specific problems at a microscopic scale (e.g., in the scoring function) are too easily ignored when the focus is on a more macroscopic end point (e.g., enrichment).



Validation studies are generally designed to evaluate one or more of the following objectives:

• Docking accuracy: comparison of docked and crystallographically determined poses for a set of heterogeneous receptor–ligand complexes (typically one ligand per receptor, or occasionally cross-docking of ligands into multiple receptor conformations).

• Scoring accuracy: comparison of predicted and experimentally determined binding affinities, either for a heterogeneous set of complexes or for multiple ligands against a single receptor.

• Virtual screening enrichment: the ability to retrieve known inhibitors objectively from a database of random (decoy) compounds, typically achieved by using a scoring function to rank all docked compounds.

Of the many validation studies published to date, the paper by Warren et al.35 is a notable example of good experimental design. Ten docking codes and 37 scoring functions were evaluated across eight protein targets, using ligand datasets representative of the GSK corporate database. In particular, ligand datasets were designed to have statistically meaningful numbers of actives and inactives, with an appropriate structural diversity and spread of binding affinities. Docking programs were compared on the basis of accuracy of binding mode prediction and ability to retrieve actives in a virtual screening scenario. Also of note is that the docking experiments were performed by computational chemists experienced with the particular software and protein target, in order to reflect more accurately the real-life application of the docking methods and the appropriate preparation of the protein model. The study notably highlighted the wide variation in docking success between individual programs, and indeed of any single program applied across a range of targets, as well as the universally poor performance in the prediction of binding affinity.

Potential Issues and Recommendations for Good Practice

Recently, recommendations for ensuring the robustness of validation studies have been discussed in depth in the literature.146,147 The following sections highlight topics of particular concern and some new developments.

Standardization, Reproducibility, and Publication of Experimental Details

A frequent criticism of published studies has been inadequate reporting of experimental details and data, so that accurate reproduction of the findings is usually very difficult, if not impossible. The problem is not only due to the size and complexity of the datasets, or even to an understandable reluctance on the part of researchers to disclose datasets that have taken much effort to compile. Docking results can be very sensitive to a wide range of factors, such as the version number of the software, the settings of numerous docking parameters,150 and the modeling procedures used for preparing the receptor model and ligand dataset.70,149,151

Note also that it is not necessarily a question of a particular modeling procedure being ‘wrong’: there are many examples where modelers have to choose between equally valid alternatives (e.g., different protocols for processing a PDB structure or defining the extent of the binding site). Therefore, there is an understandable difficulty in describing every aspect of a validation study in sufficient detail. However, many common issues can be fairly readily resolved; e.g., 3D models of ligand datasets and receptors should be made available, rather than publishing 2D structures for ligands and PDB codes for receptors.

Progress has been made toward making benchmark datasets publicly available. Notable examples are the Astex Diverse Set of high-quality receptor–ligand crystal structures for evaluating docking accuracy152 and the Astex Non-native Set for evaluating cross-docking performance.153 The Directory of Useful Decoys (DUD) virtual screening dataset contains substantial numbers of actives and decoys, well matched with regard to physicochemical properties, for 40 protein targets.154 However, there remains the challenge of ensuring that such datasets are kept up to date and indeed extended.155 An interesting development is the use of blind test sets, whereby researchers have the opportunity to predict a set of experimental data before the results are released into the public domain. This approach is well known in several areas of molecular modeling, such as protein homology modeling, and has recently been applied to protein–ligand docking.156

Achieving a fair comparison of different docking protocols is a more difficult issue. There are arguments for running software either with default settings (‘out-of-the-box’) or as driven by an expert user. As a general rule, it would be preferable to explore a range of docking options for each program, in order to be confident that the results are truly representative of the performance of the docking algorithm or scoring function, rather than an artefact of a particular parameter setting.150

It is worth emphasizing that reproducibility is a matter not only of achieving exactly the same results under exactly the same starting conditions but also of achieving similar results under similar starting conditions. Therefore, if docking methods are indeed highly sensitive to subtle features of receptor representation or choice of docking parameters, then we should encourage a greater degree of replication of individual studies, as is done in other experimental sciences, in order to determine the statistical distribution of the results. For example, this may be achieved by changing the random number seed in a stochastic algorithm or by redocking with multiple receptor conformations.

Sources of Error and Sources of Bias

Careful scrutiny of the assumptions and data used within a virtual screening workflow should highlight the more common sources of error.148 These include:

• experimental data of low quality, e.g., poor crystallographic resolution or inconsistency of binding affinities derived from multiple sources or multiple assay formats;

• inconsistencies or erroneous assumptions in the preparation of receptor models and ligand datasets, e.g., incorrect treatment of protonation and tautomerization states.

In addition, docking studies are susceptible to many subtle sources of bias, which may favor the selection of actives over decoys, such as:

• poor experimental design of the training/test dataset, e.g., a significant difference in physicochemical property profiles or chemotypes between actives and decoys;

• receptor or ligand preparation protocols that may unfairly bias the docking results, e.g., assignment of receptor protonation states on the basis of known ligand binding interactions;

• the degree to which the application of a particular docking program is fine-tuned in a way that may favor particular chemotypes.

Recommendations for the design of ligand datasets have been presented by several authors.146,147 The guiding principle is to ensure that decoy compounds are chemically relevant (i.e., drug-like), diverse, and similar in properties to the actives, with sufficiently large numbers of actives and decoys to support statistical validation. There is also the issue of how far a validation study should reflect realistic project applications; e.g., a training set containing optimized, high-affinity actives is not representative of most virtual screening scenarios, in which the modeler is most likely to identify low molecular weight, weakly active leads.

Appropriate Metrics and Statistical Validation

Over recent years, the practicing modeler has been faced with a bewildering array of validation studies discussing the successes and failures of different docking packages applied to different datasets. It has generally proven very difficult to assess the reliability of the results and to make meaningful comparisons between publications. This is due not only to a lack of standardization in experimental design but also to disagreement on the most suitable metrics to measure success, and to a widespread omission of any quantitative reporting of statistical significance.

Many traditional metrics have been criticized as poorly behaved or insufficiently robust. For example, RMSD appears at first sight to be an obvious descriptor for the accuracy of a docked versus an experimental pose, but it is not necessarily informative when only part of the ligand is of interest, or when comparing ligands that are misdocked or that differ widely in size.146,157 Likewise, the commonly used threshold of 2 Å RMSD for a ‘correct’ docking solution is somewhat arbitrary. Other descriptors of docking accuracy have been proposed but have not achieved widespread use; e.g., scoring pose accuracy in terms of the reproduction of key binding contacts is very relevant but more difficult to automate.149
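For reference, the plain RMSD between a docked pose and the crystallographic pose can be computed as below. This minimal sketch assumes both poses are already in the receptor frame (so no superposition is applied), that atoms are listed in the same order, and that symmetry-equivalent atoms can be ignored; production tools usually handle symmetry. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """In-place RMSD: same atom order, same frame, no symmetry correction."""
    if not pose or len(pose) != len(reference):
        raise ValueError("poses must be non-empty with matching atom counts")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(sq / len(pose))


def is_correct(pose: Sequence[Coord], reference: Sequence[Coord],
               threshold: float = 2.0) -> bool:
    """Apply the conventional (and somewhat arbitrary) 2 Å success cutoff."""
    return rmsd(pose, reference) <= threshold
```

The sketch also illustrates one weakness noted above: for a three-atom ligand, displacing a single atom by 3 Å gives an RMSD of about 1.73 Å, which still passes the 2 Å cutoff, while a uniform 1.5 Å shift of the whole ligand gives 1.5 Å.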

Metrics for defining virtual screening enrichment have also provoked much controversy. The classic definition of enrichment is the fraction of actives in a selection from the docked dataset divided by the fraction of actives in the complete dataset. The maximum enrichment clearly depends on the numbers of actives and decoys and is, therefore, not a useful descriptor for comparing docking results across datasets of different sizes. In addition, the selection of the top-ranked 1% or 10% of the dataset is somewhat arbitrary. (Few studies report selections at much smaller percentages, e.g., 0.1% or less, which are more realistic criteria when screening libraries of 10⁵–10⁶ compounds.)
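The classic enrichment factor and its ceiling can be written as a short sketch; the function names are hypothetical. The ceiling shows why raw enrichment values are hard to compare across datasets: with 1000 compounds and a 1% selection, the maximum possible enrichment is 100 when the dataset contains 10 actives but only 20 when it contains 50.

```python
from typing import Sequence


def _n_selected(n_total: int, fraction: float) -> int:
    """Number of compounds in the top `fraction` of the list (at least one)."""
    return max(1, round(fraction * n_total))


def enrichment_factor(is_active_ranked: Sequence[bool], fraction: float) -> float:
    """EF = (actives in selection / selection size) / (actives / total).

    `is_active_ranked` holds the true labels in docking-score order, best first.
    """
    n_total = len(is_active_ranked)
    n_actives = sum(is_active_ranked)
    if n_actives == 0:
        raise ValueError("dataset contains no actives")
    n_sel = _n_selected(n_total, fraction)
    hits = sum(is_active_ranked[:n_sel])
    return (hits / n_sel) / (n_actives / n_total)


def max_enrichment_factor(n_total: int, n_actives: int, fraction: float) -> float:
    """Best achievable EF, reached when every selected slot holds an active."""
    n_sel = _n_selected(n_total, fraction)
    return (min(n_sel, n_actives) / n_sel) / (n_actives / n_total)
```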

Therefore, the ideal measure of enrichment is one that can be used to compare differently sized datasets and that describes the discrimination of actives/decoys across the whole of the dataset rather than at one or more arbitrary thresholds.149 In particular, it is important to describe the ability of different docking methods to achieve early enrichment. Other features that may be desirable are the ability to describe the ranking of actives (within a selection) and the structural diversity of the hit lists.

FIGURE 8 | Use of ROC curves as a metric of virtual screening performance. A high enrichment rate is exemplified by the blue curve (rapid retrieval of actives; all actives retrieved relatively quickly), a poorer enrichment by the pink curve (slow initial retrieval; late retrieval of all actives), and a random retrieval rate by the green line. The ROC curve is defined as a plot of sensitivity versus 1 − specificity across the entire ranked dataset, where sensitivity is the fraction of actives retrieved and specificity the fraction of inactives discarded. Enrichment can be quantified by the area under the curve (AUC = 1 for perfect retrieval of actives; AUC = 0.5 for a random retrieval rate).

Plotting of receiver operating characteristic (ROC) curves has become a favored method for reporting VS performance. ROC curves depict the true positive rate (sensitivity) as a function of the false positive rate (1 − specificity) across the whole of the dataset158 (Figure 8). The area under the curve (AUC) is commonly used as a simple quantitative parameter to describe the overall discrimination of actives from decoys and has the advantage that it is not influenced by the ratio of actives to decoys in the dataset. However, the AUC does not describe the shape of the ROC curve, and hence it provides no information about the ability of a VS method to retrieve actives earlier or later in the ranking. Other parameters have been proposed to yield a more informative descriptor of early enrichment, e.g., BEDROC and RIE.149,151,159
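The AUC can be computed without drawing the curve: it equals the probability that a randomly chosen active is scored above a randomly chosen decoy (the Mann–Whitney formulation), with ties counted as one half. The sketch below assumes higher scores are better (docking energies would be negated first) and uses a simple pairwise loop for clarity rather than speed. `roc_points` traces the curve itself from a ranked list, assuming no tied scores. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """AUC as P(active scored above decoy); ties count 0.5. Higher = better."""
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def roc_points(is_active_ranked: Sequence[bool]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) after each compound in the ranking.

    The false positive rate equals 1 - specificity; tied scores are not handled.
    """
    n_act = sum(is_active_ranked)
    n_dec = len(is_active_ranked) - n_act
    if n_act == 0 or n_dec == 0:
        raise ValueError("need at least one active and one decoy")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for active in is_active_ranked:
        if active:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_dec, tp / n_act))
    return points
```

Two rankings with the same AUC can trace very different curves, which is exactly the early-enrichment information that a single AUC value discards.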

Although validation studies routinely quote one or more of these enrichment parameters, it is still not common practice to discuss the statistical significance of the results, even though this is surely an essential component of any experimental method. Error bars should be presented for replicate runs, alongside routine tests of statistical significance.146 It is also informative to compare against decoy scoring functions, e.g., molecular weight or a simple two-dimensional similarity metric.
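One simple way to attach error bars to an AUC is a percentile bootstrap that resamples actives and decoys separately. The sketch below is one possible approach among several (analytical standard errors are another), and all names and defaults are assumptions. The same function can score a trivial baseline, such as ranking by molecular weight, for the comparison suggested above.

```python
import random
from typing import List, Sequence, Tuple


def auc(actives: Sequence[float], decoys: Sequence[float]) -> float:
    """Mann-Whitney estimate of ROC AUC (higher score = better; ties count 0.5)."""
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


def bootstrap_auc_ci(actives: Sequence[float], decoys: Sequence[float],
                     n_boot: int = 1000, level: float = 0.95,
                     seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate plus a percentile bootstrap confidence interval for the AUC.

    Actives and decoys are resampled separately (stratified bootstrap), and a
    fixed seed makes the interval reproducible.
    """
    rng = random.Random(seed)
    samples: List[float] = []
    for _ in range(n_boot):
        a = rng.choices(actives, k=len(actives))
        d = rng.choices(decoys, k=len(decoys))
        samples.append(auc(a, d))
    samples.sort()
    lo = samples[int(round((1 - level) / 2 * (n_boot - 1)))]
    hi = samples[int(round((1 + level) / 2 * (n_boot - 1)))]
    return auc(actives, decoys), lo, hi
```

Reporting the interval alongside the point estimate, for both the docking score and a baseline such as molecular weight, makes it clear whether an apparent improvement exceeds the resampling noise.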

Future Directions for Validation Studies

Validation has proven to be a challenging area in docking and virtual screening; the above summary has highlighted a number of points that have been discussed at length in the literature. Perhaps the key message is that molecular modeling should not consider itself different from any other scientific discipline: without careful experimental design and robust statistical validation, it is simply not possible to draw meaningful conclusions as to whether a particular program is performing better or worse than any other.151 As discussed recently by Kolb and Irwin,41 it is also essential to check that the right results are obtained for the right reasons. Too often, virtual screening studies focus on the headline enrichment figures without checking the plausibility of the predicted binding modes, and crystallographic studies are rarely performed to confirm the binding mode of novel hits.

It is perhaps also worth stressing that another objective should be to address the reasons why particular docking algorithms or scoring functions perform better than others, so that future advances in these technologies can be pursued in a rational manner. Finally, it would be good to see more studies evaluating the use of docking and scoring methods in lead optimization scenarios, where there is often a more stringent need for accurate prediction of binding mode and binding affinity than during virtual screening for lead identification.

VIRTUAL SCREENING SUCCESS STORIES

In what follows, some examples of virtual screening have been selected to illustrate its potential to have an impact on drug discovery projects. These have been chosen as typical case studies reflecting current practice in the pharmaceutical industry and academia. Additional examples can be found in some recent reviews.7,160

Pim-1 Kinase

The group at Vertex has reported a case of virtual screening being applied in a project targeting Pim-1 kinase after a high-throughput screening campaign (hit rate 0.3%) failed to yield hits worthy of progression,161 an example of what has been termed ‘HTS rescue’. Thus, structure-based virtual screening was used in the hope of identifying alternative starting points for chemistry optimization.

[Chemical structures of the four hits not reproduced; potencies shown in the figure: 91 nM, 550 nM, 3.4 μM, and 4.5 μM]

FIGURE 9 | The four Pim-1 kinase hits identified by virtual screening at Vertex.161

The Glide program was used to dock approximately 700,000 compounds into a Pim-1 kinase structure. Candidate ligands were constrained to form an interaction with the kinase ‘hinge’ (including the possibility of aromatic C–H···O hydrogen bonds) and also with a key lysine residue. A total of 1200 poses were selected for visual inspection, these being the 200 top-scoring poses from each of six molecular weight ranges. As a result, 96 compounds were tested in the primary assay and four showed Ki values <10 μM, giving a hit rate of 4.2% (Figure 9). Moreover, each of these four compounds was from a different structural series. Subsequently, the predicted binding modes of two of the compounds were confirmed by X-ray crystallography, including the aromatic C–H···O interactions.
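Selecting a fixed number of top-scoring poses from each molecular weight range amounts to a stratified top-k selection. In the sketch below, the bin edges, data layout, and function names are hypothetical, since the actual ranges used are not given here. One effect of stratifying in this way is that smaller compounds are not crowded out by any size bias in the scoring function. The hit-rate helper reproduces the arithmetic above (4 of 96 is about 4.2%).

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Compound = Tuple[str, float, float]  # (id, molecular weight in Da, score; lower = better)

# Hypothetical interior edges giving six ranges: <250, 250-300, ..., >=450 Da.
EDGES = [250.0, 300.0, 350.0, 400.0, 450.0]


def select_per_mw_bin(compounds: Sequence[Compound], edges: Sequence[float],
                      per_bin: int) -> List[str]:
    """Take the per_bin best-scoring compounds from each molecular weight bin.

    `edges` are sorted interior boundaries, so len(edges) + 1 bins result.
    """
    bins: Dict[int, List[Tuple[float, str]]] = defaultdict(list)
    for cid, mw, score in compounds:
        bins[bisect_right(edges, mw)].append((score, cid))
    selected: List[str] = []
    for b in sorted(bins):
        selected.extend(cid for _, cid in sorted(bins[b])[:per_bin])
    return selected


def hit_rate_percent(n_hits: int, n_tested: int) -> float:
    """Percentage of tested compounds that were confirmed as hits."""
    return 100.0 * n_hits / n_tested
```

With six bins and `per_bin=200`, this selection yields at most 1200 poses, matching the count quoted in the text.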

Aldose Reductase

Klebe and coworkers162 have described a virtual screening campaign seeking aldose reductase (AR) inhibitors, starting from a high-resolution (0.66 Å) complex of the enzyme with the inhibitor IDD594 (Figure 10). A series of filters was used to select a subset of compounds from the Available Chemicals Directory (ACD) for docking with FlexX. In the first step, simple molecular property filters were applied to eliminate compounds with molecular weights >350 Da and more than eight rotatable bonds. Additionally, compounds were required to possess a carboxylic acid (or an isostere thereof) in order to facilitate interactions in the anion-binding pocket. Together, these filtering steps reduced the 259,747 compounds in the ACD to 12,545. This set was pruned further by requiring compounds to match a 3D pharmacophore query based upon ‘hotspots’ in the AR active site and the binding mode of IDD594. After this step, 1261 compounds remained for docking with FlexX. The top-scoring pose of each compound was checked to ensure that it complied with the requirements of the pharmacophore used in the previous step. A total of 206 compounds met this criterion and, after visual inspection aided by clustering, a set of nine compounds was purchased for testing. Strikingly, when screened, five of the nine compounds showed IC50 values of less than 20 μM, the best being 2.4 μM (Figure 10).

[Chemical structures not reproduced; compounds shown: IDD594, 1 (IC50 = 2.4 μM), and 2 (IC50 = 4.1 μM)]

FIGURE 10 | Chemical structures of IDD594 and the two most potent hits identified by virtual screening using the X-ray structure of aldose reductase.162
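The property, substructure, and pharmacophore stages of such a cascade can be expressed as an ordered list of predicates, recording how many compounds survive each step. In this sketch, the molecular properties and the acid/isostere and pharmacophore flags are assumed to be precomputed (for instance by a cheminformatics toolkit, not shown), the class, field, and filter names are hypothetical, and each property criterion is applied as a separate filter.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    cid: str
    mol_weight: float            # Da, assumed precomputed
    rotatable_bonds: int         # assumed precomputed
    has_acid_or_isostere: bool   # hypothetical precomputed substructure flag
    matches_pharmacophore: bool  # hypothetical precomputed 3D query result


Filter = Tuple[str, Callable[[Candidate], bool]]

FILTERS: List[Filter] = [
    ("MW <= 350 Da", lambda c: c.mol_weight <= 350.0),
    ("<= 8 rotatable bonds", lambda c: c.rotatable_bonds <= 8),
    ("carboxylic acid or isostere", lambda c: c.has_acid_or_isostere),
    ("3D pharmacophore match", lambda c: c.matches_pharmacophore),
]


def run_cascade(candidates: List[Candidate], filters: List[Filter] = FILTERS
                ) -> Tuple[List[Candidate], List[Tuple[str, int]]]:
    """Apply filters in order; return survivors and the count after each stage."""
    survivors = list(candidates)
    counts = [("input", len(survivors))]
    for name, keep in filters:
        survivors = [c for c in survivors if keep(c)]
        counts.append((name, len(survivors)))
    return survivors, counts
```

Placing the cheapest checks first means the costly 3D steps (pharmacophore matching, then docking) only see the small fraction of the library that remains, as in the reduction from 259,747 to 1261 compounds described above.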

More recently, the structure-based optimization of these hits has been reported.163 The X-ray structures of the complexes of compounds 1 and 2 with AR were solved, and it was noted that, in both cases, there were polar atoms in the ligand that did not have a counterpart in the active site. It was therefore decided to replace the oxadiazole moiety with a furan or thiophene, leading ultimately to the compounds in Figure 11.

The X-ray structure of the complex of 4 with AR was solved (Figure 12) and has been deposited in the PDB (entry: 3DN5).

[Chemical structures not reproduced; compounds shown: 3 (IC50 = 260 nM) and 4 (IC50 = 170 nM)]

FIGURE 11 | Chemical structures of compounds derived from virtual screening hits.163

FIGURE 12 | X-ray structure of compound 4 bound to AR,163 showing the hydrogen bonds between the ligand and the protein as yellow dashed lines. Note also the face-to-face stacking interaction between the nitro-bearing phenyl ring and the tryptophan beneath it.

HIV-1 Reverse Transcriptase Inhibitors

Non-nucleoside HIV-1 reverse transcriptase (HIV-1 RT) inhibitors are currently used to treat AIDS. Unfortunately, their efficacy is limited by the high mutation rate of the viral protein: in particular, the Tyr181Cys mutation confers resistance to many of the HIV-1 RT inhibitors on the market. Therefore, the search for new HIV-1 RT inhibitors with a better resistance profile is still being pursued. Nichols et al.164 reported the discovery of three new active series using a virtual screening approach. With the resistance issue in mind, they selected three crystal structures of HIV-1 RT:

• PDB code 1RT4 (wild type).

• PDB code 2BE2 (wild type with different ori-entation of Tyr181).

• PDB code 1JLA (Tyr181 mutation).

They applied two hierarchical VS approaches. In the parallel mode, they docked the ZINC commercial library63 (approximately two million structures) into the three protein structures using Glide SP and selected the top-scoring 5000 from each run for Glide XP refinement. In the serial approach, the XP refinement was applied to the 4684 structures common to the top 100,000 in all three runs. A total of 1500 structures were visually inspected and nine were selected for purchase and screening. Remarkably, three of them showed activity (Figure 13).

[Chemical structures not reproduced; reported activities of the three hits, in figure order: EC50 (WT) 6.2 μM and EC50 (Y181C) 12.0 μM; EC50 (WT) NA and EC50 (Y181C) 7.5 μM; EC50 (WT) 4.8 μM and EC50 (Y181C) NA]

FIGURE 13 | Structure and anti-HIV-1 activity data for the three hits found by virtual screening against HIV-1 reverse transcriptase.164
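The two selection modes can be sketched as per-structure top-k lists (parallel) and the intersection of those lists (serial). This minimal sketch assumes each docking run is a dict mapping compound ID to score, with lower scores better; the function names are hypothetical. In the study, k would be 5000 for the parallel mode and 100,000 for the serial mode.

```python
from typing import Dict, List, Set

Scores = Dict[str, float]  # compound ID -> docking score (lower = better)


def top_k(scores: Scores, k: int) -> List[str]:
    """IDs of the k best-scoring compounds in one docking run."""
    return sorted(scores, key=scores.__getitem__)[:k]


def parallel_mode(runs: List[Scores], k: int) -> List[List[str]]:
    """Keep a separate top-k list for each receptor structure."""
    return [top_k(scores, k) for scores in runs]


def serial_mode(runs: List[Scores], k: int) -> Set[str]:
    """Keep only compounds ranked in the top k of every run."""
    common = set(top_k(runs[0], k))
    for scores in runs[1:]:
        common &= set(top_k(scores, k))
    return common
```

The serial mode favors compounds that score well against every receptor conformation, which is the property sought when the structures represent wild-type and resistant forms of the target.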

The three hits provide suitable starting points for lead optimization, and the authors state that progress has been made on two of them. This application highlights some of the benefits of docking against multiple crystal structures. The authors were successful in their search for activity against mutant proteins, but a similar strategy could also be applied to achieve selectivity.

β2-Adrenergic Receptor Ligands

Historically, and currently, GPCRs represent one of the most important classes of drug target. It has thus been a frustration for drug designers that, until very recently, no X-ray structures of this important family had been solved, owing to the difficulties in crystallizing these membrane-bound proteins. The publication of the bovine rhodopsin X-ray structure in 2000 provided some basis for structure-based design and discovery of GPCR ligands through the construction of homology models, but it was not until 2007 that the first X-ray structure of a therapeutically relevant, ligand-mediated GPCR, the human β2-adrenergic receptor, was reported165 (Figure 14). This was followed more recently by structures of the β1-adrenergic receptor (turkey) and the adenosine A2A receptor (human).

With the structure of the β2-adrenergic receptor in hand, two groups have described their attempts to exploit the new information it provides for structure-based virtual screening. Topiol and Sabio166,167 used the β2-adrenergic receptor:carazolol complex together with the Glide program to conduct a virtual screen of 400,000 proprietary and four million commercially available compounds. The top-scoring 150 compounds from each database were selected for biological testing in a radioligand binding assay, along with a diverse set of 320 compounds chosen from the proprietary collection. Table 1 summarizes the results, including the numbers of compounds actually available for screening. Both sets of compounds selected by virtual screening show very high hit-rates compared with the diversity selection (0.3%), particularly those from the proprietary collection (35.6%).

FIGURE 14 | X-ray structure of the β2-adrenergic receptor (PDB code: 2RH1), with the ligand carazolol highlighted in spheres.

TABLE 1 Screening Results Reported by Topiol et al.166,167

Database | Selection Method | Number of Compounds Available for Screening | Number of Compounds Yielding Measurable Binding Affinities | Hit Rate (%) | Range of Ki Values
Proprietary | SBVS | 56 | 20 | 35.6 | 0.11 nM–21 μM
Commercial | SBVS | 94 | 11 | 11.7 | 13.7 nM–4.3 μM
Proprietary | Diversity | 320 | 1 | 0.3 | 257 nM

SBVS, structure-based virtual screening.

A similarly large-scale virtual screen was undertaken by Kolb et al.,168 who docked 972,608 compounds from the ZINC database63 using the DOCK program. A total of 25 compounds from among the top-scoring 500 were selected for biological testing. Of these 25, Ki values were obtained for six (Figure 15), all of which were less than 4 μM, with the most potent showing a Ki value of 9 nM. This corresponds to an overall hit-rate of 24%, which is roughly comparable to the average of the two values reported by Sabio and coworkers.167 However, it is worth noting that three of the compounds are closely related, and so the number of different chemical series retrieved (which is the measure of most interest to medicinal chemists) is four, not six.
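The hit-rates quoted here are simply the number of confirmed binders divided by the number of compounds tested, expressed as a percentage. The short Python check below uses only the counts given in Table 1 and in the text. It is a sketch for verifying the arithmetic, not part of either study.

```python
def hit_rate(hits: int, tested: int) -> float:
    """Percentage of tested compounds that showed measurable binding."""
    return 100.0 * hits / tested


# (hits, compounds tested) as reported in Table 1 and by Kolb et al.
proprietary_sbvs = hit_rate(20, 56)
commercial_sbvs = hit_rate(11, 94)
diversity = hit_rate(1, 320)
kolb = hit_rate(6, 25)

# Mean of the two SBVS hit-rates as printed in Table 1.
mean_reported = (35.6 + 11.7) / 2

print(f"{proprietary_sbvs:.1f} {commercial_sbvs:.1f} {diversity:.1f} {kolb:.1f}")
# 35.7 11.7 0.3 24.0
```

Recomputed, the proprietary SBVS entry comes to 35.7%, a rounding-level difference from the 35.6% printed in Table 1. The mean of the two printed SBVS rates is 23.65%, close to the 24% of Kolb et al.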

Taken together, these studies provide significant encouragement for undertaking virtual screens using X-ray structures of GPCRs, which will hopefully become more commonplace as further therapeutically relevant GPCR–ligand complexes are solved.

CONCLUSION

The application of protein–ligand docking is now ubiquitous in structure-based drug design, whether in virtual screening for hit-finding or in more detailed docking of synthetic candidates in lead optimization. Docking programs are also increasingly being used by chemists and biologists with no formal background in modeling, e.g., through access to web servers. Although the plethora of docking methods and scoring functions can be confusing to both the novice and the expert modeler, there is a core of docking programs that have been very widely used for many years and have demonstrated their strengths (and weaknesses) in a large number of validation studies and prospective virtual screening applications.

FIGURE 15 | The six ligands with reported Ki values <4 μM against the β2-adrenergic receptor.168

There have been significant advances in the last quarter of a century, such that the docking of very large datasets, e.g., of the order of a million compounds, is now a routine task. This is due not only to the availability of fast docking programs and high-performance workstations but also to the design of software platforms that automate every stage of the virtual screening process, from database preparation through to complex docking and postprocessing procedures. Although these tasks have traditionally been provided within a single software platform, the increasing popularity of data pipelining packages gives the modeler the freedom to integrate the most appropriate commercial and in-house software tools. Over the coming years, this may support a shift from the uncritical application of 'black-box' packages to the development of more creative and finely tuned workflows.
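The data-pipelining idea amounts to composing independent stages, each of which consumes and emits a stream of compound records. The minimal Python sketch below illustrates that idea and is not modeled on any particular commercial package. The stage names, the `heavy_atoms` size filter and the `mock_score` field are hypothetical. In practice, the docking stage would wrap a call to an external docking program rather than read a precomputed value.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def run_pipeline(records: Iterable[Record], stages: List[Stage]) -> List[Record]:
    """Chain the stages lazily so that records stream through the whole workflow."""
    stream: Iterable[Record] = records
    for stage in stages:
        stream = stage(stream)
    return list(stream)


def size_filter(max_heavy_atoms: int) -> Stage:
    """Database-preparation step: discard compounds above a size cutoff."""
    def stage(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            if r["heavy_atoms"] <= max_heavy_atoms:
                yield r
    return stage


def dock_with(score_fn: Callable[[Record], float]) -> Stage:
    """Docking step: score_fn is a placeholder for a call to any docking program."""
    def stage(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            yield {**r, "score": score_fn(r)}
    return stage


def keep_best(n: int) -> Stage:
    """Post-processing step: keep the n best (lowest) scores."""
    def stage(records: Iterable[Record]) -> Iterator[Record]:
        yield from sorted(records, key=lambda r: r["score"])[:n]
    return stage


compounds = [
    {"id": "c1", "heavy_atoms": 22, "mock_score": -8.1},
    {"id": "c2", "heavy_atoms": 41, "mock_score": -9.9},
    {"id": "c3", "heavy_atoms": 30, "mock_score": -6.4},
    {"id": "c4", "heavy_atoms": 18, "mock_score": -7.7},
]
workflow = [size_filter(35), dock_with(lambda r: r["mock_score"]), keep_best(2)]
print([r["id"] for r in run_pipeline(compounds, workflow)])  # ['c1', 'c4']
```

Because each stage only consumes and yields records, one stage can be replaced (for example, by a different docking program or an extra rescoring step) without touching the others. This is the kind of flexibility the text attributes to pipelining tools.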

However, despite these advances, it is also clear that much remains to be done, particularly in the area of scoring functions and prediction of binding affinity. The increasing use of physics-based scoring procedures may represent the most promising route forward, perhaps coupled with regression or machine learning techniques that exploit structure–activity data for a particular receptor family or ligand chemotype. Although it may be argued that current scoring methods are actually good enough for virtual screening (i.e., to simply retrieve a few plausible hits), their performance is typically far from acceptable for estimating absolute binding affinities, for ranking analogues in lead optimization or library design, or for addressing issues of off-target selectivity.
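The review does not prescribe a particular regression model. As one minimal, hypothetical illustration of exploiting structure–activity data for a single target or chemotype, the sketch below fits an ordinary least-squares line that maps raw docking scores to measured pKi values. The data are invented. A real application would need many more compounds, probably more descriptors than a single score, and validation on held-out data.

```python
from typing import List, Tuple


def fit_linear_calibration(scores: List[float], pki: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of pKi = slope * score + intercept."""
    n = len(scores)
    if n != len(pki) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(scores) / n
    mean_y = sum(pki) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("scores must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, pki))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_pki(score: float, slope: float, intercept: float) -> float:
    """Apply the fitted calibration to a new docking score."""
    return slope * score + intercept


# Invented training data: three analogues of one hypothetical chemotype.
slope, intercept = fit_linear_calibration([-6.0, -8.0, -10.0], [5.0, 6.0, 7.0])
print(slope, intercept)                     # -0.5 2.0
print(predict_pki(-9.0, slope, intercept))  # 6.5
```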

The second most significant area to be addressed is the routine incorporation of receptor flexibility. Although much insight has been gained by docking into rigid receptor structures, even a cursory inspection of crystallographic complexes makes it clear that the molecular recognition of a ligand by a protein is an extremely subtle and intricate event. Methods such as ensemble docking and flexible receptor docking represent a major step toward a more realistic paradigm. However, they raise the question of how accurately we can predict conformational flexibility within the receptor and its contribution to the ligand binding affinity.
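Ensemble docking docks each ligand into several receptor conformations and then needs a rule for combining the results. One simple rule is to keep each ligand's best score over the ensemble. The sketch below shows that rule, assuming precomputed scores where lower is better; the conformation and ligand names are hypothetical. Other combination rules are possible. Taking the best score also gives every additional conformation another chance to score a decoy well, which is one reason why choosing the ensemble is not straightforward.

```python
from typing import Dict, Tuple

# receptor conformation -> {ligand ID -> docking score (lower is better)}
EnsembleScores = Dict[str, Dict[str, float]]


def best_over_ensemble(scores: EnsembleScores) -> Dict[str, Tuple[float, str]]:
    """For each ligand, return its best score and the conformation that produced it."""
    best: Dict[str, Tuple[float, str]] = {}
    for conformation, ligand_scores in scores.items():
        for ligand, score in ligand_scores.items():
            if ligand not in best or score < best[ligand][0]:
                best[ligand] = (score, conformation)
    return best


ensemble = {
    "apo": {"L1": -7.0, "L2": -5.5},
    "holo_A": {"L1": -6.0, "L2": -8.2},
    "holo_B": {"L1": -7.4, "L2": -6.1},
}
for ligand, (score, conf) in sorted(best_over_ensemble(ensemble).items()):
    print(ligand, score, conf)
# L1 -7.4 holo_B
# L2 -8.2 holo_A
```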

Future progress in these areas is likely to be hard-won and incremental rather than revolutionary. More than ever, a successful outcome relies on combining state-of-the-art methods with a modeler who has a rich knowledge and experience of drug design and an awareness of the limitations of the available computational tools. With increasing access to high-performance computing, including grid and cloud resources, this may be an appropriate time to devote more effort to improving the accuracy and reliability of protein–ligand docking.



REFERENCES

1. Andricopulo AD, Salum LB, Abraham DJ. Structure-based design strategies in medicinal chemistry. Curr Top Med Chem 2009, 9:771–790.

2. Paul N, Kellenberger E, Bret G, Muller P, Rognan D. Recovering the true targets of specific ligands by virtual screening of the protein data bank. Proteins 2004, 54:671–680.

3. Cavasotto CN, Orry AJW. Ligand docking and structure-based virtual screening in drug discovery. Curr Top Med Chem 2007, 7:1006–1014.

4. DesJarlais RL, Cummings MD, Gibbs AC. Virtual docking: how are we doing and how can we improve? Frontiers Drug Des Discov 2007, 3:81–103.

5. Moitessier N, Englebienne P, Lee D, Lawandi J, Corbeil CR. Towards the development of universal, fast and highly accurate docking/scoring methods: a long way to go. Br J Pharmacol 2008, 153:S7–S26.

6. Kontoyianni M, Madhav P, Suchanek E, Seibel W. Theoretical and practical considerations in virtual screening: a beaten field? Curr Med Chem 2008, 15:107–116.

7. Tuccinardi T. Docking-based virtual screening: recent developments. Comb Chem High Throughput Screen 2009, 12:303–314.

8. Coupez B, Lewis RA. Docking and scoring—theoretically easy, practically impossible? Curr Med Chem 2006, 13:2995–3003.

9. Seifert MHJ, Lang M. Essential factors for successful virtual screening. Mini Rev Med Chem 2008, 7:63–72.

10. Leach AR, Shoichet BK, Peishoff CE. Prediction of protein–ligand interactions. Docking and scoring: successes and gaps. J Med Chem 2006, 49:5851–5855.

11. Jain AN. Scoring functions for protein–ligand docking. Curr Protein Pept Sci 2006, 7:407–420.

12. Rajamani R, Good AC. Ranking poses in structure-based lead discovery and optimization: current trends in scoring function development. Curr Opin Drug Discov Dev 2007, 10:308–315.

13. Joseph-McCarthy D, Baber JC, Feyfant E, Thompson DC, Humblet C. Lead optimization via high-throughput molecular docking. Curr Opin Drug Discov Dev 2007, 10:264–274.

14. Hawkins PCD, Skillman AG, Nicholls A. Comparison of shape-matching and docking as virtual screening tools. J Med Chem 2007, 50:74–82.

15. McGaughey GB, Sheridan RP, Bayly CI, Culberson JC, Kreatsoulas C, et al. Comparison of topological, shape, and docking methods in virtual screening. J Chem Inf Model 2007, 47:1504–1519.

16. Waszkowycz B. Towards improving compound selection in structure-based virtual screening. Drug Discov Today 2008, 13:219–226.

17. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 2004, 47:1739–1749.

18. Ewing TJA, Makino S, Skillman AG, Kuntz ID. DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases. J Comput Aided Mol Des 2001, 15:411–428.

19. Moustakas DT, Lang PT, Pegg S, Pettersen E, Kuntz ID, et al. Development and validation of a modular, extensible docking program: DOCK 5. J Comput Aided Mol Des 2006, 20:601–619.

20. Rarey M, Kramer B, Lengauer T, Klebe G. A fast flexible docking method using an incremental construction algorithm. J Mol Biol 1996, 261:470–489.

21. Kramer B, Rarey M, Lengauer T. Evaluation of the FlexX incremental construction algorithm for protein–ligand docking. Proteins 1999, 37:228–241.

22. Jain AN. Surflex: fully automatic flexible molecular docking using a molecular similarity-based search engine. J Med Chem 2003, 46:499–511.

23. Jones G, Willett P, Glen RC. Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation. J Mol Biol 1995, 245:43–53.

24. Jones G, Willett P, Glen RC, Leach AR, Taylor R. Development and validation of a genetic algorithm for flexible docking. J Mol Biol 1997, 267:727–748.

25. Osterberg F, Morris GM, Sanner MF, Olson AJ, Goodsell DS. Automated docking to multiple target structures: incorporation of protein mobility and structural water heterogeneity in AutoDock. Proteins 2002, 46:34–40.

26. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comput Chem 2009, 30:2785–2791.

27. Abagyan R, Totrov M, Kuznetsov D. ICM—a new method for protein modeling and design: applications to docking and structure prediction from the distorted native conformation. J Comput Chem 1994, 15:488–506.

28. Venkatachalam CM, Jiang X, Oldfield T, Waldman M. LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites. J Mol Graph Model 2003, 21:289–307.

29. Perola E. Minimizing false positives in kinase virtual screens. Proteins 2006, 64:422–435.



30. FRED. Developed and marketed by OpenEye Scientific Software. Available at: http://www.eyesopen.com/products/applications/fred.html. (Accessed on February 7, 2011).

31. Miller MD, Kearsley SK, Underwood DJ, Sheridan RP. FLOG: a system to select quasi-flexible ligands complementary to a receptor of known three-dimensional structure. J Comput Aided Mol Des 1994, 8:153–174.

32. Good AC, Cheney DL. Analysis and optimization of structure-based virtual screening protocols (1): exploration of ligand conformational sampling techniques. J Mol Graph Model 2003, 22:23–30.

33. Unity. Developed and marketed by Tripos. Available at: http://tripos.com/index.php?family=modules,SimplePage,,,&page=UNITY. (Accessed on February 7, 2011).

34. Catalyst. Developed and marketed by Accelrys. Available at: http://accelrys.com/products/datasheets/ds-pharmacophore-0308.pdf. (Accessed on February 7, 2011).

35. Warren GL, Andrews CW, Capelli AM, Clarke B, LaLonde J, et al. A critical assessment of docking programs and scoring functions. J Med Chem 2006, 49:5912–5931.

36. Yin S, Biedermannova L, Vondrasek J, Dokholyan NV. MedusaScore: an accurate force field-based scoring function for virtual drug screening. J Chem Inf Model 2008, 48:1656–1662.

37. Bohm HJ. The development of a simple empirical scoring function to estimate the binding constant for a protein–ligand complex of known three-dimensional structure. J Comput Aided Mol Des 1994, 8:243–256.

38. Eldridge MD, Murray CW, Auton TR, Paolini GV, Mee RP. Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput Aided Mol Des 1997, 11:425–445.

39. Muegge I, Martin YC. A general and fast scoring function for protein–ligand interactions: a simplified potential approach. J Med Chem 1999, 42:791–804.

40. Gohlke H, Hendlich M, Klebe G. Knowledge-based scoring function to predict protein–ligand interactions. J Mol Biol 2000, 295:337–356.

41. Kolb P, Irwin JJ. Docking screens: right for the right reasons? Curr Top Med Chem 2009, 9:755–770.

42. Friesner RA, Murphy RB, Repasky MP, Frye LL, Greenwood JR, et al. Extra precision Glide: docking and scoring incorporating a model of hydrophobic enclosure for protein–ligand complexes. J Med Chem 2006, 49:6177–6196.

43. Homans SW. Water, water everywhere—except where it matters? Drug Discov Today 2007, 12:534–539.

44. Voth AR, Ho PS. The role of halogen bonding in inhibitor recognition and binding by protein kinases. Curr Top Med Chem 2007, 7:1336–1348.

45. Pierce AC, Sandretto KL, Bemis GW. Kinase inhibitors and the case for CH···O hydrogen bonds in protein–ligand binding. Proteins 2002, 49:567–576.

46. Søndergaard CR, Garrett AE, Carstensen T, Pollastri G, Nielsen JE. Structural artifacts in protein–ligand x-ray structures: implications for the development of docking scoring functions. J Med Chem 2009, 52:5673–5684.

47. Pearlman DA, Rao BG, Charifson P. FURSMASA: a new approach to rapid scoring functions that uses a MD-averaged potential energy grid and a solvent-accessible surface area term with parameters GA fit to experimental data. Proteins 2008, 71:1519–1538.

48. Tarasov D, Tovbin D. How sophisticated should a scoring function be to ensure successful docking, scoring and virtual screening? J Mol Model 2009, 15:329–341.

49. Pham TA, Jain AN. Customizing scoring functions for docking. J Comput Aided Mol Des 2008, 22:269–286.

50. Seifert MHJ. Robust optimization of scoring functions for a target class. J Comput Aided Mol Des 2009, 23:633–644.

51. Charifson PS, Corkery JJ, Murcko MA, Walters WP. Consensus scoring: a method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. J Med Chem 1999, 42:5100–5109.

52. Cheng T, Li X, Li Y, Liu Z, Wang R. Comparative assessment of scoring functions on a diverse test set. J Chem Inf Model 2009, 49:1079–1093.

53. O’Boyle NM, Liebeschuetz JW, Cole JC. Testing assumptions and hypotheses for rescoring success in protein–ligand docking. J Chem Inf Model 2009, 49:1871–1878.

54. Englebienne P, Moitessier N. Docking ligands into flexible and solvated macromolecules. 4. Are popular scoring functions accurate for this class of proteins? J Chem Inf Model 2009, 49:1568–1580.

55. Walters WP, Murcko MA. Prediction of ‘drug-likeness’. Adv Drug Deliv Rev 2002, 54:255–271.

56. Klebe G. Virtual ligand screening: strategies, perspectives and limitations. Drug Discov Today 2006, 11:580–594.

57. Sadowski J, Schwab CH, Gasteiger J. Computational Medicinal Chemistry for Drug Discovery. New York, NY: Marcel Dekker; 2004, 151–212.

58. Pearlman RS. Rapid generation of high quality approximate 3D molecular structures. Chem Des Auto News 1987, 2:1–7. Marketed by Tripos. Available at: http://tripos.com/index.php?family=modules,SimplePage,sybyl concord.

59. Sadowski J, Gasteiger J. From atoms and bonds to three-dimensional atomic coordinates: automatic model builders. Chem Rev 1993, 93:2567–2581. Marketed by Molecular Networks. Available at: http://www.molecular-networks.com/products/corina.

60. Omega. Developed and marketed by OpenEye Scientific Software. Available at: http://www.eyesopen.com/products/applications/omega.html. (Accessed on February 7, 2011).

61. LigPrep. Developed and marketed by Schrodinger. Available at: http://www.schrodinger.com/products/14/10/. (Accessed on February 7, 2011).

62. Brooks WH, Daniel KG, Sung SS, Guida WC. Computational validation of the importance of absolute stereochemistry in virtual screening. J Chem Inf Model 2008, 48:639–645.

63. Irwin JJ, Shoichet BK. ZINC—a free database of commercially available compounds for virtual screening. J Chem Inf Model 2005, 45:177–182.

64. Milletti F, Storchi L, Sforna G, Cross S, Cruciani G. Tautomer enumeration and stability prediction for virtual screening on large chemical databases. J Chem Inf Model 2009, 49:68–75.

65. Martin YC. Let’s not forget tautomers. J Comput Aided Mol Des 2009, 23:693–704.

66. MoKa. Developed and marketed by Molecular Discovery. Available at: http://www.moldiscovery.com/soft moka.php. (Accessed on February 7, 2011).

67. Polgar T, Magyar C, Simon I, Keseru GM. Impact of ligand protonation on virtual screening against β-secretase (BACE1). J Chem Inf Model 2007, 47:2366–2373.

68. Foloppe N, Chen IJ. Conformational sampling and energetics of drug-like molecules. Curr Med Chem 2009, 16:3381–3413.

69. Knox AJS, Meegan MJ, Carta G, Lloyd DG. Considerations in compound database preparation—“hidden” impact on virtual screening results. J Chem Inf Model 2005, 45:1908–1919.

70. ten Brink T, Exner TE. Influence of protonation, tautomeric, and stereoisomeric states on protein–ligand docking results. J Chem Inf Model 2009, 49:1535–1546.

71. Kalliokoski T, Salo HS, Lahtela-Kakkonen M, Poso A. The effect of ligand-based tautomer and protomer prediction on structure-based virtual screening. J Chem Inf Model 2009, 49:2742–2749.

72. Feher M, Williams CI. Effect of input differences on the results of docking calculations. J Chem Inf Model 2009, 49:1704–1714.

73. Bologa CG, Olah MM, Oprea TI. Chemical database preparation for compound acquisition or virtual screening. Methods Mol Biol 2005, 316:375–388.

74. Cummings MD, Gibbs AC, DesJarlais RL. Processing of small molecule databases for automated docking. Med Chem 2007, 3:107–113.

75. Davis AM, Teague SJ, Kleywegt GJ. Application and limitations of X-ray crystallographic data in structure-based ligand and drug design. Angew Chem Int Ed Engl 2003, 42:2718–2736.

76. Davis AM, St-Gallay SA, Kleywegt GJ. Limitations and lessons in the use of X-ray structural information in drug design. Drug Discov Today 2008, 13:831–841.

77. Hooft RWW, Vriend G, Sander C, Abola EE. Errors in protein structures. Nature 1996, 381:272.

78. Protein Preparation Wizard developed by Schrodinger. Available at: http://www.schrodinger.com/products/14/16/. (Accessed on February 7, 2011).

79. Mobley DL, Dill KA. Binding of small-molecule ligands to proteins: “what you see” is not always “what you get”. Structure 2009, 17:489–498.

80. Totrov M, Abagyan R. Flexible ligand docking to multiple receptor conformations: a practical alternative. Curr Opin Struct Biol 2008, 18:178–184.

81. Cavasotto C, Orry AJW, Abagyan RA. The challenge of considering receptor flexibility in ligand docking and virtual screening. Curr Comput Aided Drug Des 2005, 1:423–440.

82. Henzler AM, Rarey M. In pursuit of fully flexible protein–ligand docking: modeling the bilateral mechanism of binding. Mol Inform 2010, 29:164–173.

83. Jiang F, Kim SH. ‘Soft-docking’: matching of molecular surface cubes. J Mol Biol 1991, 219:79–102.

84. Barril X, Fradera X. Incorporating protein flexibility into docking and structure-based drug design. Expert Opin Drug Discov 2006, 1:335–349.

85. Kokh DB, Wenzel W. Flexible side chain models improve enrichment rates in in silico screening. J Med Chem 2008, 51:5919–5931.

86. Leach AR. Ligand docking to proteins with discrete side chain flexibility. J Mol Biol 1994, 235:345–356.

87. Available at: http://www.ccdc.cam.ac.uk/products/life sciences/gold/. (Accessed on February 7, 2011).

88. Zavodszky MI, Kuhn LA. Side-chain flexibility in protein–ligand binding: the minimal rotation hypothesis. Protein Sci 2005, 14:1104–1114.

89. Frimurer TM, Peters GH, Iversen LF, Andersen HS, Møller NP, et al. Ligand-induced conformational changes: improved predictions of ligand binding conformations and affinities. Biophys J 2003, 84:2273–2281.

90. Cavasotto CN, Orry AJ, Abagyan RA. Structure-based identification of binding sites, native ligands and potential inhibitors for G-protein coupled receptors. Proteins 2003, 51:423–433.

91. Alberts IL, Todorov NP, Dean PM. Receptor flexibility in de novo ligand design and docking. J Med Chem 2005, 48:6585–6596.

92. Bostrom J, Hogner A, Schmitt S. Do structurally similar ligands bind in a similar fashion? J Med Chem 2006, 49:6716–6725.



93. Claussen H, Buning C, Rarey M, Lengauer T. FlexE: efficient molecular docking considering protein structure variations. J Mol Biol 2001, 308:377–395.

94. Korb O, Bowden S, Olsson T, Frenkel D, Liebeschuetz J, Cole J. Ensemble docking revisited. J Cheminform 2010, 2:25.

95. Cavasotto CN, Abagyan RA. Protein flexibility in ligand docking and virtual screening to protein kinases. J Mol Biol 2004, 337:209–225.

96. Sherman W, Day T, Jacobson MP, Friesner RA, Farid R. Novel procedure for modelling ligand receptor induced fit effects. J Med Chem 2006, 49:534–553.

97. Bottegoni G, Kufareva I, Totrov M, Abagyan R. A new method for ligand docking to flexible receptors by dual alanine scanning and refinement (SCARE). J Comput Aided Mol Des 2008, 22:311–325.

98. Wang H, Aslanian R, Madison VS. Induced-fit docking of mometasone furoate and further evidence for glucocorticoid receptor 17α pocket flexibility. J Mol Graph Model 2008, 27:512–521.

99. Salam NK, Huang TH, Kota BP, Kim MS, Li Y, et al. Novel PPAR-gamma agonists identified from a natural product library: a virtual screening, induced-fit docking and biological assay study. Chem Biol Drug Des 2008, 71:57–70.

100. Cho Y, Yao K, Pugliese A, Malakhova ML, Bode AM, et al. A regulatory mechanism for RSK2 NH2-terminal kinase activity. Cancer Res 2009, 69:4398–4406.

101. Shim J, Choi HS, Pugliese A, Lee S, Chae J, et al. (-)-Epigallocatechin gallate regulates CD3-mediated T-cell receptor signaling in leukaemia through the inhibition of ZAP-70 kinase. J Biol Chem 2008, 283:28370–28379.

102. Gadakar PK, Phukan S, Dattatreya P, Balaji VN. Pose prediction accuracy in docking studies and enrichment of actives in the active site of GSK-3β. J Chem Inf Model 2007, 47:1446–1459.

103. Krystek SR, Kimura SR, Tebben AJ. Modeling and active site refinement for G protein-coupled receptors: application to the α2 adrenergic receptor. J Comput Aided Mol Des 2006, 20:463–470.

104. Maeda K, Das D, Ogata-Aoki H, Nakata H, Miyakawa T, et al. Structural and molecular interactions of CCR5 inhibitors with CCR5. J Biol Chem 2006, 281:12688–12698.

105. Farid R, Day T, Friesner RA, Pearlstein RA. New insights about HERG blockade obtained from protein modelling, potential energy mapping, and docking studies. Bioorg Med Chem 2006, 14:3160–3173.

106. Jovanovic T, Farid R, Friesner RA, McDermott AE. Thermal equilibrium of high- and low-spin forms of cytochrome P450-BM3: repositioning of the substrate? J Am Chem Soc 2005, 127:13548–13552.

107. Cozzini P, Kellogg GE, Spyrakis F, Abraham DJ, Costantino G, et al. Target flexibility: an emerging consideration in drug discovery and design. J Med Chem 2008, 51:6237–6255.

108. Ferrari AM, Wei BQ, Costantino L, Shoichet BK. Soft docking and multiple receptor conformations in virtual screening. J Med Chem 2004, 47:5076–5084.

109. Barril X, Morley SD. Unveiling the full potential of flexible receptor docking using multiple crystallographic structures. J Med Chem 2005, 48:4432–4443.

110. Rueda M, Bottegoni G, Abagyan R. Recipes for the selection of experimental protein conformations for virtual screening. J Chem Inf Model 2010, 50:186–193.

111. Furnham N, Blundell TL, DePristo MA, Terwilliger TC. Is one solution good enough? Nat Struct Mol Biol 2006, 13:184–185.

112. Lam PYS, Jadhav PK, Eyermann CJ, Hodge CN, Ru Y, et al. Rational design of potent, bioavailable nonpeptide cyclic ureas as HIV protease inhibitors. Science 1994, 263:380–384.

113. Chen JM, Xu SL, Wawrzak Z, Basarab GS, Jordan DB. Structure-based design of potent inhibitors of scytalone dehydratase: displacement of a water molecule from the active site. Biochemistry 1998, 37:17735–17744.

114. Liu C, Wrobleski ST, Lin J, Ahmed G, Metzger A, et al. 5-Cyanopyrimidine derivatives as a novel class of potent, selective and orally active inhibitors of p38α MAP kinase. J Med Chem 2005, 48:6261–6270.

115. Lu Y, Wang R, Yang CY, Wang S. Analysis of ligand-bound water molecules in high-resolution crystal structures of protein–ligand complexes. J Chem Inf Model 2007, 47:668–675.

116. Rarey M, Kramer B, Lengauer T. The particle concept: placing discrete water molecules during protein–ligand docking predictions. Proteins 1999, 34:17–28.

117. Verdonk ML, Chessari G, Cole JC, Hartshorn MJ, Murray CW, et al. Modeling water molecules in protein–ligand docking using GOLD. J Med Chem 2005, 48:6504–6515.

118. Corbeil CR, Englebienne P, Moitessier N. Docking into flexible and solvated macromolecules. 1. Development and validation of FITTED 1.0. J Chem Inf Model 2007, 47:435–449.

119. Roberts BC, Mancera RL. Ligand–protein docking with water molecules. J Chem Inf Model 2008, 48:397–408.

120. Huang N, Shoichet BK. Exploiting ordered waters in molecular docking. J Med Chem 2008, 51:4862–4865.

121. Huang SY, Zou X. Inclusion of solvation and entropy in the knowledge-based scoring function for protein–ligand interactions. J Chem Inf Model 2010, 50:262–273.



122. Raymer ML, Sanschagrin PC, Punch WF, Venkataraman S, Goodman ED, et al. Predicting conserved water-mediated and polar ligand interactions in proteins using a K-nearest-neighbors genetic algorithm. J Mol Biol 1997, 265:445–464.

123. García-Sosa AT, Mancera RL, Dean PM. WaterScore: a novel method for distinguishing between bound and displaceable water molecules in the crystal structure of the binding site of protein–ligand complexes. J Mol Model 2003, 9:172–182.

124. Amadasi A, Surface JA, Spyrakis F, Cozzini P, Mozzarelli A, et al. Robust classification of “relevant” water molecules in putative protein binding sites. J Med Chem 2008, 51:1063–1067.

125. Beuming T, Farid R, Sherman W. High-energy water sites determine peptide binding affinity and specificity of PDZ domains. Protein Sci 2009, 18:1609–1619.

126. Santos R, Hritz J, Oostenbrink C. Role of water in molecular docking simulations of cytochrome P450 2D6. J Chem Inf Model 2010, 50:146–154.

127. Huang N, Jacobson MP. Physics-based methods for studying protein–ligand interactions. Curr Opin Drug Discov Devel 2007, 10:325–331.

128. Brown SP, Muchmore SW. Rapid estimation of relative protein–ligand binding affinities using a high-throughput version of MM-PBSA. J Chem Inf Model 2007, 47:1493–1503.

129. Brown SP, Muchmore SW. Large-scale application of high-throughput molecular mechanics with Poisson–Boltzmann surface area for routine physics-based scoring of protein–ligand complexes. J Med Chem 2009, 52:3159–3165.

130. Thompson DC, Humblet C, Joseph-McCarthy D. Investigation of MM-PBSA rescoring of docking poses. J Chem Inf Model 2008, 48:1081–1091.

131. Huang N, Kalyanaraman C, Irwin JJ, Jacobson MP. Physics-based scoring of protein–ligand complexes: enrichment of known inhibitors in large-scale virtual screening. J Chem Inf Model 2006, 46:243–253.

132. Lyne PD, Lamb ML, Saeh JC. Accurate prediction of the relative potencies of members of a series of kinase inhibitors using molecular docking and MM-GBSA scoring. J Med Chem 2006, 49:4805–4808.

133. Guimaraes CRW, Cardozo M. MM-GB/SA rescoring of docking poses in structure-based lead optimization. J Chem Inf Model 2008, 48:958–970.

134. Deng Z, Chuaqui C, Singh J. Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein–ligand binding interactions. J Med Chem 2004, 47:337–344.

135. Brewerton SC. The use of protein–ligand interaction fingerprints in docking. Curr Opin Drug Discov Devel 2008, 11:356–364.

136. Nandigam RK, Kim S, Singh J, Chuaqui C. Position specific interaction dependent scoring technique for virtual screening based on weighted protein–ligand interaction fingerprint profiles. J Chem Inf Model 2009, 49:1185–1192.

137. Perez-Nueno VI, Rabal O, Borrell JI, Teixido J. APIF: a new interaction fingerprint based on atom pairs and its application to virtual screening. J Chem Inf Model 2009, 49:1245–1260.

138. Melville JL, Burke EK, Hirst JD. Machine learning in virtual screening. Comb Chem High Throughput Screen 2009, 12:332–343.

139. Springer C, Adalsteinsson H, Young MM, Kegelmeyer PW, Roe DC. PostDOCK: a structural, empirical approach to scoring protein ligand complexes. J Med Chem 2005, 48:6821–6831.

140. Klon AE, Glick M, Thoma M, Acklin P, Davies JW. Finding more needles in the haystack: a simple and efficient method for improving high-throughput docking results. J Med Chem 2004, 47:2743–2749.

141. Klon AE, Glick M, Davies JW. Combination of a naive Bayes classifier with consensus scoring improves enrichment of high-throughput docking results. J Med Chem 2004, 47:4356–4359.

142. Klon AE, Glick M, Davies JW. Application of machine learning to improve the results of high-throughput docking against the HIV-1 protease. J Chem Inf Comput Sci 2004, 44:2216–2224.

143. Klon AE. Bayesian modeling in virtual high throughput screening. Comb Chem High Throughput Screen 2009, 12:469–483.

144. Reynolds CH, Tounge BA, Bembenek SD. Ligand binding efficiency: trends, physical basis, and implications. J Med Chem 2008, 51:2432–2438.

145. Jacobsson M, Karlen A. Ligand bias of scoring functions in structure-based virtual screening. J Chem Inf Model 2006, 46:1334–1343.

146. Cole JC, Murray CW, Nissink JWM, Taylor RD, Taylor R. Comparing protein–ligand docking programs is difficult. Proteins 2005, 60:325–332.

147. Jain AN, Nicholls A. Recommendations for evaluation of computational methods. J Comput Aided Mol Des 2008, 22:133–139.

148. Hawkins PCD, Warren GL, Skillman AG, Nicholls A. How to do an evaluation: pitfalls and traps. J Comput Aided Mol Des 2008, 22:179–190.

149. Kirchmair J, Markt P, Distinto S, Wolber G, Langer T. Evaluation of the performance of 3D virtual screening protocols: RMSD comparisons, enrichment assessments, and decoy selection – What can we learn from earlier mistakes? J Comput Aided Mol Des 2008, 22:213–228.

150. Andersson CD, Thysell E, Lindstrom A, Bylesjo M, Raubacher F, et al. A multivariate approach to investigate docking parameters’ effects on docking performance. J Chem Inf Model 2007, 47:1673–1687.

151. Nicholls A. What do we know and when do we know it? J Comput Aided Mol Des 2008, 22:239–255.



152. Hartshorn MJ, Verdonk ML, Chessari G, BrewertonSC, Mooij WTM, et al. Diverse, high-quality test setfor the validation of protein–ligand docking perfor-mance. J Med Chem 2007, 50:726–741.

153. Verdonk ML, Mortensen PN, Hall RJ, Hartshorn MJ,Murray CW. Protein–ligand docking against non-native protein conformers. J Chem Inf Model 2008,48:2214–2225.

154. Huang N, Shoichet BK, Irwin JJ. Benchmarking setsfor molecular docking. J Med Chem 2006, 49:6789–6801.

155. Irwin JJ. Community benchmarks for virtual screen-ing. J Comput Aided Mol Des 2008, 22:193–199.

156. SAMPL: Statistical Assessment of the Modeling ofProteins and ligands. http://sampl.eyesopen.com. (Ac-cessed January 28, 2011).

157. Baber JC, Thompson DC, Cross JB, HumbletC. GARD: a generally applicable replacement forRMSD. J Chem Inf Model 2009, 49:1889–1900.

158. Clark RD, Webster-Clark DJ. Managing bias in ROCcurves. J Comput Aided Mol Des 2008, 22:141–146.

159. Truchon JF, Bayly CI. Evaluating virtual screeningmethods: good and bad metrics for the “early recog-nition” problem. J Chem Inf Model 2007, 47:488–508.

160. Villoutreix BO, Eudes R, Miteva MA. Structure-basedvirtual ligand screening: recent success stories. CombChem High Throughput Screen 2009, 12:1000–1016.

161. Pierce AC, Jacobs M, Stuver-Moody C. Docking study yields four novel inhibitors of the protooncogene Pim-1 kinase. J Med Chem 2008, 51:1972–1975.

162. Kraemer O, Hazemann I, Podjarny AD, Klebe G. Virtual screening for inhibitors of human aldose reductase. Proteins 2004, 55:814–823.

163. Eisenmann M, Steuber H, Zentgraf M, Altenkaemper M, Ortmann R, et al. Structure-based optimization of aldose reductase inhibitors originating from virtual screening. ChemMedChem 2009, 4:809–819.

164. Nichols SE, Domaoal RA, Thakur VV, Tirado-Rives J, Anderson KS, et al. Discovery of wild-type and Y181C mutant non-nucleoside HIV-1 reverse transcriptase inhibitors using virtual screening with multiple protein structures. J Chem Inf Model 2009, 49:1272–1279.

165. Topiol S, Sabio M. X-ray structure breakthroughs in the GPCR transmembrane region. Biochem Pharmacol 2009, 78:11–20.

166. Topiol S, Sabio M. Use of the X-ray structure of the beta2-adrenergic receptor for drug discovery. Bioorg Med Chem Lett 2008, 18:1598–1602.

167. Sabio M, Jones K, Topiol S. Use of the X-ray structure of the beta2-adrenergic receptor for drug discovery. Part 2: Identification of active compounds. Bioorg Med Chem Lett 2008, 18:5391–5395.

168. Kolb P, Rosenbaum DM, Irwin JJ, Fung JJ, Kobilka BK, et al. Structure-based discovery of beta2-adrenergic receptor ligands. Proc Natl Acad Sci U S A 2009, 106:6843–6848.

FURTHER READING

Special issue of Journal of Computer-Aided Molecular Design on validation of virtual screening methods. 2008, Vol 22.

Bissantz C, Kuhn B, Stahl M. A medicinal chemist’s guide to molecular interactions. J Med Chem 2010, 53:5061–5084.

Schneider G. Virtual screening: an endless staircase? Nat Rev Drug Discov 2010, 9:273–276.

Merz KM Jr. Limits of free energy computation for protein–ligand interactions. J Chem Theory Comput 2010, 6:1769–1776.

Alvarez J, Shoichet B. Virtual Screening in Drug Discovery. Boca Raton: CRC Press; 2005.

Stroud R, Finer-Moore J. Computational and Structural Approaches to Drug Discovery: Ligand–Protein Interactions. Cambridge: RSC Publishing; 2007.