assessing energy functions for flexible docking

11

Click here to load reader

Upload: charles-l

Post on 06-Jun-2016

221 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: Assessing energy functions for flexible docking

— —< <

Assessing Energy Functions forFlexible Docking

MICHAL VIETH,* JONATHAN D. HIRST, ANDRZEJ KOLINSKI,†

CHARLES L. BROOKS III( )Department of Molecular Biology TPC-6 , The Scripps Research Institute, 10550 North Torrey Pines

Road, La Jolla, California 92037

Received 17 February 1998; accepted 16 June 1998

ABSTRACT: A good docking algorithm requires an energy function that isselective, in that it clearly differentiates correctly docked structures frommisdocked ones, and that is efficient, meaning that a correctly docked structurecan be identified quickly. We assess the selectivity and efficiency of a broadspectrum of energy functions, derived from systematic modifications of theCHARMM param19rtoph19 energy function. In particular, we examine theeffects of the dielectric constant, the solvation model, the scaling of surfacecharges, reduction of van der Waals repulsion, and nonbonded cutoffs. Based onan assessment of the energy functions for the docking of five differentligand]receptor complexes, we find that selective energy functions include avariety of distance-dependent dielectric models together with truncation of the

˚nonbonded interactions at 8 A. We evaluate the docking efficiency, the meannumber of docked structures per unit of time, of the more selective energyfunctions, using a simulated annealing molecular dynamics protocol. The largestimprovements in efficiency come from a reduction of van der Waals repulsionand a reduction of surface charges. We note that the most selective potential isquite inefficient, although a hierarchical approach can be employed to take

*Present address: Computational Chemistry and MolecularStructure Research, Lilly Research Laboratories, Lilly CorporateCenter, Indianapolis, IN, USA

†Also: Department of Chemistry, University of Warsaw,Warsaw, Poland

Correspondence to: C. L. Brooks III; e-mail: [email protected]

Contractrgrant sponsor: NIH; contractrgrant numbers:GM37554, RR12255

( )Journal of Computational Chemistry, Vol. 19, No. 14, 1612]1622 1998Q 1998 John Wiley & Sons, Inc. CCC 0192-8651 / 98 / 141612-11

Page 2: Assessing energy functions for flexible docking

ASSESSING ENERGY FUNCTIONS

advantage of both selective and efficient energy functions. Q 1998 John Wiley& Sons, Inc. J Comput Chem 19: 1612]1622, 1998

Keywords: docking; energy functions; simulated annealing; moleculardynamics; scoring functions

Introduction

n structure-based drug design one would likeI to be able to predict the orientation of a drugcandidate in the active site as well as estimate itsbinding affinity. The former is a problem of molec-ular recognition—a field of great importance inmolecular biology. Molecular docking is a compu-tational approach to molecular recognition and akey technique in computer-aided rational drug de-sign.1, 2 The aim of docking is to predict the struc-ture of a ligand]receptor complex given the struc-tures of the free ligand and free receptor.3 The firstligand docking program, DOCK,4 employed shapeinformation and a hydrogen bonding term. Sincethen, docking techniques have been developedwith a variety of energy functions and differentsearch strategies.3, 5, 6 The number of dockingstrategies continues to rise with increasing interestin structure-based drug design. Simulated anneal-

7 Ž .ing Monte Carlo, molecular dynamics MD and,recently, genetic algorithms5, 9 ] 11 have been em-ployed. In the studies we report here, we intro-duce and apply metrics for systematically assess-ing and developing docking algorithms. In thisarticle, we systematically study the energy func-tion used in docking and, in the accompanyingarticle, we compare the search strategies for mol-ecular docking. Preliminary results of these stud-ies have been summarized elsewhere.12 Themacromolecular modeling software, CHARMM,provides a convenient means of introducing modi-fications to the energy functions, and we haveimplemented several docking protocols within theCHARMM environment as a step toward the de-velopment of an integrated toolbox for structure-based drug design.

A successful docking methodology requires anenergy function capable of selecting the correctligand]receptor structure; that is, one consistentwith the crystallographic structure of the complex.A good energy function should also be efficient,meaning that the landscape on which the ligandmoves should be relatively smooth with an ab-sence of large energy barriers separating different

structures, to allow the desired minimum to belocated reasonably rapidly. Both components, se-lectivity and efficiency, are of importance and mustcoexist in a successful docking algorithm. For agiven energy function, the efficiency of the dock-ing process will also depend on the search algo-rithm. In this study, we concentrate on the energyfunction and aspects of selectivity and efficiency.We describe the selection of the search algorithmin the accompanying work.13 The influence of theenergy function on the docking efficiency, as mea-sured by the computer time required to find thecorrect structure of the complex, is investigated bymeans of an MD simulated annealing protocol.The energy functions evaluated in this work arederived from the CHARMM param19 parameterset,14 with some modifications that are introducedto achieve good selectivity and high docking effi-ciency. Here the docking efficiency is discussedonly in the context of the MD methodology.

In the majority of docking approaches, the loca-tion of the active site is assumed to be known andthe receptor is kept rigid.3, 7 ] 11, 15, 16 Many teststudies on docking published in the last 15 yearshave used the rigid receptor structure taken fromthe exact complex that was to be predicted, eventhough, in many cases, the receptor active site

Ždiffered substantially from its free form e.g., HIV17 18 .protease and streptavidin to name just two .

The idea behind this approach, which we alsoadopt in our study, is to test the strengths andweaknesses of various docking algorithms in the‘‘best case scenario,’’ which assumes that the freeand loaded forms of the receptors are identical. Inonly a few cases19 has the structure of the nativeunbound form of the receptors been employed.In practical applications, the receptor structure istaken from either an experimentally determinedstructure of the free form or from a structure of acomplex with another ligand.

The organization of this study is as follows.First, we introduce the five ligand]receptor com-plexes that form the basis of the study. Generalfeatures of the energy functions, together with theparameterization of the ligands, are presented. Inthe next section, we introduce our modifications to

JOURNAL OF COMPUTATIONAL CHEMISTRY 1613

Page 3: Assessing energy functions for flexible docking

VIETH ET AL.

the CHARMM force field and the metrics used toassess the selectivity and efficiency of the energyfunctions. The modifications that result in selectiveand efficient energy functions are presented. Weconclude with the implications of our findings forthe development of fast and effective dockingstrategies.

Method

The selectivity and computational efficiency ofenergy functions have been evaluated with respectto the flexible docking of five diverse ligand]receptor complexes. The complexes include asmall, rigid ligand in an open active site,benzamidinertrypsin20 ; flexible ligands in openactive sites, phosphocholinerFAB McPC-60321 andsialic acidrhemagglutinin; and flexible ligandsin relatively inaccessible active sites, glycerol 3-phosphatertriose phosphate isomerase,23 and bi-otinrstreptavidin.18 The ligands are shown in Fig-ure 1. No crystallographic water molecules wereretained, which may have some consequences forthe complexes in which water molecules mediatethe ligand]receptor interaction. We believe thatthe five complexes are sufficiently diverse for ourconclusions about the selectivity and efficiency ofenergy functions to be of general applicability. Forthe complexes phosphocholinerFAB, benzami-dinertrypsin, glycerol 3-phosphatertim, and sialicacidrhemagglutinin, the active site conformation

˚is similar, within 0.4-A root-mean-square deviationŽ .RMSD on heavy atoms, to the unbound form ofthe receptor. For biotinrstreptavidin, the changesin the active site are more substantial. Because thisarticle focuses on the assessment of energy func-tions for docking, and not on the prediction ofbinding modes, we used the receptor structurespresent in the complexes.

The receptors were described using theCHARMM param19rtoph19 parameter set.14 Forthe ligands, charges were generated by the tem-plate method in QUANTA,24 with smoothing ofcharges over all atoms. The charges on the car-boxylic group of sialic acid were taken from aspar-tic acid. The bond, angle, and improper dihedralangle parameters were also generated by QUANTA,

Ž .as were the nonbonded van der Waals vdWparameters. The ligand charges and parameters forthe angles, bonds, dihedrals, and nonbonded inter-actions are accessible through anonymous ftp.25

For the nonbonded interactions, switching func-

FIGURE 1. The five ligands used this study. Arrows( )show rotatable bonds: a biotin / streptavidin;

( ) ( )b phosphocholine / FAB McPC-603; c sialic acid /(hemagglutinin for hemagglutinin only residues 61 ]269

were used, because they account for the neighborhood) ( )of the active site ; d benzamidine / trypsin; and

( )e glycerol 3-phosphate / triose phosphate isomerase.

tions14 for the vdW and electrostatic interactionswere used.

To prevent ligands escaping from the neighbor-hood of active sites, a harmonic restraining poten-tial was applied. It acted only when the distance,d, between the ligand atom closest to the center ofmass of the ligand and the receptor atom closest tothis atom in the crystal structure was further than

˚11 A. Thus, the center of the search space is about˚ Ž3.5 A the typical distance between the ligand

.atom and closest receptor atom from the center ofthe active site. The restraining potential, R, has thefollowing form:

˚0 ; d - 11 A Ž .R s 12½ ˚Ž .k d y 11 ; d ) 11 A

where k is the force constant set to 50 kcalr˚2mol A . The restraining potential is implemented

through the NOE module in CHARMM. The re-straining potential effectively confines ligands to

VOL. 19, NO. 141614

Page 4: Assessing energy functions for flexible docking

ASSESSING ENERGY FUNCTIONS

the neighborhood of the active site. The volumeaccessible to the center of mass of a ligand is

˚3roughly 5500 A . The search space used in thisrealization is much larger than in many otherdocking protocols that employ a search space for

˚3 3, 15the center of mass on the order of 8 A .In the following subsections, we present the

various modifications of the CHARMM forcefield14 that will be evaluated. The energy functionsthat we consider in this article have the form,Ž .E solv, « , SCR, vdWR, cutoffnb , where solv refers

to solvation model, « refers to four different di-Želectric constants, SCR refers to the reduction or

.not of surface charges, vdWR refers to soft versushard core nonbonded interactions, and cutoffnbŽ .r refers to three different values of the non-offbonded interaction truncation. Different combina-tions of these modifications give rise to a total of144 distinct energy functions. We assess the selec-tivity of each one. Subsequently, the most promis-ing selective energy functions are tested for theirefficiency.

NONBONDED CUTOFFS AND DIELECTRICCONSTANT

We assessed energy functions with values of 8˚ ˚ ˚A, 9 A, and 10 A for the truncation of the non-

Ž .bonded interactions r . The values for the non-off˚ ˚ ˚bonded list were 9 A, 10 A, and 11 A, respectively.

A short nonbonded list decreases the number ofnonbonded interactions to compute and hence thecomputational cost of the docking. In addition,short values for the nonbonded interaction trunca-tion should favor tightly packed structures andhighly localized electrostatic interactions in theactive site over more loosely packed structureswith screened electrostatic interactions.

Both distance-dependent and constant di-electrics were tested. The dielectric constant wasalso treated as a parameter and values between 1and 4 were explored. Low dielectric constantsshould favor strong electrostatic interactions, butshifting the balance of electrostatic versus vdWinteractions may effectively decrease the impor-tance of tight packing in active sites. In addition, alow dielectric constant may also generate kinetictraps and not discriminate well between active sitecavities and binding sites on the protein surfaces.

SURFACE CHARGE REDUCTION

One of the problems that can arise in the dock-ing of a ligand to a rigid protein is preferential

binding of the ligand at surface sites instead ofactive site cavities. Another issue that needs to beaccounted for is the flexibility of the surface sidechains and resulting surface charge delocalizationdue to this flexibility. As a means of avoidingthese possible problems, we test a model in whichthe surface side chain charges are reduced:

spr o t ei n Ž .C s 1 y C 2new ž /st r i p e p t i de

where C is the reduced charge of the side chainn ewatom, s is the surface exposure of side chainpr o t ei nX, s is the surface exposure of this sidet r i p e p t i d echain in a Gly—X—Gly trans tripeptide and C isthe original charge of the side chain atom. Thismodification reduces the surface side chaincharges, but does not modify the charges of buriedside chains, thereby providing a ‘‘smeared-out’’representation of the surface charges accountingfor both aspects of side chain mobility and solventscreening. The largest observed reduction of a sidechain charge was 50%. This modification of theenergy function is implemented through theSCALAR module in CHARMM.

REDUCTION vdW REPULSION

The next modification involves the introductionof a vdW potential with a soft core repulsion. Thefollowing function was employed in the dockingsimulations:

¡ 12 A 6Bi j i j Ž .y fs y rÝ i j i j13 7ž /Ž . Ž .fs fsi/j i j i j

Ž .qE fs ; r - fsv dW i j i j i j~ Ž .E s 3v dWA Bi j i j 2 2 2y sw r , r , rŽ .Ý i j on o f f12 6ž /r ri j i ji/j¢ ; r G fsi j i j

where E is the vdW energy; A and B arev dW i j i jnonbonded parameters, and sw is a switchingfunction14 ; s is the sum of the vdW radii fori jatoms i and j; r is the distance between atoms ii j

˚Žand j, and r is the inner switching distance 6 Aon.in our case . The parameter f denotes the fraction

Žof s at which the soft core potential starts f si j.0.885 was used in our applications . The essence of

this modification is that, for distances shorter than

JOURNAL OF COMPUTATIONAL CHEMISTRY 1615

Page 5: Assessing energy functions for flexible docking

VIETH ET AL.

fs , the force acting on atoms i and j is the samei jas at fs , and the energy is a linearly decreasingi jfunction of r . This modification makes the vdWi j

Žrepulsions very small the maximum observedvalue is on the order of hundreds of kilocalories

.per mole and the resulting conformational transi-tion barriers much smaller than with the unmodi-fied vdW potential. An analogous modification ismade to the electrostatic potential and forces.However, its influence on the electrostatics shouldbe negligible relative to the effect on the vdWinteractions. Thus, most of our discussion of thesoft core nonbonded terms will concern the modi-fication of the vdW potential and forces. The softcore modification is similar in principle to ap-proaches in other docking programs,26, 27 and al-lows a ligand to penetrate the interior of the pro-tein with a relatively small energetic penalty. Inother words, the local energy barriers are verysmall. As a result, MD sampling is no longerdisadvantaged with respect to Monte Carlo sam-pling and can be used as an effective search strat-egy. The only negative effect of this modification isthat the tight packing of ligands in the active site isless favored compared with a regular vdW poten-tial. However, we note the soft core potential canbe turned on or off during different stages of thedocking protocol. This modification is similar inprinciple to the soft core potential used in the earlystages of NMR refinement protocols28 and in someversions of simulated annealing. The major influ-ence of a soft core potential is expected to be onthe efficiency of the docking process, without de-grading the selectivity of the energy function.30

This modification of the energy function will beavailable in a future CHARMM release.

CONTINUUM SOLVATION CONTRIBUTION

The final modification of the force field involvesthe representation of the solvation contribution.This contribution is based on an approximation to

Ž .the finite difference Poisson]Boltzmann PB con-tinuum solvent model.31 Because we are using afixed receptor model, we propose that the receptoris the only low dielectric medium that needs to beconsidered and the additional contribution of aligand to the polarization and the potential is neg-ligible. To examine this, we generated a number ofligand conformations of glycerol 3-phosphatearound the active site of triose phosphate iso-merase. For each position of the ligand, the ‘‘true’’solvation energy, E , is computed in the follow-so l v

ing way:

N

Ž . Ž .E s w y w q 4Ýso l v j , 80, i nt j , i nt , i nt jjs1

where N is the number of ligand atoms; q is thejatomic charge; w refers to the electrostaticj, 80, i ntpotential generated by the complex with an inte-

Žrior dielectric, int s 1, 2, 3, or 4 depending on the.value of dielectric constant chosen in an exterior

dielectric of 80; w refers to the potentialj, i nt, i ntgenerated by the complex in the reference statewhere exterior dielectric is the same as the interiordielectric. A similar calculation is performed usingthe approximate method, where the electrostaticpotentials are generated from the receptor chargedistribution only; that is, the receptor is the onlylow dielectric medium and the ligand is absent. A

˚grid size of 0.7 A was used in the calculations. Thetwo solvation energies are compared in Figure 2for ten different positions of glycerol 3-phosphatein the triose phosphate isomerase binding sites.

˚These ten positions have a mean RMSD of 9 Afrom one another. The correlation coefficient isr 2 s 0.92. It varies from 0.92 to 0.97 for the five

FIGURE 2. Correlation between finite difference PBsolvation energies of ligands and approximate solvationenergies.

VOL. 19, NO. 141616

Page 6: Assessing energy functions for flexible docking

ASSESSING ENERGY FUNCTIONS

complexes tested. The high correlation allows us touse a static solvation potential around the proteinand a fast computation of the approximate electro-static contribution to the solvation energies forligands in the different positions around the pro-tein. The finite difference PB equation31 has to besolved only twice to provide the potential on thegrid. The forces are computed as the electrostaticpotential gradients on the grid using a centraldifference derivative approximation. We proposethis simplified treatment of the system’s electro-statics as an approximation of the solvation effects.

ASSESSMENT OF SELECTIVITY

The selectivity of a given energy function or itsability to discriminate between docked and mis-docked structures was assessed using a databaseof docking trials. This database was generated

Ž .using the 12 energy functions out of 144 , whichhave a distance-dependent dielectric and « s 3.For each ligand]receptor complex, 240 positions ofthe ligand around the active site were generatedfrom 20 trial simulations using each of the 12energy functions. The 20 initial conformations ofthe ligands for each complex were generated ran-domly and had a mean RMSD from the crystallo-

˚graphic ligand position of 9 A. An additional 50simulations, initiated from the crystallographic lig-and position, were carried out to increase the sam-pling of docked structures. The average ratio ofdocked:misdocked structures was roughly 1:1. Thisdatabase of 290 structures for each complex wasused to evaluate the energy distributions of eachenergy function.

We define correctly docked conformations as˚ Ž .those less than 2 A RMSD on all heavy atoms

from the crystallographic structure of the ligand inthe ligand]receptor complex. Structures with a

˚RMSD of larger than 4 A are considered incor-Ž .rectly docked or misdocked . Structures with

˚RMSD between 2 and 3 A are considered partiallydocked, whereas those with RMSD between 3 and

˚4 A are partially misdocked. Both partially dockedand partially misdocked structures are disre-garded in our assessment of selectivity, becausethey are, in most cases, in the binding site, butoften with an incorrect orientation. This left, onaverage, about 220 structures for each complex.

The measure of selectivity we use is similar tothose used in the design of parameters for proteinfolding algorithms,32 sequence design studies,33, 34

and inverse folding approaches.35 ] 37 In these prob-lems, energies are converted to so called Z-scoresto take into account the nature of energy distribu-tions. The Z-score is defined as:

Ž .E y EŽ . Ž .Z E s 5

s

where E is the energy of the ligand]receptor com-plex, E is the mean energy and s is the standarddeviation of the energy distribution. The energycomputed is the sum of the internal energy of theligand and its interaction energy with its receptor.The selectivity, SG, of an energy function is de-fined as:

N N1¡i iŽ .Z y Z f ; f / 0Ý ŁD , m M , m i iN is1is1~ Ž .SG s 6

N

0 ; f s 0Ł i¢is1

where i refers to the different complexes; andZ i and Z i are the minimum values of theD , m M , mZ-score for the docked and misdocked conforma-

Žtions the Z-scores of the lowest energy docked.and misdocked structures , respectively; f is thei

fraction of the docked structures with Z-scoreslower than those of the misdocked structures. N is

Ž .the number of complexes in this work N s 5 .The selectivity of an energy function is taken to bezero if, for any complex, the lowest energy mis-docked structure is lower in energy than the low-est energy docked structure. The lower the valueof SG, the more selective is the energy function.The metric SG defines a selective energy functionas one that maximizes the energy gap between theenergy distributions of the docked and misdockedstructures, while minimizing the overlap of thedistributions.39

ASSESSMENT OF EFFICIENCY

To estimate the efficiency of each energy func-tion, we carried out 20 docking simulations foreach of the five complexes. The search strategyused a version of the multiple-copy simultaneoussearch method,40 which employed an MD-simu-lated annealing search protocol. The initial condi-tions were the same for each energy function, aswere the cooling schedules. The annealing sched-ule for the docking simulations depends on thenature of the vdW potential. For energy functions

JOURNAL OF COMPUTATIONAL CHEMISTRY 1617

Page 7: Assessing energy functions for flexible docking

VIETH ET AL.

with a soft core vdW potential, the annealing wasaccompanied by a gradual hardening of the vdWpotential, as described previously. The annealingprotocol for energy functions with a hard corevdW was identical, but without hardening. First,

Ž .75 picoseconds ps of dynamics, with a timestepŽ .of 1.5 femtoseconds fs , was performed, while the

system was cooled from T s 700 K to T s 300 K,with a cooling frequency of 150 steps and a coolingdecrement of 1 K. The short-range nonbonded in-

Ž .teractions were modified, as described in eq. 3 ,with f s 0.885 and the nonbonded list was up-dated every 0.3 ps. In the second stage, the veloci-ties were generated from the Boltzmann distribu-tion and 75 ps of annealing from 400 K to 300 Kwas performed with a cooling frequency of 100steps and a temperature decrement of 5 K. At thisstage, the short-range vdW and electrostatic inter-actions were modified using f s 0.6. This wasfollowed by 10 ps of cooling by 1 K every 150steps, from 400 K to 300 K using a reduction of thenonbonded interactions with f s 0.4. The initialtemperature of the second stage was higher thatthe final temperatures for the first stage to com-pensate for the hardening of the vdW potential.The final stage consisted of 6 ps of quenching from300 K to 50 K with a temperature decrement of 5 Kevery 100 steps and unmodified short-range non-bonded interactions.

The efficiency, the number of correctly dockedŽ .structures per unit of time min , of an energy

function is defined as:

N¡1Ž Ž ..f q 0.5 f y fÝ i , - 2 i , - 3 i , - 2T is1

N~ Ž .SE s 7; f / 0Ł i , - 3iN

0 ; f s 0Ł i , - 3¢i

where f , indicates the fraction of structuresi, - a˚with RMSD less than a A from the crystal struc-

ture and energy lower than that of the lowestenergy misdocked structure. T is the average timerequired for docking a ligand to its receptor. Alltimings and fractions of docked structures aregiven for the same annealing schedules with108,000 energy evaluations. The efficiency is zero,if f is zero for any complex. For the efficiencyi, - 3measure, we include 50% of the partially dockedstructures with energies better than the lowestmisdocked in the pool of docked structures.

Results

In the series of calculations presented in whatfollows, we assess the performance of the energyfunctions in docking experiments. First, we con-sider their selectivity. Then, from a pool of selec-tive energy functions, we identify the most effi-cient one.

SELECTIVITY

ŽTable I ranks the 22 energy functions out of the.144 that have nonzero selectivity. A variety of

energy functions are selective. Most of them haveshort distances for the truncation of nonbondedinteractions, a distance-dependent dielectric, andunmodified vdW and electrostatic interactions. Theadvantage of the standard nonbonded interactionsover the soft core vdW is probably due to betterpacking of the ligands in their respective activesites. Looser packing in false binding sites maycompete more favorably in a soft core vdW poten-tial. Also, a short distance for nonbonded trunca-tion is advantageous for specific interactions in theactive sites over nonspecific interactions.

Three of the four most selective energy func-tions have reduced surface charges. This may re-flect the fact that most nonspecific binding sitesare located on the receptor surface, or in smallcavities, and thus reducing the surface electrostaticinteractions favors the more buried active sitestructures. In addition, a distance-dependent di-electric is almost exclusively present in the selec-tive potentials. A constant dielectric, low dielectricconstants, and the approximate continuum solva-tion model are disadvantageous for selectivity. Thepoor performance of the energy functions employ-ing a constant dielectric may be explained by thefact that a constant dielectric selects predomi-nantly on the basis of electrostatics and, as in thecase for low values of dielectric constants, proba-bly disrupts the proper balance between electro-

Ž .statics and packing vdW interactions necessaryfor tight and specific interactions. On the otherhand, incorporation of the continuum solvationflattens the energy surface and reduces the gapbetween the energy distributions of the dockedand misdocked structures. The general conclusionto be drawn from Table I is that, for the mostselective energy function, one should use a dis-

Žtance-dependent dielectric, with « s 3]4 3 being

VOL. 19, NO. 141618

Page 8: Assessing energy functions for flexible docking

ASSESSING ENERGY FUNCTIONS

TABLE I.Ranking of Energy Functions Based on Selectivity.

Selectivity Solvation Reduced surface vdW soft NBba c d( ) ( )Rank SG DE f model charges RSC ? core? cutoffG

1 y0.78 y10.2 0.77 rdie, 3 YES NO 82 y0.77 y9.6 0.76 rdie, 4 YES NO 83 y0.72 y9.4 0.74 rdie, 4 NO NO 84 y0.70 y11.4 0.76 rdie, 2 YES NO 85 y0.67 y9.8 0.72 rdie, 3 NO NO 86 y0.45 y10.0 0.54 rdie, 2 NO NO 87 y0.36 y5.9 0.46 rdie, 3 NO YES 88 y0.35 y5.4 0.45 rdie, 4 NO YES 89 y0.35 y5.9 0.45 rdie, 3 YES YES 8

10 y0.33 y5.1 0.44 rdie, 4 YES YES 811 y0.31 y6.4 0.44 rdie, 2 YES YES 812 y0.26 y5.9 0.40 rdie, 2 NO YES 813 y0.23 y4.7 0.43 rdie, 2 NO NO 1014 y0.22 y3.5 0.39 rdie, 4 YES YES 1015 y0.22 y4.5 0.44 rdie, 2 YES NO 1016 y0.21 y3.6 0.37 rdie, 4 NO NO 917 y0.21 y3.5 0.37 rdie, 4 YES NO 918 y0.17 y3.4 0.32 solv, 4 YES NO 1019 y0.16 y4.0 0.25 solv, 4 YES YES 820 y0.13 y4.8 0.30 rdie, 1 YES NO 1021 y0.09 y5.6 0.20 rdie, 1 YES YES 822 y0.04 y3.5 0.23 rdie, 1 NO NO 10

a ( )Energy gap kcal / mol between the lowest energy docked structure and the lowest energy misdocked structure.b Fraction of docked structures with energies lower than that of the lowest energy misdocked structure.c (Dielectric parameters. Type of solvation model rdie, distance-dependent dielectric constant; solv, continuum solvation contribu-

)tion with constant dielectric followed by the value of the dielectric constant.d ˚( )Nonbonded interaction truncation A .

the most universal for the short nonbonded cut-˚. Ž .offs , short 8-A nonbonded interaction trunca-

tion, regular vdW interactions, and reduce thecharges of surface side chains. Figure 3a shows theenergy histograms for the most selective energyfunction. The fraction of docked structures whoseenergy is lower than that of the lowest energymisdocked conformation is 0.77. For the most se-lective energy function, both the separation of theenergy distributions and the energy gap betweenthe tails of distributions are satisfactory.

EFFICIENCY

To test the efficiency of docking we selectednine energy functions from Table I: ranks 1]2,7]11, 14, 19, and one energy function not in Table IŽ .with SG s 0 that used the continuum solvationcontribution. The choice of the first eight was basedon ranking and low nonbonded cutoffs. The en-ergy function of rank 14 was chosen as a represen-tative of those with longer nonbonded cutoffs,

whereas the energy function of rank 19 was se-lected as a selective potential with a continuumsolvation contribution. From the best 11 energyfunctions with short nonbonded cutoffs, we do notexamine ranks 3]6, as they have the same type ofvdW interactions as the first two. As can be seenfrom the data presented in Table II, these are notlikely to affect our final conclusions.

Table II shows the ranking of the ten energyfunctions based on their efficiency. To get an ac-ceptable efficiency for all five complexes, a softcore vdW potential is clearly necessary. This is dueto the fact that for two receptors, streptavidin and

Ž .triose phosphate isomerase although less so , theactive site is in a cavity and, consequently, ratherinaccessible to the ligands. For these two com-plexes, the efficiency is much lower, in fact zerofor streptavidin, if the regular vdW potential isused. Soft core vdW interactions are needed forand result in efficient docking in these cases.

For the other complexes, where the active site iseasily accessible, the efficiency does not strongly

JOURNAL OF COMPUTATIONAL CHEMISTRY 1619

Page 9: Assessing energy functions for flexible docking

VIETH ET AL.

FIGURE 3. Energy distributions for the docked andmisdocked conformations averaged over five complexes.Histograms represent the misdocked structures; solidblack lines are the correctly docked structures. Energiesare relative to the minimum energy of the misdocked

( )structure. a Energy distribution for the most selective( ) ( )energy function Table I, rank 1 . b Energy distribution

( )for the most efficient energy function Table II, rank 1 .

depend on the form of vdW potential. This isshown in Table III, where we compare the effi-ciency of the second most selective energy func-

Ž .tion which has a standard vdW potential and thecorresponding energy function with a soft corevdW potential. The energy gap between the dockedand misdocked structures is, in two cases, substan-tially reduced by the presence of the soft core vdWpotential. We do not see a correlation betweenefficiency and the size of the energy gap. Theenergy barriers have a stronger influence on effi-ciency. This is apparent especially for biotin butalso for glycerol 3-phosphate. For the lowest en-ergy structures, the balance between the electro-static and the vdW interactions varies from almostentirely electrostatic in the phosphocholinerFABand glycerol 3-phosphatertim complexes, to al-most entirely vdW in the biotinrstreptavidin com-plex. This balance is different for misdocked struc-tures and thus a proper weighting of electrostaticsis important for selectivity.

The best energy function, in terms of efficiency,seems to be characterized by a distance-dependentdielectric with « s 3, reduced surface charges, a

˚soft core vdW potential, and an 8-A nonbondedtruncation. The most efficient potential is the ninthmost selective and gives a reasonable separation ofthe energy distributions of the docked and mis-

Ž .docked structures see Fig. 3b . We consider theoverall best docking energy function as the mostefficient one, for two reasons. First, to considerselectivity at all, at least one ligand has to dock tothe active site, and only an efficient energy func-tion can achieve this condition. Second, during ourannealing schedule we gradually change the softcore to a hard core, so the final energy evaluationis performed on the unmodified short-range vdWand electrostatic interactions. Thus, the importanceof the soft core vdW potential is emphasized inearly stages of docking, in which ligands are find-ing the active site. Once the active site is found,the hard core potential will optimize the bind-ing mode. In this way, our annealing protocolcombines the desirable features of efficiency andselectivity.

Conclusions

In this article, we presented metrics for assess-ing the selectivity and efficiency of docking energyfunctions. Some key features of our approach areborrowed from methodology used for designingpotentials applied to protein folding problems. Wehave identified a number of energy functions thatdiscriminate correctly docked structures from mis-docked structures.

A soft core nonbonded repulsion was found tobe critical for the kinetic accessibility of the bind-ing site. In a receptor with a relatively small en-trance to the active site, the use of the regular vdWpotential precludes ligands from entering. For allreceptors with closed active sites, such as strepta-vidin or triose phosphate isomerase, a soft corepotential appears to be essential for successfuldocking. Thus, in general, the most discriminatingenergy functions are not the best in terms of effi-ciency. No correlation is observed between the sizeof the energy gap between the best docked and thebest misdocked structures and docking efficiency.

ŽThus, we see a separation of kinetic effects ef-. Ž .ficiency and thermodynamic effects selectivity .

To take advantage of both a selective energy func-tion with hard core vdW interactions and the effi-

VOL. 19, NO. 141620

Page 10: Assessing energy functions for flexible docking

ASSESSING ENERGY FUNCTIONS

TABLE II.Efficiencies of Some Selective Energy Functions.

SE Efficiency SG Solvation vdW soft NBa b( )rank SE f f rank f model RSC core? cutoff-2 - 3 G

1 0.060 0.45 0.55 9 0.45 rdie, 3 YES YES 82 0.057 0.39 0.53 10 0.44 rdie, 4 YES YES 83 0.057 0.41 0.55 8 0.45 rdie, 4 NO YES 84 0.057 0.43 0.53 7 0.46 rdie, 3 NO YES 85 0.054 0.39 0.52 11 0.44 rdie, 2 YES YES 86 0.033 0.43 0.56 14 0.39 rdie, 4 YES YES 107 0.032 0.40 0.57 144 0.35 solv, 3 YES YES 108 0.00 0.28 0.32 2 0.76 rdie, 4 YES NO 89 0.00 0.20 0.22 1 0.77 rdie, 3 YES NO 8

10 0.00 0.19 0.21 19 0.25 solv, 4 YES YES 8

a Average fraction of docked conformations.b Average fraction of partially docked structures. All other abbreviations are explained in the footnotes to Table I.

TABLE III.( )Per-Complex Comparison of Efficiency for Two Selective Potentials Selectivity Ranks 2 and 10 Differing

Only in Short-Range Interactions.

a b( )Ligand vdW potential SE f f f Time DE-1 - 2 - 3

( )g3p hard core 0.02 0.1 0.15 0.15 7.2 y7( )g3p soft core 0.09 0.25 0.50 0.65 6.7 y8

( )Sialic acid hard core 0.02 0.05 0.20 0.20 10.3 y2( )Sialic acid soft core 0.01 0.00 0.0 0.15 10.7 y3

( )Biotin hard core 0.00 0.00 0.00 0.00 7.2 y11( )Biotin soft core 0.05 0.15 0.20 0.50 6.9 y6

( )Phosphocholine hard core 0.05 0.00 0.25 0.45 7.3 y5( )Phosphocholine soft core 0.09 0.00 0.45 0.75 6.8 y5

( )Benzamidine hard core 0.10 0.15 0.80 0.80 8.2 y23( )Benzamidine soft core 0.10 0.15 0.80 0.80 8.1 y3

a ˚Fraction of ligands with RMSD on all heavy atoms less than 1 A.b ( )Average computer time minutes for a single docking simulation. Other columns as in Table I.

ciency of the corresponding energy function with asoft core vdW, our annealing protocol graduallychanges the vdW potential from soft to regular,yielding a useful tool for molecular docking.

In contrast to the impression given by the rela-tive rarity of MD as a docking tool, we haveshown that MD can be successfully used whenpaired with a smooth energy surface that has aclear global minimum. In the subsequent study,we present a comparison of different algorithmsbased on docking efficiency. Much remains to bedone to apply our approach to more general casesof docking using unbound receptor structures andto larger search spaces when the active site loca-tion and the binding mode are unknown. Never-theless, this work provides guidelines that could

be useful in developing solutions to various dock-ing problems.

Acknowledgments

Helpful discussions with Professors JeffreySkolnick and Angel Ortiz are acknowledged.

References

Ž .1. R. C. Jackson, Curr. Opin. Biotechnol., 6, 646 1995 .

2. D. A. Gschwend, A. C. Good, and I. D. Kuntz, J. Mol.Ž .Recogn., 9, 175 1996 .

Ž .3. G. Jones and P. Willet, Curr. Opin. Biotechnol., 6, 652 1995 .

JOURNAL OF COMPUTATIONAL CHEMISTRY 1621

Page 11: Assessing energy functions for flexible docking

VIETH ET AL.

4. I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Langridge, andŽ .T. E. Ferrin, J. Mol. Biol., 161, 269 1982 .

5. D. E. Clark and D. R. Westhead, J. Comput.-Aided Mol.Ž .Design, 10, 337 1996 .

Ž .6. G. Bohm, Biophys. Chem., 59, 1 1996 .Ž .7. D. S. Goodsell and A. J. Olson, Proteins, 8, 195 1990 .

8. N. Di Nola, D. Roccatano, and H. J. C. Berendsen, Proteins,Ž .19, 174 1994 .

9. G. Jones, P. Willey, and R. C. Glen, J. Comput.-Aided Mol.Ž .Design, 9, 532 1995 .

10. R. S. Judson, E. P. Jaeger, and A. M. Treasurywala, J. Mol.Ž .Struct., 308, 191 1994 .

11. R. S. Judson, Y. T. Tan, E. Mori, C. Melius, E. P. Jeager,A. M. Treasurywala, and A. Mathiowetz, J. Comput. Chem.,

Ž .16, 1405 1995 .12. J. D. Hirst, B. Dominy, Z. Guo, M. Vieth, and C. L. Brooks

Ž .III, Am. Chem. Symp. Series in press .13. M. Vieth, J. D. Hirst, B. N. Dominy, H. Daigler, and C. L.

Ž .Brooks III. J. Comput. Chem. this issue .14. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States,

S. Swaminathan, and M. Karplus, J. Comput. Chem., 4, 187Ž .1983 .

Ž .15. K. P. Clark and Ajay, J. Comput. Chem., 16, 1210 1995 .16. A. Caflisch, S. Fischer, and M. Karplus, J. Comput. Chem.,

Ž .18, 723 1997 .17. M. Miller, J. Schneider, B. K. Sathynarayana, M. V. Toth,

G. R. Marshall, L. Clawson, L. Selk, S. B. H. Kent, andŽ .A. Wlodawer, Science, 246, 1149 1989 .

18. P. C. Weber, D. H. Ohlendorf, J. J. Wendoloski, and F. R.Ž .Salemme, Science, 243, 85 1989 .

19. H. A. Gabb, R. M. Jackson, and M. J. E. Sternberg, J. Mol.Ž .Biol., 272, 106 1997 .

20. M. Marquart, J. Walter, J. Deisenhofer, W. Bode, and R.Ž .Huber, Acta Cryst., 39, 480 1983 .

21. E. A. Padlan, G. H. Cohen, and D. R. Davies, Ann. Im-Ž . Ž .munol. Paris , 136, 271 1985 .

22. W. I. Weis, A. T. Bruenger, J. J. Skehel, and D. C. Wiley,Ž .J. Mol. Biol., 212, 737 1990 .

23. M. E. M. Noble, R. K. Wierenga, A. M. Lambeir, F. R.Opperdoes, M. W. H. Thunnissen, K. H. Kalk, H. Groendijk,

Ž .and W. G. J. Hol, Proteins, 10, 50 1991 .24. Quanta, Molecular Simulations Inc., San Diego, CA, 1997.25. Parameter files for ligands are available through anoyn-

mous ftp from ftp.scripps.edurficiorparametersr or www.scripps . edurbrooksrficior parameters r parameters.html,1997.

Ž .26. B. K. Shoichet and I. D. Kuntz, J. Mol. Biol., 221, 327 1991 .27. P. H. Walls and M. J. E. Sternberg, J. Mol. Biol., 228, 277

Ž .1992 .28. M. Nilges, G. M. Clore, and A. M. Gronenborn, FEBS Lett.,

Ž .229, 317 1988 .Ž .29. C. A. Laughton, Prot. Eng., 7, 235 1994 .

30. J. Apostolakis, A. Pluckthun, and A. Caflisch, J. Comput.Ž .Chem., 19, 21 1998 .

31. J. Warwicker and H. C. Watson, J. Mol. Biol., 157, 671Ž .1982 .

32. L. A. Mirny and E. I. Shakhnovich, J. Mol. Biol., 264, 1164Ž .1996 .

33. A. M. Gutin, V. I. Abkevich, and E. I. Shakhnovich, Proc.Ž .Natl. Acad. Sci. USA, 92, 1282 1995 .Ž .34. A. Godzik, Prot. Eng., 8, 409 1995 .

35. A. Godzik, A. Kolinski, and J. Skolnick, J. Comput.-AidedŽ .Mol. Design, 78, 397 1993 .

36. D. Jones and J. Thornton, J. Comput.-Aided Mol. Des., 7, 439Ž .1993 .

37. R. Luthy, J. U. Bowie, and D. Eisenberg, Nature, 356, 83Ž .1992 .

38. J. U. Bowie, R. Luthy, and D. Eisenberg, Science, 253, 164Ž .1991 .

39. G. M. Verkhivker, P. A. Rejto, D. K. Gehlhaar, and S. T.Ž .Freer, Proteins, 25, 342 1996 .

Ž .40. A. Miranker and M. Karplus, Proteins, 11, 29 1991 .

VOL. 19, NO. 141622