from in silico target prediction to multi-target drug design: … · 2020. 6. 2. · ligand-based...

21
Review From in silico target prediction to multi-target drug design: Current databases, methods and applications Alexios Koutsoukas a , Benjamin Simms b , Johannes Kirchmair a , Peter J. Bond a , Alan V. Whitmore c , d , Steven Zimmer c , Malcolm P. Young c , e , Jeremy L. Jenkins b , Meir Glick b , Robert C. Glen a, , Andreas Bender a, a Unilever Centre for Molecular Sciences Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom b Lead Discovery Informatics, Center for Proteomic Chemistry, Novartis Institutes for Biomedical Research Inc., 250 Massachusetts Avenue, Cambridge, MA, 02139, USA c e-Therapeutics plc, Holland Park, Holland Drive, Newcastle upon Tyne, NE2 4LZ, United Kingdom d The School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, United Kingdom e Institute of Neuroscience, University of Newcastle, Framlington Place, Newcastle upon Tyne, NE2 4HH, United Kingdom ARTICLE INFO ABSTRACT Article history: Received 21 January 2011 Accepted 6 May 2011 Available online 18 May 2011 Given the tremendous growth of bioactivity databases, the use of computational tools to predict protein targets of small molecules has been gaining importance in recent years. Applications span a wide range, from the designed polypharmacologyof compounds to mode-of-action analysis. In this review, we firstly survey databases that can be used for ligand-based target prediction and which have grown tremendously in size in the past. We furthermore outline methods for target prediction that exist, both based on the knowledge of bioactivities from the ligand side and methods that can be applied in situations when a protein structure is known. Applications of successful in silico target identification attempts are discussed in detail, which were based partly or in whole on computational target predictions in the first instance. This includes the authors' own experience using target prediction tools, in this case considering phenotypic antibacterial screens and the analysis of high-throughput screening data. Finally, we will conclude with the prospective application of databases to not only predict, retrospectively, the protein targets of a small molecule, but also how to design ligands with desired polypharmacology in a prospective manner. © 2011 Elsevier B.V. All rights reserved. Keywords: Target prediction Polypharmacology Mode of action Target fishing In silico Off targets Contents 1. Introduction ......................................................... 2555 2. Public and proprietary databases making bioactivity data available........................... 2556 JOURNAL OF PROTEOMICS 74 (2011) 2554 2574 Corresponding authors. Tel.: +44 1223 336 432, +44 1223 762 983. E-mail addresses: [email protected] (R.C. Glen), [email protected] (A. Bender). 1874-3919/$ see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jprot.2011.05.011 available at www.sciencedirect.com www.elsevier.com/locate/jprot

Upload: others

Post on 25-Feb-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

ava i l ab l e a t www.sc i enced i r ec t . com

www.e l sev i e r . com/ loca te / j p ro t

Review

From in silico target prediction to multi-target drug design:Current databases, methods and applications

Alexios Koutsoukasa, Benjamin Simmsb, Johannes Kirchmair a, Peter J. Bonda,Alan V. Whitmore c, d, Steven Zimmer c, Malcolm P. Youngc, e, Jeremy L. Jenkinsb,Meir Glickb, Robert C. Glena,⁎, Andreas Bendera,⁎aUnilever Centre for Molecular Sciences Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW,United KingdombLead Discovery Informatics, Center for Proteomic Chemistry, Novartis Institutes for Biomedical Research Inc., 250 Massachusetts Avenue,Cambridge, MA, 02139, USAce-Therapeutics plc, Holland Park, Holland Drive, Newcastle upon Tyne, NE2 4LZ, United KingdomdThe School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, United Kingdome Institute of Neuroscience, University of Newcastle, Framlington Place, Newcastle upon Tyne, NE2 4HH, United Kingdom

A R T I C L E I N F O

⁎ Corresponding authors. Tel.: +44 1223 336 43E-mail addresses: [email protected] (R.C. G

1874-3919/$ – see front matter © 2011 Elsevidoi:10.1016/j.jprot.2011.05.011

A B S T R A C T

Article history:Received 21 January 2011Accepted 6 May 2011Available online 18 May 2011

Given the tremendous growth of bioactivity databases, the use of computational tools topredict protein targets of small molecules has been gaining importance in recent years.Applications span a wide range, from the ‘designed polypharmacology’ of compounds tomode-of-action analysis. In this review, we firstly survey databases that can be used forligand-based target prediction and which have grown tremendously in size in the past. Wefurthermore outline methods for target prediction that exist, both based on the knowledgeof bioactivities from the ligand side and methods that can be applied in situations when aprotein structure is known. Applications of successful in silico target identification attemptsare discussed in detail, which were based partly or in whole on computational targetpredictions in the first instance. This includes the authors' own experience using targetprediction tools, in this case considering phenotypic antibacterial screens and the analysisof high-throughput screening data. Finally, we will conclude with the prospectiveapplication of databases to not only predict, retrospectively, the protein targets of a smallmolecule, but also how to design ligands with desired polypharmacology in a prospectivemanner.

© 2011 Elsevier B.V. All rights reserved.

Keywords:Target predictionPolypharmacologyMode of actionTarget fishingIn silicoOff targets

Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25552. Public and proprietary databases making bioactivity data available. . . . . . . . . . . . . . . . . . . . . . . . . . . 2556

2, +44 1223 762 983.len), [email protected] (A. Bender).

er B.V. All rights reserved.

Page 2: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2555J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

3. Ligand-based methods for protein target prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25573.1. In silico target prediction based on single structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25573.2. Target prediction based on data mining methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25583.3. Chemogenomic approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25583.4. Pharmacophore-based methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25593.5. How to extend the scope of target prediction models: Extrapolating in ligand space . . . . . . . . . . . . . . 25593.6. How to extend the scope of target prediction models: Extrapolating in target space . . . . . . . . . . . . . . 2560

4. Structure-based methods for protein target prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25605. Applications of target prediction models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2561

5.1. Applications of machine learning based target prediction methods . . . . . . . . . . . . . . . . . . . . . . . 25615.2. Applications of pharmacophore-based target prediction methods . . . . . . . . . . . . . . . . . . . . . . . . 25635.3. Applications of docking-based target prediction methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2564

6. Novel applications 1: Target identification in phenotypic bacterial growth inhibition screens. . . . . . . . . . . . . 25647. Novel applications 2: HTS fingerprints in target prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25668. Prospective utilization of bioactivity data for multi-target drug design: Finding the right target . . . . . . . . . . . 25689. Conclusions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2570Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2570References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2570

1. Introduction

‘Drug discovery’ has historically been based on phenotypicreadouts on the organism level, such as the effect of herbs orother natural remedies on humans. In this case the desiredphenotype (usually reversal of the diseased state of theorganism back to healthy) was the relevant endpoint beingconsidered, while themode of action ofmany compounds (theprecise reasons leading to this modulation) remained in thedark. In the more recent past reductionist approaches in thepharmaceutical community have traditionally seen drugs as‘magic bullets’, a concept first established by Paul Ehrlich [1,2].This concept states that drugs exert their activities bymodulating one target of particular relevance to a disease,the famous idea of one ‘key’ (or ligand) modifying each ‘lock’(or protein) [3]. This paradigm has guided the pharmaceuticalindustry throughout approximately the last three decades.

However, in recent years, there is mounting evidenceoffering a significant challenge to this hypothesis as it hasbecome increasingly obvious that many drugs elicit theirtherapeutic activities by modulating multiple targets [4–7].Recently, it has been estimated that each drug on the marketpossesses bioactivity against, on average, six experimentallyconfirmed protein targets (based on currently available data)[8]. In fact, in questioning the universal appropriateness of theclassical single-target drug-discovery approach one needs tolook no further than the large number of drugs that have faileddue to insufficient clinical efficacy [9].

While a certain set of targetsmodulated by a smallmoleculeseems to be inmany cases beneficial for efficacy, there are alsoproteins one would rather avoid modulating (often called‘antitargets’ [10]). Over the past few decades, a significantnumber of chemical compoundshave failed to get approvedandreach the market due to severe clinical side-effects and cross-reactivity that were observed during later-stage clinical trials,increasing attrition rates of new compounds [11,12]. The fact isthat the multi-target interactions of drugs are either largelyunknown or insufficiently understood in most cases [8], and

examples of the truly prospective design of multi-target drugsare still rather scarce [6].

Hence, we need to accept that for a drug to have the desiredeffect in man we need to modulate a set of targets to achieveefficacy, while avoiding others to reduce the risk of sideeffects. A promising category of chemical compounds wherepromiscuity was only relatively recently seen as a virtue iskinase inhibitors [13,14], such as the Novartis compoundGleevec, which was once thought to be highly selective intargeting the Bcr–Abl fusion gene. However, more recently ithas turned out that Gleevec (along with many other kinaseinhibitors) also inhibits other kinases such as c-kit and PDGFR[15,16], enabling its further use against gastrointestinalstromal tumors [17]. While in this case, different targets arelinked to different diseases (but the same compound), thekinase inhibitors Sorafenib and Sunitinib which were recentlyapproved by the FDA, targeting VEGFR, PDGFR, FLT-3 and c-kit,are thought to achieve efficacy in the clinic by modulatingdifferent pathways also in the same disease [18]. On the otherhand, and as an ‘atypical’ case, the polypharmacology ofpsychoactive compounds (such as antidepressants or anti-psychotics) against a set of mainly dopamine and serotoninreceptor subtypes has been known (and accepted as a specialcase) for some time. Such ‘polypharmacology’ [12] substan-tially changes the way we perceive the bioactivity of com-pounds, which, from a medicinal chemist's point of view, hasnow evolved from a unidimensional into a multidimensionalarea to explore and optimize [19,20]. In this context, and this isthe subject of this review, the use of target prediction tools canhelp us to better anticipate the bioactivity spectra of com-pounds earlier, and in a much more predictive fashion.

Here, computational methods, with the wealth of bioactiv-ity information available today, can help us both to designligands with a higher likelihood of exhibiting a desiredbioactivity profile, as well as to anticipate secondary andalternative bioactivities in compounds at an early stage. Publicdatabases such as ChEMBL [21,22], PubChem [23] and Chem-Bank [24] are continuously increasing in size, with ChEMBL in

Page 3: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2556 J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

its latest release, comprising about 700,000 distinct smallmolecules and about 2.7 million bioactivity data points. Thesepublic initiatives follow the example set by various bioinfor-matics initiatives globally, of providing open access toessential data; a trend that started in the bioinformaticsworld several decades ago with sequence databases distrib-uted on magnetic tapes but that has only happened in thecheminformatics world in the last 5 years or so.

Bioactivity data of small molecules can be used in a varietyof ways, ‘forwards’ (to predict protein targets of smallmolecules) as well as ‘backwards’ (to design ligands withdesired bioactivity against a set of targets, based on knowledgecontained in the databases). The ‘forward’ concept is visual-ized in Fig. 1, where a user is able to input a molecularstructure into a computational bioactivity model, and where –without experiment – based on the database's ‘educatedguesses’, the ‘predicted’ bioactivities of a compound aregenerated. This review will cover both of the above domains;the rationalization of phenotypic observations related tocompound application as well as the prospective design ofnovel molecules with the desired activities. It is an extensionof previous publications in the field [25–27] due to the largelyincreased availability of data, as well as methods, relating tothe area.

In the following, we will firstly (Section 2) summarizewhich databases are at our disposal at the current stage, anarea under tremendous development. We will continue withan overview of ligand-based (Section 3) and structure-based(Section 4) methods which can be used to predict proteintargets for smallmolecule ligands. These sections are followedby published applications of target prediction tools (Section 5),as well as two applications carried out by the authors, namelythe prediction of targets for antibacterial compounds in ahigh-throughput screening settings (Section 6) as well as theapplication of multi-target affinity fingerprints for scaffoldhopping (Section 7). We will conclude with the prospectiveutilization of bioactivity databases and address the importantquestion how to decide which targets to modulate (Section 8),before concluding this review (Section 9).

Fig. 1 –An in silico targetpredictionworkflow. Inorder to enablea cofirstly target class models for up to multiple thousand proteins neefingerprints in combinationwith theNaïve BayesClassifier. Afterwcan be annotated within a fraction of a second with its most likely

2. Public and proprietary databases makingbioactivity data available

The field of public, and private, bioactivity databases is largeand continuously growing. One cannot claim to present acomplete overview of bioactivity data in a review such as thisone, however due to the importance of this type of data atleast an overview of many of the major public databases shallbe listed here, listed in Table 1 (for more information see arecently published review on the field [28]; for a list ofcommercial databases see a previous review [25]).

A remarkable development in recent years is that databaseshave not only grown in size, but they have also started tointegrate diverse types of data, such as phenotypic data anddrug side effects [29], both of which were hardly accessible inelectronic form in public previously. Two of the most compre-hensive integrated resources available today are probablyChem2Bio2RDF [30] and PROMISCUOUS [31], both linkingchemical data to pathway information and phenotypic data aswell as merging side effect information within this integrateddata resource. Both databases differ in many practical details,such as the data sources used, the database implementation,and the ways in which they can be queried. However, theircommonaim is the integration of different data types— anareaof great interest also topharmaceutical companies,which is stillvery much under active investigation, not least due to theheterogeneity of the data at hand. This is closely linked to thedevelopment of ontologies suchas the SmallMoleculeOntology(SMO) [32], which allows for an easier integration of differentdata sources using relatively recently developed Semantic Webtechnologies such as theResourceDescription Framework (RDF)and Web Ontology Language (OWL). Conversion of formerlyrelational databases to RDF triples seems tobe a general trend inthe cheminformatics (and related) fields due to the flexibility ofthe data that can be represented; however the comparativerigidity of relational databases can also be seen as an advantagein some cases where data needs to adhere to a predefinedstructure for data mining and similar applications.

mputationalmodel topredict protein targetsof smallmolecules,d to be constructed which takes only hours for circularards, a chemical structure input in themodel (shownon the left)macromolecular interaction partners (shown on the right).

Page 4: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

Table 1 – Overview of some of themajor public bioactivity data resources available today. Due to the fast development of thearea this list does not claim to be exhaustive in any way.

Databases Data Web address

ChEMBL [21] >700k small molecules with >2.7 million bioactivity data points http://www.ebi.ac.uk/chemblDrugBank [150] 4800 drug entries including: >1350 FDA-approved drugs and

123 FDA-approved biologicshttp://www.drugbank.ca/

ChemBank [24] Information on hundreds of thousands of small molecules andhundreds of biomedical assays

http://chembank.broadinstitute.org/

ComparativeToxicogenomicsDatabase (CTD) [151]

6000 compounds, 1.4 million chemical–gene–disease data points http://ctd.mdibl.org/

SuperTarget [152] 1500 drugs, 2500 targets proteins and 7300 drug–target interactions http://bioinf-tomcat.charite.deMATADOR [152] Manually annotated compounds from the SuperTarget database http://matador.embl.de/Therapeutic TargetDatabase (TTD) [153]

Contains 1906 targets, including 358 successful, 251 clinical trial,43 discontinued and 1254 research targets, and 5124 drugs,including 1511 approved, 1118 clinical trial and 2331 experimental drugs

http://xin.cz3.nus.edu.sg

PubChemBioActivity [23]

250k compounds, 2500 bioassays http://pubchem.ncbi.nlm.nih.gov/

BindingDB [39] >271k compounds, >620k binding affinities against 5526 protein targets http://www.bindingdb.orgDrugPort (EBI) Contains 1492 approved drugs and 1664 unique protein targets http://www.ebi.ac.ukPotential Drug TargetDatabase (PDTD) [80]

Contains 1207 entries covering 841 known andpotential drug targets with structures from the Protein Data Bank (PDB)

http://www.dddc.ac.cn/pdtd/

Promiscuous [31] From text mining >20 mio. Publications, includes >10 mio. Proteins,>25k compounds, as well as drug side effects and various other data;includes protein network visualization

http://bioinformatics.charite.de/promiscuous

T3DB [154] Toxin and Toxin-Target Database; >2900 toxins,1300 toxin targets and >33,000 toxin-target associations

http://www.t3db.org/

Chem2Bio2RDF [30] RDF-based database integrating chemical, biological and phenotypic data http://cheminfov.informatics.indiana.edu:8080/

2557J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

What is clearly visible from Table 1 is that millions ofbioactivity data points are available, which can be used forligand-based target prediction (Fig. 1). Based on various datasources also internet-based public resources for target predic-tion are available which are listed in Table 2.

3. Ligand-based methods for proteintarget prediction

3.1. In silico target prediction based on single structures

The most simple target predictions are based on individualmolecules. Ligand-based methods include methods that arebased on the principle of chemical similarity, which states thatsimilar chemical structures tend to present similar biologicalactivitiesmoreoften thannot [33,34], althoughevenstructurally

Table 2 – Publicly available web services that can be employed

Name Method

GUSAR Antitarget/Toxicity Prediction [155] Fingerprint-basedPASS INet [156] Fingerprint-basedPharmMapper [68] Pharmacophore-baReverseScreen3D [71] Hybrid 2D/3D methSimilarity Ensemble Approach (SEA) [44] Fingerprint-basedTarFisDock [80] Docking-based

similar compounds may interact with a protein target indifferent ways [35]. These methods rely on prior knowledge ofbioactive ligands and protein structures and canmine in a shorttime huge chemical libraries in order to locate the mostchemically related structures that share substructural similar-ities and subsequently interact with similar protein targets.However, they are generally not applicable in cases where noligands for a given protein exist, since in these situations notraining on ligand-based information is possible.

One of the questions that need to be answered is how todescribeamolecular structure toacomputer—andnouniversalanswer exists here. Recent studies compared [36,37] (anddiscussed in more detail [38]) currently available moleculardescriptors and to which extent they are correlated to bioactiv-ity. Here it was found that circular fingerprints, counts ofcircular fingerprints, predefined keys, atom pairs and trianglesbehave significantly distinct — and that in most cases circular

for the target prediction of small molecules.

Web address

http://www.pharmaexpert.ru/GUSAR/antitargets.htmlhttp://www.pharmaexpert.ru

sed http://59.78.96.61/pharmmapper/od http://www.modelling.leeds.ac.uk/ReverseScreen3D

http://sea.bkslab.org/http://www.dddc.ac.cn/tarfisdock

Page 5: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2558 J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

fingerprints in their various definitions seem to have bestcorrelationwith bioactivity.Hence, theymight alsobe oneof themost appropriate methods to use in target prediction attempts.

Target prediction methods focused on individual moleculesare essentially a similarity searching exercise: from a bioactivitydatabase, such as ChEMBL [21] or BindingDB [39], similarstructures to the one under consideration are identified.Experimentally measured activities from the database arethen used as a guideline for the bioactivity of the moleculeunder consideration. While this is a rapid method to predicttargets, its main shortcoming is that it results in insufficientextrapolation in practice since only molecules as a whole arecompared toeachother. Letus for thispurpose consider thecasewhere a database contains molecules consisting of fragmentsA–B and fragments C–D as active against a particular proteintarget. Using similarity searching alone, a newmolecule A–C (orA–D, B–C, B–D) would not necessarily be identified to as similarto the database-identified molecules. Machine learning modelsdiscussed in the following though would likely have higherchances of success to be able to capture this novel combinationof features in a previously unseen molecular structure.

3.2. Target prediction based on data mining methods

Another increasingly important field used for bioactivityprediction of chemical compounds is data mining methods.These methods, in contrast to chemical similarity methodswhich simply identify potential targets based on ligandsimilarities of molecular substructures, perform pattern recog-nition invast chemogenomic spaceandcluster compounds thatpresent similarities inmultidimensional space. Thesemethodssteadily gain preference and credibility as they are able toidentify and correlate relationships among large numbers ofcompounds that otherwise would be impossible to connectbased substructural similarities alone. Machine learningmethods such as Support Vector Machines (SVMs) and LinearRegression Models (LRMs) are the driving force of theseapproaches. A significant number of methods that employdata mining approaches for mapping the chemogenomic spacehave been published during the last decade.

Nidhi et al. applied Multiple-Category Bayesian models inorder to mine chemogenomics space [40]. The model wastrained on extended-connectivity fingerprints of compoundsfrom 964 target classes in the WOMBAT (World Of MolecularBioAcTivity) chemogenomics database and thenwas applied topredict the 3-top most likely protein targets for all MDDR (MDLDrugDatabase Report) database compounds.Whilemuchof theearly work of protein target predictionmodelswas based on theBayes Classifier [41], more recently, implementations of theWinnowalgorithm for target prediction have been reported [42].While major difference in the overall performance in targetprediction were not shown, both algorithms did predict slightlydifferent targets, which demonstrates the importance of havingmultiple algorithms available. In addition, in situations wherenew bioactivity data become available, complete retraining ofthe model is not required in case of the Winnow algorithm,which may well be of relevance where frequent additions ofnovel data are made to existing models.

A related method – which was one of the first targetprediction methods published – is the PASS method [43]. This

method is currently able to predict more than 4130 biologicalactivities based on the structural chemical formula of acompound alone. The development of this method requiredanalysis of structure–activity relationships for more than270,000 biologically active compounds. Currently, this methodis reported to predict with an accuracy of about 95%.

A target predictionmethod that is based on the distributionof features in each bioactivity class of molecules is theSimilarity Ensemble Approach (SEA) [5,44], which employs aBLAST-derived algorithm to develop minimal spanning treesconsidering chemical similarity. SEA estimates target similar-ity, which is ranked based on their ligands' chemicalsimilarity. The authors applied the SEA method to largescale study of more than 3000 FDA-approved drugs againsthundreds targets and identified 23 new cases of drug–targetinteractions, five of which were validated in vitro to be potentwith affinities less than 100 nM. Moreover, the SEA methodwas applied to examine for potential off-target inhibition bycommercially available drugs against the enzyme proteinfarnesyltransferase (PFTase) [45]. Here, loratadine and micon-azole were identified and experimentally confirmed as ligandsfor PFTase.

Other machine learning methods have been applied tofingerprint-based target prediction. Wale [46] used ECFP4fingerprints on a PubChem-derived dataset comprising 231targets and 40,170 ligands and compared a Bayes Classifierwith binary SVM, a ranking-based SVM, cascaded SVM,Ranking Perceptron and the combination of SVM and aRanking Perceptron with respect to their ability to predicttargets of small molecules. Here it was found that thecascaded SVM methods generally outperformed the BayesClassifier, while on ligands which belong to multiple catego-ries the combination of SVM and Ranking Perceptron are thebest-performing method on this dataset. Given that muchtarget prediction literature was based on the Bayes Classifieralone, this indicates that it would be beneficial to employ –and benchmark – other machine learning methods in thecontext of the target prediction of small molecules. SVMswerealso employed for target prediction in a different recentpublication [47], albeit on a smaller dataset comprising onlyfive activity classes. Here, consistently more than 80%sensitivity for the target predictions has been obtained.

3.3. Chemogenomic approaches

Chemogenomic approaches attempt to systematically analyze(‘all’) ligand–target interaction space in toto[48–51]. Frequently,protein targets are, in the spirit of chemogenomics, classifiednot according to sequence or fold, but according to thesimilarity of their ligands. Given a ligand-based classificationof protein targets, one can analyze which targets are likely tobe hit by a ligand, given its structure [52,53].

In this area, recent publications have appeared which havetried to evaluate to what extent novel ligand–protein targetpairs could be predicted, based on bioactivity knowledge of aset of ligands and a set of targets on the one hand, or on therelationships of the targets themselves. (As discussed abovefor the ligand side, also for the protein side no single descriptorexists, so sequences, binding pockets and selected subsets ofresidues in ligand binding pockets have all been used to

Page 6: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2559J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

capture the essential properties of a protein for bindingligands.) In this spirit, groups have employed SVMs [54] aswell as substructural analysis [53] in order to relate GPCRsfrom the ligand-based side— and to estimate howwell ligand–target pairings could be predicted also in cases where noligands of a particular target are known. In the latter of theabove studies [53] it was found that in those cases, 93% of thetargets ligands could be identified with a reliability beyondrandom even if no ligands were given in the training dataset(this resembles a ‘receptor deorphanization’ setting in prac-tice), thereby illustrating that not only explicitly knownligand–target pairs can be predicted by computational models.

Linear regression models have also been applied to studydrug–target interactions. Zhao et al. [55] developed a computa-tional framework, termed drugCIPHER, which is based on theobserved correlation in pharmacological and genomic spaces.They proposed three linear regression models that relatetherapeutic similarity, chemical similarity and their combina-tions to the relevance of the targets based on protein–proteininteraction network.

Also in the area of chemogenomics-driven predictive ap-proaches to identifying ligand–target pairs falls a novel low-dimensional protein–ligand fingerprint-based (PLFP) methodwas recently reported [56] encodingboth chemical andbiologicalproperties and which is suitable for exploring chemogenomicsspace. PLFP method captures the pharmacophore properties ofligands and their respective transmembrane binding cavitiesand was found to show preference for SVM classifiers, where itachieved nearly 90% precision in determining true from falsepairs.

Furthermore, topological descriptors based on molecularfeatures have been shown to be especially successful indescribing and comparing molecular profiles. Two novel setsof topological molecular descriptors have been recentlyreported; SHED (SHannon Entropy Descriptors) [57] and RED(Renyi entropy descriptors) [58]. SHEDdescriptors are calculatedfrom distributions of atom-centered feature pairs extractedfrom the topology of molecules and their scores reflect thedistribution of pharmacophore features in a molecular consti-tute. Greogori-Puigjané et al. [59] applied SHED descriptors tomine chemogenomics space ofmore than 700 drugs against 600targets. Their results conclude that compounds targetingaminergic G protein-coupled receptors (GPCRs) exhibit themost promiscuous pharmacological profiles seen amongdrugs. On the other hand, RED descriptors are based ongeneralized Renyi entropy as a variability measure for afeature-pair distribution in contrast to SHED descriptors.

Also scaffold compositions can be used to predict thebioactivity of compounds on protein targets [60]. In thisparticular work, about 24,000 unique molecular scaffolds wereextracted from 458 different bioactivity classes and a hierarchi-cal analysis of scaffolds contained in each bioactivity class wascarried out. As a validation of the method, bioactivities for‘virtual’ scaffoldswereperformed,and indeedcompoundscouldbe found in external tests sets with the predicted bioactivities.

3.4. Pharmacophore-based methods

The first softwaremodule supporting the profiling of bioactivityspectra using a pharmacophore-based approach was Ligand

Profiler, distributedwith Discovery Studio [61]. Ligand Profiler isbased on a Pipeline Pilot [62] workflow for automated parallelscreening. The program allows screening of collections ofcompounds (single- ormulti-conformational databases) againsta series of pharmacophore models defined by the user (for arecent review see [63]). Technically, the ligand profiler mapseach input ligandagainst all selectedpharmacophoremodels. Afit value is computed that is a measurement of how well thecompound maps the chemical function-based features of thepharmacophore and the results can be displayed as a heat mapvisualizingwhichcompoundsare likely to bind towhich targets.KNIME is an alternative pipelining software allowing easyintegration of such parallel screening workflows [64].

The Inte:Ligand PharmacophoreDB [65] currently includes2500 annotated structure- and ligand-based pharmacophoremodels covering 300 unique biological targets. All structure-based models of this database were developed using Ligand-Scout [66] and are provided in Accelrys Catalyst data format.

LigandScout generates pharmacophore models based on agiven 3D structure of a ligand–protein complex in a fullyautomated manner. After clean-up of the binding site,chemical features are identified on the ligand site, whilesearching for corresponding features in the neighboringprotein environment. Pharmacophore features are placed atpositions with favorable protein–ligand interactions. Addres-sing tautomeric effects is a crucial aspect when it comes to thegeneration of pharmacophore models, in particular withrespect to hydrogen bonding patterns [67]. LigandScoutexamines different tautomeric forms and indicates themost-likely one(s). To account for the shape properties of thebinding site, excluded volume spheres can be added onmoieties forming the protein surface. This avoids the identi-fication of bulky compounds during virtual screening, whichare unlikely to be accommodated in the target interaction site.

Also a free web interface (see Table 2) has been publishedwhich employs pharmacophores to predict protein targets ofsmallmolecules, entitled PharmMapper [68]. In order to derive adataset of pharmacophores structures complexed with smallmolecules were derived from DrugBank, BindingDB, PDBBindand the PDTD database. Subsequently, pharmacophore modelswere constructed also in LigandScout, resulting in a total of 7302pharmacophoremodels, out of which 2241 are covering humanproteins. Analyzing the targets for tamoxifen anAUC of the trueprotein interaction partners of 0.7 could be achieved.

3.5. How to extend the scope of target prediction models:Extrapolating in ligand space

When predicting protein targets for a small molecule usingligand-basedmethods, limitations exist regarding the scope ofthe training datasets, both in chemical space as well asbiological space. This means in practice that not for allchemical classes reliable target predictions can be obtained,and that for proteins, which do not have bioactivity datapoints available, no target predictions can be made at all.However, both in ligand (chemical) space and protein target(biological) space extrapolations are possible to some extent.

Extrapolations in ligand space are largely dependent on thedescriptor chosen. In previous work from our groups weexplored both two-dimensional descriptors (fingerprints) and

Page 7: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2560 J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

three-dimensional descriptors (FEPOPS) with respect to theirability of predicting protein targets of ligandswhere no similarstructures were contained in the training set. It could beshown that more ‘abstract’ 3D descriptors, such as FEPOPSwhich just consists of a small number of four pharmacophoricpoints for each molecule, are able to outperform the more‘exact’ 2D fingerprints in those cases where no similar ligand–target associations are known [69,70].

Also a hybrid 2D/3D target prediction, ReverseScreen3D, hasbeen published recently [71]. Firstly, 2D fingerprints are used tocalculate the similarity of the query molecule to every ligandstructure in the ligand–target interaction database,whichhas inturn been derived from a clustering of the PDB. In an optionalsecond step, the database ligand of each target cluster with thehighest similarity to the query molecule is then used as atemplate for which 3D matching of the query is performed. For20 compounds a validation procedure was performed andexperimental targets could be confirmed in the majority ofcases. 2D and 3D predictions often show complementaryresults, and on a 4-OH tamoxifen case study three othermethods (INVDOCK, TarFisDock and PharmMapper) could beoutperformedbyReverseScreen3D. Themethod isalsoavailableto the public via a web interface (see Table 2 for details).

3.6. How to extend the scope of target prediction models:Extrapolating in target space

Also on the target side extrapolation abilities are oftenrequired — such as in cases where most of the bioactivitydata available relate to human proteins, but one is interestedin predicting protein targets in a pathogen instead. Oneway toaddress this problem is to annotate proteins with proteinfolds — and map back to pathogen proteins after predictingprotein folds targeted by a small molecule In a recentlypublication InterPro domains were employed for this purpose[72]. Compounds that bind to a specific protein domain mayalso bind to other proteins that share the same InterProdomain, even if they share low sequence similarity. This insilico ‘Domain Fishing Model’ (DFM) is capable of extrapolatingto targets outside training sets. Furthermore, DFM can beemployed to triage the results of affinity chromatographyexperiments. The DFM was applied by Prathipati et al. [73] toelicit protein targets of 44 antitubercular (antiTB) compounds,where it successfully predicted after validation the targets oftwo novel antiTB compounds, CBR-2092 and Amiclenomycin.

Another possibility of extrapolating in target space are so-called ‘proteochemometrics’ (PCM) models which include,apart from descriptors of the ligands, descriptors of the targetside [74–76]. Thesemethods are related to the chemogenomicsmethods illustrated above and recently reviewed; [48] however,as opposed to chemogenomics methods ligand and protein,features are explicitly included as input variables in the PCMmodel, with the bioactivity being the dependent variable. Inprospective studies (which are currently being prepared forpublication) the authors have also introduced the concept of‘leave-one-target-out’ cross validation in order to judge theability of the bioactivity model to predict bioactivities of ligandsfor related targets—and itwas foundontheenzymeHIVreversetranscriptase that, indeed, for 12 out of 14 mutants predictivityof the bioactivity model could be obtained that approached the

reproducibility of the assay. (In the remaining two cases norelated targets were present in the training set; hence, noreliable bioactivity prediction was possible in those cases.)

Also the so-called ‘signature descriptors’ fall into this categoryof approaches which simultaneously employ ligand as well astarget features for enzyme-metabolite interaction modeling [77].Employing SVMs in combination with Signature Kernels, a five-fold cross validation on 873 drug–target pairs comprising 121targets as well as 551 drugs yielded interaction accuracies >85%.

As summarized above, a multitude of target predictionmethods, based on small-molecule information, exists in thecurrent literature. Given the ever increasing amount of bioac-tivity data available it can be expected that their performancewill continue to increase in the future, as will their applicationsin predicting targets for compounds with an unknown bioac-tivity profile.

4. Structure-based methods for proteintarget prediction

Structure based methods, in the context of drug design,generally describe approaches that exploit protein structuralinformation, combined with scoring functions, in order topredict ligand–target pairs. More conventionally, ligands areidentified as interaction partners for a given protein, howevermore recently this concept has also been reverted to dock onesmall molecule against a panel of multiple receptors (‘inversedocking’) which is then applied to target prediction. For amoredetailed overview of this particular field the reader is alsoreferred to a recent review on the topic [78].

In particular, the docking-basedmethods INVDOCK [79] andTarFisDock [80] have recently been presented which aim atelucidating the protein target of a small molecule, based ondocking against a panel of proteins putatively interacting withthe ligand. INVDOCK is able to consider flexibility of the ligand(but not of the protein) and employs a custom-made scoringfunction,whileTarFisDock isbasedon thesoftwareDOCKand isalso available publicly in a web interface (see Table 2). Applica-tions of the programs are given in the following section.

More recently, Kellenberger et al. [81] performed a study ofcurrently available methods for in silico reverse screening usedfor target prediction in order to examine how the algorithmsfor ranking protein targets perform. As it was found scoringfunctions still remain the main weakness of virtual screeningapproaches. Several ranking protocols based on GOLD fitnessscore and topological molecular interaction fingerprint (IFP)comparison were evaluated in that study, showing thatproblems associated with accuracy and false positives arepresent and need to be addressed.

Also for endogenous ligands docking has been used toidentify substrate–proteinpairs [82]. The conclusionof thisworkwas that proteins (and substrates) are more promiscuous innature than one might expect. However, this result needs to beconsidered also in context with the predictivity of currentscoring functions,whichmightnot be too reliable anestimate ofbinding affinities encountered in experiments. A recent review[83] even states that, on the particular dataset chosen andemploying a set of 37 scoring functions, “wehave demonstratedthat for the eight proteins of seven evolutionarily diverse target

Page 8: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2561J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

types studied in this evaluation, no statistically significantrelationship existed between docking scores and ligand affinityperformed large scale comparison of available docking pro-grams and scoring functions.” However, it was also found thatthe docking engines of the programswere inmany cases able togenerate ligand binding poses, and pick them among theenergetically favorable ones, in many cases — hence, whilenumerically assigning binding affinities through scoring func-tions seems to be difficult at the current stage, performingclassifications into binders and non-binders seems to be moresuccessful in many cases.

While ligand-based methods are fast, docking-basedmethods take considerably more computational resourcesfor a docking run against hundreds, or even thousands oftargets while still not achieving reliable results. Hence, turningto the more computationally intense area of moleculardynamics simulations may in the future, with increasingcomputer power, become more and more important topredicting ligand–protein binding affinities.

Docking seeks to rankposesorderedby a score,which ideallycorrelates with free energy of binding. However, only limitedflexibility in the protein is considered, and solvent effects aregenerally excluded. Poisson–Boltzmann based approaches canprovide a more rigorous estimate of binding energies, but theiraccuracy tends to be dependent upon the configuration of theligand [84]. A more detailed picture of ligand binding may beobtained via simulation-based approaches. All-atommoleculardynamics (MD) simulations provide ameans to characterize themotions of biologicalmacromolecular complexes in the contextof the ligand-bound and free states, and to explicitly include theeffects of water [85]. While simulation approaches tend to becomputationally demanding, a number of developments haveprovided a means to accurately calculate relative and absolutebinding free energies and their component contributions insilico, making such methods potentially useful in the latteroptimization stages of drug design.

Rigorous simulation-based approaches to the estimationof binding free energies generally rely upon the use of athermodynamic cycle, in which a change in binding freeenergy is calculated as the difference between the free energyof converting one ligand to another in the bound state, minusthe free energy of the same change in solution [86]. This canalso be useful for understanding the ligand binding energet-ics associated with individual amino acid substitutions,which may be of interest in the context of specificity withina closely related family of proteins (Fig. 2, example takenfrom [157]). In the free energy perturbation (FEP) approach,the free energy change is measured as one ligand is“alchemically” transformed into another. For a convergedestimate, the configurations of the two ligands sampledmustbe similar (their “phase space” must overlap); as a result, aseries of non-physical intermediate “windows” between thetwo ligands are introduced [87]. Thus, while such methodsmay be too time consuming for screening large compounddatabases, they are likely to prove useful toward the end ofthe drug design process, when attempting to optimize alimited number of ligands rationally [88]. No target predictionmethods have been reported yet that make use of large-scalemolecular dynamics simulations, however given their accu-rate results this may change in the foreseeable future.

In conclusion, although structure-based target predictionhave made significant advances and improvements over thelast decade and can now be applied to real-life problems, thereare still significant limitations that prohibit them fromattracting a wider acceptance in the pharmaceutical industry.Themain current limitations include that expert knowledge isoften required to operate the software, they remain compu-tationally very intensive and there are limitations imposed bythe availability of experimental data such as X-ray structuresfor membrane-bound proteins. In addition, scoring functionsstill need improvement in order to estimate ligand bindingaffinities reliable. On the other side, structure-based methodsare not limited to the chemical ligand classes covered in atraining set in theway ligand-based target predictionmethodsare, so they might very well become more versatile in theirapplications in the future once the above problems have beenalleviated.

5. Applications of target prediction models

A significant number of applications of in silico target predictionmodels have been published recently, ranging from the mode-of-action elucidation of compounds to the analysis of falsepositives in reporter gene assays, which we will summarize inthe current section

5.1. Applications of machine learning based targetprediction methods

The ‘Similarity Ensemble Approach’ (SEA) described above [5]has recently been applied in a prospective manner [44] toelucidate additional targets for drugs on the market. Calcu-lating the similarity of 3665 FDA-approved and investigationaldrugs against hundreds of targets a large number of novelassociation have been uncovered. Out of 30 experimentallytested associations 23 were confirmed, five of which werepotent (below 100 nM). This was among others the case for theantagonism of the β1 receptor by fluoxetine (Prozac), theinhibition of the serotonin transporter by ifenprodil (Vadilex)and antagonism of the H4 receptor by delavirdine (Rescriptor).This prospective validation indeed supports hope that, bothfor predicting off-targets [89,90] as well as novel indications, insilico target prediction tools based on ligand similarity mayplay a role of growing importance in the future.

Earlier work investigated 130 of the ‘top 200 medicines’ onthe market using the PASS method and in 93.2% of the casesknown pharmacological effects could be found in thebioactivity spectra [91]. In this case no experimental valida-tion was provided directly in the publication. Shortlyafterwards however, PASS was applied to the discovery ofnew cognition enhancers [92], and in this case evenvalidation using animal models was successfully performedon a novel chemical series. Out of 8 compounds tested in amodel of scopolamine-induced passive avoidance reflex(PAR) amnesia all showed cognition enhancement, three ata dose of 1 mg/kg and five at a dose of 10 mg/kg. This isprobably one of the few truly prospective validations of noveltherapeutic areas of a series of compounds, based on in silicotarget prediction tools.

Page 9: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

Fig. 2 – Relative binding affinities from free energy perturbation. A peptide ligand (white cartoon format) is shown in its boundstate to an SH2 domain (ice blue cartoon format); note the presence of key water molecules which may affect the bindingaffinity. In this example, the relative free energy of ligand binding is calculated in the context of an amino acid mutation in thebinding pocket, from Tyr (cyan wireframe format) to Leu (green wireframe format). The free energy difference for binding to theprotein containing Tyr (ΔGbind

1 ) or Leu (ΔGbind2 ) is unknown, but according to the thermodynamic cycle shown,

ΔΔG=ΔGbind2 −ΔGbind

1 =ΔGmut2 −ΔGmut

1 . To obtain the quantities ΔGmut1 and ΔGmut

2 , two alchemical calculations are carried out,during which the Tyr residue is slowly “mutated” to Leu, in the presence and absence of peptide ligand, respectively. In the“dual topology” approach, both amino acids are present at the same time (as shown in the figure), but their interactionwith theenvironment is scaled according to a coupling parameter λ, which varies between 0 (equivalent to Tyr, the initial state) and 1(equivalent to Leu, the final state). The change in energy for such a mutation is obtained from a series of N overlappingsimulation “windows” as a function of λ:

ΔGTyr→Leu = −kBT ∑i=1

Nln exp

− Eλi + 1−Eλið ÞkBT

gλi

where (…) i implies an ensemble average over configurations of state i, kB is the Boltzmann constant, and T is the temperature.

2562 J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

Also fingerprint-based target prediction models, based onthe WOMBAT database and Naïve Bayes models, have beenapplied in practice, such as to the rationalization of falsepositives in reporter gene assays [93]. Here it was found that

false positives in reporter gene assays very often are predictedto interact with kinases involved in cell cycle progression —hence, a mechanistic hypothesis for false positives could begenerated. This hypothesis was prospectively validated in

Page 10: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2563J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

cytotoxicity assays, where it was indeed shown that >50% offalse positives in reporter gene assays possess cytotoxicproperties [93].

Another application of this class of models was applied tothe analysis of ‘hits’ in High-Content Screening [94]. Whileproviding the screening with a cellular phenotype, high-content screening readouts are very often cryptic with respectto the underlying mode of action which is responsible for theparticular phenotype encountered [95,96]. In this recent study[94] high-content screening readouts were combined with insilico target predictions as well as an analysis of the structuralsimilarity of compounds. It was found that all three domains,chemical structure, predicted targets/mode of action, as wellas phenotype, are to some degree independent and that noone-to-one mapping between any two of the spaces can beperformed, hinting at the importance of using informationfrom all three domains when analyzing the biomodulatorycapabilities of a molecular structure.

The recent publication [97] of a set of 13 antipsychotic drugsagainst 34 protein targets prompted the equivalent analysisusing in silico methods. It was found [98] that 65% of the 442affinities could be predictedwithin one affinity log unitwith thelevel of precision being above 92%, rendering the computationalinteraction predictions rather reliable (more than 9 out of 10predictionswould then be confirmed experimentally, with onlyone false positive prediction present).

For St. John's Wort (Hypericum perforatum) a recent publica-tion [99] attempted to demonstrate the capabilities of compu-tational methods in the bioactivity spectra prediction of activeconstituents of plants and, more generally, compounds ofnatural origin.

One of the earlier applications in the field [100] employedSVMs togenerateclassificationmodels fora total of 125molecularfunctions which were derived from the MDDR database. Modelswere generated using MDL keys for a total set of 70 molecularactions and 55 therapeutic areas and 871 marketed drugs wereanalyzed with respect to their in silico bioactivity profile. Inaddition, adverse effects related to the liver were analyzed withrespect to thehighest correlatedmolecular functions, and severalof the links were sensible as a first approximation — such asinhibition of HMG-CoA reductase, which is a class of drugs withknown association with hepatotoxicity.

Models predicting the interactions of drugs, encoded usingpresence/absence vectors of functional groups, with proteins,encoded using physicochemical property descriptors havealso been published [101]. However, given that only the targetfamily (enzymes, ion channels, G-protein-coupled receptorsand nuclear receptors) are predicted as interaction partnersfor each ligand, the practical applicability of the model willlikely be limited. The same four bioactivity classes were usedemploying bipartite local models, however in this case alsoindividual proteins were predicted as putative targets for a setof the top-scoring ligand–target pairs which is very relevantinformation to have in practice [102]. In this case, firstly targetproteins of a given drugs were predicted, followed by theprediction of drugs of a given target protein. In this way, theperformance predicting ligand–target interactions could beimproved over previous methods.

Focusing on interactions between metabolites and en-zymes, Chen et al. [103] employed a dataset from the KEGG

database, where ligands were described by a graph-basedsimilarity measure, and proteins were described using eitherprotein functional domain composition, sequence identity asdetermined by the BLAST algorithm or a similarity of the GOannotations of each protein. Here it was found that employingmultiple nearest-neighbor models using only parts of thenegative training data in each case followed bymajority votingoutperformed each individual target prediction model.

5.2. Applications of pharmacophore-based targetprediction methods

Several examples of successful pharmacophore-based bioactiv-ity profiling have been reported (for more details see a recentreview [104]). A Pipeline Pilot-based parallel screening systemusing structure-based pharmacophore models was used toclassify 100 antiviral compounds. All of these test compoundsare known to exhibit activity on one particular target. 50pharmacophore models for five different targets (i.e., 10pharmacophore models per target) were developed usingLigandScout. The test compounds were then screened againstall pharmacophore models and the prediction of activity of acertain compound on a certain target was based on a majorityvote of the respective pharmacophoremodels. Itwas found thatin 88% of the cases the activity of the test compounds wascorrectly predicted [105]. In another experiment, the selectivityof pharmacophore models of HIV protease was investigatedusing a test set of HIV protease inhibitors and inhibitors knownto be active on other proteases, as well as inactive compounds.The results indicate that the utilization of ensembles ofpharmacophore models and ensemble voting may help toenhance the signal-to-noise ratio. While the retrieval rate ofknown active compounds may be significantly increased, theselectivity of the pharmacophore-based approach may bepreserved [106]. This approach also been employed to screenmolecules for interactions with different cytochrome P450isoforms and other ADME- and anti-targets [107].

Such a parallel screening setup was also applied in a targetfishing experiment for 357 known peroxisome proliferatoractivated receptor (PPAR) agents. While PPAR targets wereranked first among all 181 targets screened, several newpotential targets for these PPAR agents were identified. In arelated approach, the plant constituents of Prasaplai, a Thaitraditional medicine, which consists of 10 plants, camphor andsodium chloride, were analyzed for COX inhibiting compoundsusing a collection of pharmacophoremodels. It was shown thatcompounds known or suspected to be binding to COX could beidentified by in silico modeling [108]. A database of Chineseherbal constituents was investigated for activity on four targetsinvolved in inflammation (cyclo-oxygenases 1 and 2, COX; p38MAP kinase, p38; c-Jun terminal-NH2 kinase, JNK and type 4cAMP-specific phosphodiesterase, PDE4) using a collection ofpharmacophoremodels. The study revealed several agents thatare expected to exhibit biological effects on one or more targetsunder investigation [109]. 16 main constituents isolated fromRuta graveolens L. were screened against the Inte:LigandPharmacophoreDB for activity on acetylcholinesterase (AChE),cannabinoid receptor 2 and human rhinovirus (HRV) coatprotein. The compounds were also tested experimentally andin themajority of cases theactivitieswerepredicted correctly by

Page 11: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2564 J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

the in silico screeningmethod. Several interesting activitieswererevealed, such as the in vitro inhibitory activity of arborinine onAChE and HRV-2 and the antiviral activity of 6,7,8-trimethox-ycoumarin on HRV-2 [110].

5.3. Applications of docking-based target predictionmethods

Muller et al. [111] using high-throughput docking screeningapproach examined more than 2000 druggable active sitesfrom sc-PDB to identify putative biological targets of de-rivatives of the 1,3,5-triazepan-2,6-dione scaffold. Specifically,they focused on five molecules representatives of thiscombinatorial library. Among promising biological targetsthat were identified and verified in vitro was the secretedphospholipase A2 (sPLA2).

Traditional Chinese Medicine (TCM) in recent years hasattracted attention of the pharmaceutical community as apotentially alternative option to conventional western-medi-cine due to its in some cases validated phenotypic efficacy. Themain problem associated with chemical compounds found inTCM is that inmost cases the biological targets uponwhich thiscompounds act are not yet known. Thus, there has been anincreased interest exploring these compounds using in silicotarget prediction strategies. One such large scale study wasreported by Chen et al. [112] who mined scientific literature tocollect a large amount of available information onTCMs. On theherbal ingredients that have been identified analyticallyINVDOCK [79,113,114] was applied as an inverse dockingalgorithm for identifying their potential molecular targets.Their result lead to creation of the Traditional ChineseMedicineInformation Database TCM-ID, a web accessible free of chargedatabase that includes information for prescriptions, constitu-ent herbs, herbal ingredients, molecular structure and func-tional properties of active ingredients, therapeutic and sideeffects, clinical indication and application and related matters[112]. Another study focused on TCM compounds was con-ducted by Zahler et al. [115], who applied inverse screeningmethod in order to identify potential kinase targets for threederivatives of indirubin, namely 5-bromo-indirubin-3′oxime(5BIO), 6-bromo-indirubin-3′oxime (6BIO), 7-bromo-indirubin-3′oxime (7BIO). A total number of 84 unique protein kinaseswere examined in this study, with structural target data beingdrawn fromsc-PDB. Indirubin, anactivecompoundwas recentlyidentified to present therapeutic activities against chronicmyelogenous leukemia (CML) [116].

INVDOCK was also used in recent work attempting torationalize adverse drug reactions [117]. In this particularstudy a total of eleven HIV protease, nucleoside reversetranscriptase and non-nucleoside reverse transcriptase in-hibitors were analyzed. Out of the complete set of putativeinteraction partners several protein targets were consistentlypredicted to be interaction partners of drugs, such as DNApolymerase beta, DNA topoisomerase I, FK binding protein 12and the sterol regulatory element binding protein-1. Overall itwas found that more than 86% of the adverse drug reactionspredicted by INVDOCK are consistent with the adversereactions reported in literature. Hence, it could be shownthat also docking-based target prediction methods are poten-tially of use in practice, not only for the anticipation of on-target effects, but also to rationalize (and potentially predict)

adverse drug reactions via their association off-targets.TarFisDock on the other hand was applied to predict targetsof the toxin 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), and itwas suggested that also aryl hydrocarbon receptor (AhR)independent pathways might play a role in its toxicity [118].

Finally, also integrated approaches for target prediction ofsmall molecules exist, such as those based on gene expression[119] ormetabolic profiling data [120]. However, this falls not intothe scope of the current review, hence the reader would bereferred to the above articles for further information about thefield.

6. Novel applications 1: Target identification inphenotypic bacterial growth inhibition screens

We will now present two applications of target predictiontools performed by the authors themselves.

In the last decade many bottlenecks associated with “largescale science” such as HTS have been removed; hence it is nolonger an issue to cherry pick (individually select) and assaytens of thousands of compounds in a matter of days.Previously, this was a serious problem so applications suchas ‘plate-based diversity selection’ had to be developed tospeed up the compound selection process [121]. In tandem,not only the number of compounds as well as the throughputincreased, but rather the data associatedwith each compoundincreased simultaneously, in proprietary as well as in publicdatabases (see Section 2 for details).

However, the increase of data is not true in all areas, sincemany pharmaceutical companies abandoned antibacterial re-search in the past decades, leaving only five players (GSK,Novartis, AZ, Merck and Pfizer) involved in the area [122]. As aresult, the knowledge around antibacterial targets is limited.More specifically, the databases are rich in human targets andcompounds that modulate these targets, but contain limitedinformation when it comes to bacterial targets. This is unfortu-nate because the major bottleneck in antibacterial drug discov-ery is often not discoveringhits— in a phenotypic assay, such asa bacterial growth assay, understanding the mode of action of thehits is a challenge. For example, it is unclear if the phenotypicobservation is a result of general cytotoxicity or rather target-specific; and if the phenotypic observation is target-specific thenwhich targets are involved. Once the target is elucidated it ismuch easier to optimize the potency and selectivity against theprimary target.

Hence, one way to elucidate the mode of action ofcompounds tested in antibacterial assays is the applicationof in silico target predictionmodels; our workflow to assess hitsfrom phenotypic antibacterial screen is shown in Fig. 3. Thefigure outlines a computational approach for target elucida-tion of phenotypic observations taken from the MDDR such as“Antibiotic”, “Antibacterial” or “Antifungal”. After reading allstructures they are assessed in silico through a MulticlassBayes model (implemented in PipelinePilot) trained on theGVK bioscience database and employing ECFP circular finger-prints, where the objective is to elucidate the cellular targetthat the compounds modulate. For each compound all thetargets with a Bayes score greater than 30 were consideredlikely targets and were further analyzed. The most prevalent

Page 12: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

Fig. 3 – Implementation of target prediction models in PipelinePilot. 7812 structures with annotated antibacterial activity(activity indices 6700 and 68,000) were taken from the MDDR databases and likely targets could be predicted for 6874 of themusing an in silico model. Calculations (using fingerprints and a Bayes model, such as here) are computationally efficient, withprediction for all compounds obtained in the order of 10 s.

2565J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

targets (those predicted for 100 compounds or more by themodel) are visualized in Fig. 4. This knowledge can now beused in real-world situation, such as to annotate the hits anddeprioritize the compounds with known mode of action.Specifically, among the most prevalent targets in Fig. 4 areDNA gyrase (GYR), DNA gyrase subunit A (GYRA) and AMPC(beta-lactamase). The compounds predicted to be activeagainst those targets may not be of interested to a team thatwould like to discover an antibacterial with a novel mode ofaction. Vice versa, other teams might wish to follow upcompounds with known mode of action, but novel chemo-types. All those selections are now possible, based on theannotated putative compound targets.

Interestingly many of the predicted targets are human andnot antibacterial.Thisbrings furtherknowledge to the teambothin terms of off target effect and what risks should be mitigated.

Fig. 4 – Target predictions for the set of antibacterial compoundfrequently predicted target. Hence, in silico target prediction modcompounds where only a phenotype has been determined expe

For example, monoamine oxidase A and B (MAOA and MAOB)are enriched in this analysis (Fig. 4). Another example iscytochrome P450 (CYP3A4); the knowledge that compoundsinteract (usually as substrates) with this enzyme could be usefulto the project team when developing a risk mitigation strategyand interpreting in vivo data such as those referring to acompoundhalf-life. Furthermore, knowledge of themammaliantarget could be leveraged to generate a hypothesis with regardfor the bacterial target; for example, a target prediction ofmatrixmetalloproteinase 9 (MMP9) may hint that the compounds alsomay hit a bacterialmetalloprotein. Many of theMMP9 inhibitorsinteract with the zinc atom in the metalloprotein though ahydroxamate moiety that could potentially also bind to otherzinc-containing bacterial proteins such as LpxC. As an example(and in line with many other LpxC inhibitors) the antibioticCHIR-090 contains a hydroxamate moiety coordinating the

s analyzed in Fig. 3. It can be seen that gyrase is the mostels can be beneficial when assigning a mode of action torimentally.

Page 13: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2566 J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

catalytic zinc ion [123], and it also exhibits weak activity in ourin-house the MMP assay panel. This exemplifies a possiblerelation, which is not fully understood or obvious, of usingchemical probes to associate human and bacterial targets. Allthisknowledge cannowbeused to anticipate targets for drugs indevelopment, both in humans and pathogens, based on theinitial target prediction obtained in silico.

7. Novel applications 2: HTS fingerprints intarget prediction

Traditional ligand-based target prediction methods rely on thecommonlyacceptedprinciple that small-moleculeswithsimilarchemical structures will bind to similar targets [34,124]. Usingthis principle, target predictions can be made by comparingcompounds, using a desired chemical descriptor and similaritymetric, to a database of compoundswith known activities. Thissimple descriptor matching can also be further extended usingstatistical models, such as Bayesian classifiers, trained on largechemogenomic databases [125]. The central caveat in all ofthesemethods is that they rely on chemical structure similarity,rendering it difficult tomakepredictions for orphan compoundsas well as to scaffold-hop to very diverse structures, e.g. from aclass I to class II inhibitor of the same kinase. The analysis of“structure–activity cliffs” [126,127] as well as recent ligand-based virtual screening benchmarks against statistically unbi-ased chemical databases [128], have also revealed that struc-turally similar compounds may not always be as similar inactivity space as once thought. These factors have caused a shiftaway from traditional structural descriptors toward “biologicalfingerprints”, which describe small molecules purely by theirbioactivityprofilesagainst apanel of targetssince theabsenceofany of explicit chemical structure information in these de-scriptors often allows them to overcomemany of the caveats oftraditional fingerprints. Overall, this concept has some relationto the ‘affinity fingerprints’ first presented by Kauvar [129], whofirst mentioned that bioactivity can be described in a ‘universalway’ by using a set of bioactivities against a set of ‘orthogonal’proteins.

In their work on Biological Spectra Analysis, Fliri et al.showed that by clustering compounds on a “biological spectra”derived from Cerep's BioPrint database, structure–activity re-lationships could be found without introducing the bias of anystructural descriptor [130]. This work was later extended toanalyze and predict adverse drug reactions (ADRs) [131].Similarly, Plouffe et al. used activity profiles generated fromthe Genomics Institute of the Novartis Research Foundation'sin-house high-throughput screening data to predict the mech-anism of actions of novel anti-malarial compounds [132]. Alsoan in silico analog of this fingerprint, termed ‘Bayes AffinityFingerprint’ has recently been presented [125], extending thisconcept to areas where no experimental bioactivity data of acompound against a particular set of targets is available.

The Novartis Institutes for BioMedical Research hassimilarly developed a small molecule biological descriptorcalled High-Throughput Screening (HTS) fingerprints, whichare derived from single concentration activity data from 184high-throughput screens using over 1.6 million compounds(Fig. 5). The current fingerprint is composed of 104 biochemical

assays spanning the major drug target classes, as well as 80cell-based assays or phenotypic screens. The combineddataset creates an information-rich profile that reflectscompound pharmacophores and biological effects in cells.Using a systematic network-based clustering approach, weshow that compounds with similar HTS fingerprints, whileoften very structurally diverse, modulate the same broadbiological processes and pathways. While in many casessimilar profiles will result from compounds having the sametarget, it is often observed that compounds with similar HTSfingerprints hit different targets in the same pathway. Thisallows us to not only make single compound-target pre-dictions, but also to group compounds into biologically similarclusters which modulate the same therapeutically relevantprocesses such as inflammation or autophagy.

In order to compare the HTS fingerprints of two com-pounds, the similarity metric must take into account thesparseness of the data, i.e., not every compound has beentested in every assay. In order to adjust for this, a modifiedversion of the Pearson correlation that takes into account thenumber of assays in common between the two compoundswas used. Using this metric, an N-by-N HTS fingerprintsimilaritymatrixwas computed for a set of 58,056 compounds.The set included all compounds in Novartis's screening librarythat could be mapped to human protein targets using publicbioactivity data from GVK and ChEMBL, a subset of marketeddrugs, and a collection of natural products. This matrix wasthen used to construct a similarity network by representingeach compound as a node and connecting two nodes(compounds) if their similarity score was above a thresholdof 0.85. Unconnected nodes were removed, resulting in anetwork with 8288 nodes and 29,024 edges.

In order to extract meaningful clusters of compounds withsimilar biological activity from the network, the MolecularComplex Detection (MCODE) clustering algorithm was used.MCODE uses vertex weighting by local neighborhood densityto extract highly interconnected clusters of nodes. By exam-ining the protein targets hit by the compounds within eachcluster, the statistical enrichment of those targets withincertain biological pathways (defined by GeneGo) or biologicalprocesses (defined by the Gene Ontology (GO) [133]) can becalculated. With alpha equal to 0.05, it was found that 52% ofthe clusters were enriched for at least one canonical pathway(GeneGo Metacore) and 67% of clusters were enriched for atleast one Gene Ontology (GO) biological process. Within thesebiologically enriched clusters, target predictions can beextended to compounds without known targets using the“guilt-by-association” principle. Using this approach we wereable to predict and successfully validate in vitro over 3400compound activities that were not known in the publicdomain. The targets included over 100 different proteins,spanning most druggable protein families.

While, as expected, many of the clusters contained analo-gous compounds which hit the same targets, in many othercases considerable chemical diversity was observed withinclusters. Fig. 6 illustrates two well-known, but structurallydiverse classes of histone deacetylase (HDAC) inhibitors. Thisexample represents a scaffold hop not only from a syntheticcompound to a natural product, but two chemotypes withcompletely different binding modes. Hydroxamic acids such as

Page 14: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

NO

O

O

ON N

O

O

O O

O

N ON

N

N

O

O

NO

NH

NH

NH

O

O

O

O

ON

O

NH

NH

NH

O

O

O

O

ON

NO

A B

Fig. 6 – Two major classes of known HDAC inhibitors. a) Hydroxamic acids. b) Cyclic tetrapeptides. Multiple members of eachchemotype were found clustered together in the HTS fingerprint similarity network based on their common biological profiles.It is highly unlikely that a 2D structure-basedmethodwould scaffold hop betweenmembers of these two chemotypes, based onan average 2D Tanimoto similarity of less that 0.2 (using ECFP4 fingerprints).

Fig. 5 – Schematic of the HTS fingerprint analysis workflow. Compound HTS fingerprints (a) are compared using a weightedPearson correlation to create an N-by-N similarity matrix (b). This similarity matrix can then be used to create a network(c) where every node (compound) is connected if their similarity score is above 0.85. TheMolecular Complex Detection (MCODE)algorithm can then be used to extract highly interconnected clusters of nodes from the network (d). The targets of thecompounds in these clusters are enriched for specific biological pathways and high level cell processes, despite their significantchemical diversity.

2567J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

Page 15: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2568 J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

trichostatin A reversibly inhibit HDACs by chelating the active-site zinc [134], while cyclic tetrapeptide analogs of trapoxin arecovalent irreversible inhibitors [135]. With an average 2Dsimilarity (Tanimoto coefficient of ECFP4) of less than 0.2between members of the two chemotypes, chemical structurebasedmethod would not likely suggest a biological relationshipbetween these compounds. However, due to their sharedHDACactivity, the compounds have similarity bioactivity profiles andtherefore HTS fingerprint analysis can easily suggest commontargets.

8. Prospective utilization of bioactivity data formulti-target drug design: Finding the right target

While predicting individual targets for compounds isalready of value, it has recently been discovered that thepromiscuity of drugs ‘can also be a virtue’ [136–138]. In thisparadigm shift away from single-target drug discovery tomulti-target drug discovery [139] we would ideally like toselect compounds exhibiting bioactivities against a series ofrelevant targets, with some targets probably easier to targetsimultaneously than others [125,140]. While target predic-tion models – as outline in the preceding sections – can helpdesign suitable chemistry to modulate targets, the questionof which targets should be modulated in the first place will bediscussed here.

Biological function at a systems level (e.g. switching fromone metabolic substrate to another in response to environ-mental stress; or producing an immune response as aconsequence of bacterial invasion) results from the integrat-ed activity of multiple interacting processes that evolutionhas organized into complex networks. Within such networksthe interactors are proteins, lipids, lipoproteins, DNA andRNA, small molecule metabolites, second messengers, anti-gens and so on. Disease states arise when these interactionsare perturbed in specific ways, often by mutations in genesencoding individual proteins. In complex diseases thedisturbance is multifactorial [141] and, for the diseasephenotype to manifest, multiple simultaneous interactinghits probably need to coincide in space and time, such asgenetic background, environmental stress, aging and specificmutation(s). Until recently most effort was focused on tryingto understand such interactions and their pathologicalvariants at the pathway level; and most drug discoveryefforts have targeted single components within these path-ways with drugs intended to exhibit high specificity for thosecomponents. This approach has been notable primarily forits failure to generate significant numbers of new drugs totreat complex diseases. In the case of infectious disease,failures frequently arise due to development of resistance; orbecause of toxicity.

The failure of drugs to yield the desired biological outcomein many cases is the result of two fundamental issues.

1) Biological networks are structurally highly complex and sopredicting the functional outcome of interventions or theconsequence of mutations that must then be addressedpharmacologically is non-trivial [139]. Biological networkshave evolved to be very robust [142]. This is a fundamental

organizing principle of life. Biochemical and physiologicalnetworksneed to beable to continue to function in the face ofcontinual internal and external perturbation. They have thegeneral property of attack tolerance. Random removal ofindividual components of a biological network has surpris-ingly little functional consequence for the network [142]. Tobe effective (or catastrophic depending on one's perspective)interventionswithin a complex biological networkneed to behighly selective.

2) Drugs almost invariably influence more than one target tosome extent either as a consequence of structural similar-ities between the intended target and other protein familymembers (e.g. β-blockers, SSRIs); or through allosteric effectson other proteins; or through genuine multivalent targetbinding [7]. In a recent analysis it was found that every drugwas, on average, accepted to interact with six differenttargets [8]. However if this occurs the net result is usually aseries of unwanted side effects in addition to the desiredeffect; or a treatment failure (and often both). Occasionally adrug's promiscuity and the fortuitous combination ofappropriate high value network targets (and minimalunpleasant off-target effects) converge to produce a treat-ment success. Statins are a good example of this — theyappear to work as much for their anti-inflammatory andimmunomodulatory properties as for the ability to inhibitHMG-CoA reductase for which they have been deliberatelydesigned [143] (andmost people can tolerate themilder side-effects of fatigue, malaise and myalgia, although rhabdo-myolosis and hepatotoxicity prevent their use in all cases).The more highly tuned and less promiscuous a drug is for aparticular target the more important that target has to be innetwork terms for it to have a significant effect. However, thefact that only 1 in 10,000 new drug candidates currentlymakes it to the clinic speaks for itself. So, how can we dobetter? Is it possible to identify those targets within acomplex network whose properties are such that a desiredtherapeutic consequence can be achieved?And can drugs bedesigned or repositioned to influence these targets? Webelieve that the answer to all of these questions is yes.

Networks are amenable to analysis using a branch ofcombinatorialmathematics termed graph theorywhose originsare attributable to Leonhard Euler in the 16th century. Eulerdevised the first topological tools to address the famous “Sevenbridges of Königsberg” problem. Just as for Euler's bridges andislands along the Pregel river, biological networks can berepresented as graphs (points [more commonly termed nodesor vertices] connected by lines [called edges]) representinginformation flow between locations in an interconnectedsystem and the local and global properties of this topologicalmap can be calculated. This information can then be used toidentify sets of interventions that are likely to have the desiredtherapeutic effect and optimal combinations of high valuetargets computed.

We will use the example of finding an effective antibiotic totreat an imaginary and notoriously drug resistant bacterium,BugX, to illustrate the basics of this approach. Fig. 7a shows ahypothetical (and very much simplified) biological networkrepresenting some processes in BugX that we believe to beimportant for its survival and forwhich therearenomammalian

Page 16: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

Fig. 7 – Protein interaction network of 'BugX' (a). It can be seen that the modulation of a set of four proteins has significantlydifferent effects, depending onwhether they are all 'high-value' targets (c) or a selection of targetswith differing relevance to theoverall topology of the network (b and d). Combining knowledge about chemistry that modulates targets, together withknowledge about which targets to modulate, allows us to design modulators of biological networks with higher efficacy thanmore conventional single-target drugs.

(i) Construct a network model of the processes of interestusingcurrentbiologicalandpathophysiologicalknowledge

2569J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

homologs (making human toxicity of any drugs designed toperturb this network less likely).

We first identify and represent the components of thenetwork — in this example the key proteins responsible forsome critical housekeeping functions: cell wall assembly,intracellular protein transport and post-translational mod-ification of cell wall components that are unlikely to haveclose mammalian homologs. These proteins form thevertices of our graph. We then connect the vertices withedges representing information flow. For example — ProteinA activates Protein B, which in turn modifies Protein C topermit it to be transported by Protein G. Protein G also has apermissive effect on Protein C making it easier for it to bindand be modified by Protein B. Protein D inhibits Protein A.Protein B inhibits Protein D. Properties such as the directionof information flow, its net sign and weight are representedby arrows and by numbers associated with the edges. Finallywe compute the importance of the network components ingraph theoretical terms based on this connectivity pattern(with appropriate controls).

In this simple example some unique properties of certainproteins can be recognized by eye, although in real networks amathematical approach is needed. It can be seen that Proteins Gand H have high “Betweenness”. In other words they formimportant conduits between the cluster of proteins producingraw materials (cluster analysis and the identification of func-tional sub groupings is another important aspect of thisapproach) and the cluster responsible for assembling thebacterial cellwall. Inactivating these twoproteinssimultaneously

would immediately sever the coupling between the factory andthe assembly line as illustrated in Fig. 7b. We can also see thatProteins A, B and J have lots of connections — they have high“Degree”; ProteinsFand Jhavehigh “OutDegree”— thedirectionsof the arrows is predominantly outward. This type ofmetric alsodistinguishes important network nodes— so called “Hubs”.

Having established some high value targets we can thenapply various algorithms to calculate the optimal combinationof these targets tomaximally disrupt the network forminimumcost. “Disruption” can be quantified by many different metricsbut connectivity and diameter are commonly used and broadlyequate to how many disconnected pieces the network can befragmented into and the size of thepieces.Obviously this relatesdirectly to functional integrity. Fig. 7c shows the effect ofattacking 4 of the 17 possible targets when the targets are allhigh value (Proteins A, G, H and J) and Fig. 7d when they are amixture of high,moderate and low value (Proteins B, I, P and Q).A compound, or pair of compounds, that could achieve the firstmight make an excellent antibiotic with a low probability ofsubsequent resistance; whereas one that affected the lowervalue group or individual low value targets (the majority ofnodes)wouldhave low efficacy andBugXwould likely be able todevelop resistance by upregulating alternative redundantroutes.

In summary then, the steps required to select proteintargets in this example are:

Page 17: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

(ii) Subject it to network analysis to identify high valuetargets (details of the method to be published shortly)

(iii) Compute minimum synergistic combinations of targets(details of the method to be published shortly)

(iv) Find multivalent drugs that bind to these targets toexert the desired effecta) by searching chemoproteomic databases for existing

compounds with promising binding signatures, orb) by searching a “virtual” structure–function space,

which also includes the exploration of novel chemicalspace [19].

2570 J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

Given the growing relevance of multi-target drug discovery,as well as the increasing amount of bioactivity data availableboth inprivate companiesbutalsopublic databases, theauthors– along with other researchers in the field [18,139,144–147] – areof the opinion that the prospective design of multi-target drugswill be one of the major developments in the pharmaceuticalindustry in the near future. Areas such as the consideration ofdynamic properties of networks will continue to gain impor-tance in the future [148] and some advances in the field arealready being reported [149], however the availability andintegration of date remains currently a challenge.

9. Conclusions

The increasing availability of bioactivity data is a unique chanceto boost success rates in rational drugdevelopment, and to learnfrom the past in order to make better chemical decisions in thefuture. This review attempts to summarize available databases,ligand- and structure-based methods to predict the proteintargets of small molecules, and to present applications of thosetoolswhichhave increased innumber in recent years.While thevolume and quality of information available will continue togrow,weare likely to seemoreandmoresuccessful applicationsof targetprediction algorithms in the future, aswell as increasedintegration of the available data. Given the size of chemicalspace of about 1042 bioactive molecules (estimation of theauthors, currently being published elsewhere) it is likely thattarget prediction methods will be around to supplementexperimental methods for the foreseeable future.

Acknowledgments

Unilever is thanked for funding AK, JK, RCG, PJB andAB.MG, JLJand BS thank Novartis for support.

R E F E R E N C E S

[1] Winau F, Westphal O, Winau R. Paul Ehrlich—in search ofthe magic bullet. Microbes Infect 2004;6:786–9.

[2] Kaufmann SH. Paul Ehrlich: founder of chemotherapy. NatRev Drug Discov 2008;7:373.

[3] Fischer E. Einfluss der Configuration auf die Wirkung derEnzyme. Ber Dtsch Chem Ges 1894;27:2985–93.

[4] Paolini GV, Shapland RH, van Hoorn WP, Mason JS, HopkinsAL. Global mapping of pharmacological space. NatBiotechnol 2006;24:805–15.

[5] Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ,Shoichet BK. Relating protein pharmacology by ligandchemistry. Nat Biotechnol 2007;25:197–206.

[6] Morphy R, Kay C, Rankovic Z. Frommagic bullets to designedmultiple ligands. Drug Discov Today 2004;9:641–51.

[7] Azzaoui K,Hamon J, Faller B,Whitebread S, Jacoby E, BenderA,et al. Modeling promiscuity based on in vitro safetypharmacology profiling data. ChemMed Chem 2007;2:874–80.

[8] Mestres J, Gregori-Puigjane E, Valverde S, Sole RV. Thetopology of drug–target interaction networks: implicitdependence on drug properties and target families. MolBiosyst 2009;5:1051–7.

[9] Kubinyi H. Drug research: myths, hype and reality. Nat RevDrug Discov 2003;2:665–8.

[10] KlabundeT, EversA.GPCRantitargetmodeling: pharmacophoremodels for biogenic amine binding GPCRs to avoidGPCR-mediated side effects. ChemBioChem 2005;6:876–89.

[11] Kola I, Landis J. Can the pharmaceutical industry reduceattrition rates? Nat Rev Drug Discov 2004;3:711–5.

[12] Hopkins AL. Network pharmacology. Nat Biotechnol 2007;25:1110–1.

[13] Zhang X, Crespo A, Fernandez A. Turning promiscuous kinaseinhibitors into safer drugs. TrendsBiotechnol 2008;26:295–301.

[14] Morphy R. Selectively nonselective kinase inhibition:striking the right balance. J Med Chem 2010;53:1413–37.

[15] Karaman MW, Herrgard S, Treiber DK, Gallant P, AtteridgeCE, Campbell BT, et al. A quantitative analysis of kinaseinhibitor selectivity. Nat Biotechnol 2008;26:127–32.

[16] Bantscheff M, Eberhard D, Abraham Y, Bastuck S, Boesche M,Hobson S, et al. Quantitative chemical proteomics revealsmechanisms of action of clinical ABL kinase inhibitors. NatBiotechnol 2007;25:1035–44.

[17] McLean SR, Gana-Weisz M, Hartzoulakis B, Frow R,Whelan J,Selwood D, et al. Imatinib binding and cKIT inhibition isabrogated by the cKIT kinase domain I missense mutationVal654Ala. Mol Cancer Ther 2005;4:2008–15.

[18] Petrelli A, Giordano S. From single- to multi-target drugs incancer therapy: when aspecificity becomes an advantage.Curr Med Chem 2008;15:422–32.

[19] Kruisselbrink JW, Emmerich MTM, Back T, Bender A,Ijzerman AP, Van der Horst E. Combining aggregation withPareto optimization: a case study in evolutionarymoleculardesign. Lect Notes Comput Sci 2009;5467:453–67.

[20] RollandC,GozalbesR,Nicolai E, PaugamMF,CoussyL, BarbosaF, et al. G-protein-coupled receptor affinity prediction basedon the use of a profiling dataset: QSAR design, synthesis, andexperimental validation. J Med Chem 2005;48:6563–74.

[21] ChEMBL database. Accessible at: http://www.ebi.ac.uk/chembl/.

[22] Bender A. Databases: compound bioactivities go public. NatChem Biol 2010;6:309.

[23] Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH.PubChem: a public information system for analyzingbioactivities of small molecules. Nucleic Acids Res 2009;37:W623–33.

[24] Seiler KP, George GA, HappMP, Bodycombe NE, Carrinski HA,Norton S, et al. ChemBank: a small-molecule screening andcheminformatics resource database. Nucleic Acids Res2008;36:D351–9.

[25] Jenkins JL, Bender A, Davies JW. In silico target fishing:predicting biological targets from chemical structure. DrugDiscovToday: Technol 2007;3:413–21.

[26] Kuhn M, Campillos M, Gonzalez P, Jensen LJ, Bork P.Large-scale prediction of drug–target relationships. FEBS Lett2008;582:1283–90.

[27] Klipp E, Wade RC, Kummer U. Biochemical network-baseddrug–target prediction. Curr Opin Biotechnol 2010;21:511–6.

[28] Williams AJ. Public chemical compound databases. CurrOpin Drug Discov Devel 2008;11:393–404.

Page 18: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2571J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

[29] Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P. A sideeffect resource to capture phenotypic effects of drugs. MolSyst Biol 2010;6:343.

[30] Chen B, Dong X, Jiao D, Wang H, Zhu Q, Ding Y, et al.Chem2Bio2RDF: a semantic framework for linking and datamining chemogenomic and systems chemical biology data.BMC Bioinformatics 2010;11:255.

[31] von Eichborn J, Murgueitio MS, Dunkel M, Koerner S, BournePE, Preissner R. PROMISCUOUS: a database fornetwork-based drug-repositioning. Nucleic Acids Res2010;39:D1060–6.

[32] Choi J, Davis MJ, Newman AF, Ragan MA. A semantic webontology for small molecules and their biological targets. JChem Inf Model 2010;50:732–41.

[33] Willett P, Barnard JM, Downs GM. Chemical similaritysearching. J Chem Inf Comput Sci 1998;38:983–96.

[34] Bender A, Glen RC. Molecular similarity: a key technique inmolecular informatics. Org Biomol Chem 2004;2:3204–18.

[35] Martin YC, Kofron JL, Traphagen LM. Do structurally similarmolecules have similar biological activity? J Med Chem2002;45:4350–8.

[36] Sastry M, Lowrie JF, Dixon SL, Sherman W. Large-scalesystematic analysis of 2D fingerprint methods andparameters to improve virtual screening enrichments. JChem Inf Model 2010;50:771–84.

[37] Bender A, Jenkins JL, Scheiber J, Sukuru SC, Glick M, DaviesJW. How similar are similarity searching methods? Aprincipal component analysis of molecular descriptor space.J Chem Inf Model 2009;49:108–19.

[38] Bender A. How similar are those molecules after all? Use twodescriptors and you will have three different answers. ExpOp Drug Disc 2010;5:1141–51.

[39] Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: aweb-accessible database of experimentally determinedprotein–ligand binding affinities. Nucleic Acids Res 2007;35:D198–201.

[40] Nidhi, Glick M, Davies JW, Jenkins JL. Prediction of biologicaltargets for compounds using multiple-category Bayesianmodels trained on chemogenomics databases. J Chem InfModel 2006;46:1124–33.

[41] Bender A, Mussa HY, Glen RC, Reiling S. Molecular similaritysearching using atom environments, information-basedfeature selection, and a naive Bayesian classifier. J Chem InfComput Sci 2004;44:170–8.

[42] Nigsch F, Bender A, Jenkins JL, Mitchell JB. Ligand–targetprediction using Winnow and naive Bayesian algorithmsand the implications of overall performance statistics. JChem Inf Model 2008;48:2313–25.

[43] Poroikov V, Filimonov D, Lagunin A, Gloriozova T, ZakharovA. PASS: identification of probable targets and mechanismsof toxicity. SAR QSAR Environ Res 2007;18:101–10.

[44] Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ,et al. Predicting new molecular targets for known drugs.Nature 2009;462:175–81.

[45] DeGraw AJ, Keiser MJ, Ochocki JD, Shoichet BK, DistefanoMD. Prediction and evaluation of protein farnesyltransferaseinhibition by commercial drugs. J Med Chem.53:2464–71.

[46] Wale N, Karypis G. Target fishing for chemical compoundsusing target–ligand activity data and ranking basedmethods. J Chem Inf Model 2009;49:2190–201.

[47] Plewczynski D, von Grotthuss M, Spieser SA, Rychlewski L,Wyrwicz LS, Ginalski K, et al. Target specific compoundidentification using a support vector machine. Comb ChemHigh Throughput Screen 2007;10:189–96.

[48] Bender A, Young DW, Jenkins JL, Serrano M, Mikhailov D,Clemons PA, et al. Chemogenomic data analysis: predictionof small-molecule targets and the advent of biologicalfingerprint. Comb Chem High-Throughput Screen 2007;10:719–31.

[49] Harris CJ, Stevens AP. Chemogenomics: structuring the drugdiscovery process to gene families. Drug Discov Today2006;11:880–8.

[50] Jacoby E, Mozzarelli A. Chemogenomic strategies to expandthe bioactive chemical space. Curr Med Chem 2009;16:4374–81.

[51] Mestres J. Computational chemogenomics approaches tosystematic knowledge-based drug discovery. Curr Opin DrugDiscov Devel 2004;7:304–13.

[52] van der Horst E, Okuno Y, Bender A, AP IJ. Substructuremining of GPCR ligands reveals activity-class specificfunctional groups in an unbiased manner. J Chem Inf Model2009;49:348–60.

[53] van der Horst E, Peironcely JE, Ijzerman AP, Beukers MW,Lane JR, van Vlijmen HW, et al. A novel chemogenomicsanalysis of G protein-coupled receptors (GPCRs) and theirligands: a potential strategy for receptor de-orphanization.BMC Bioinformatics 2010;11:316.

[54] Jacob L, Hoffmann B, Stoven V, Vert JP. Virtual screening ofGPCRs: an in silico chemogenomics approach. BMCBioinformatics 2008;9:363.

[55] Zhao S, Li S. Network-based relating pharmacological andgenomic spaces for drug target identification. PLoS One.5:e11764.

[56] Weill N, Rognan D. Development and validation of a novelprotein–ligand fingerprint to mine chemogenomic space:application to G protein-coupled receptors and their ligands.J Chem Inf Model 2009;49:1049–62.

[57] Gregori-Puigjane E, Mestres J. SHED: Shannon entropydescriptors from topological feature distributions. J Chem InfModel 2006;46:1615–22.

[58] Delgado-Soler L, Toral R, Tomas MS, Rubio-Martinez J. RED: aset of molecular descriptors based on Renyi entropy. J ChemInf Model 2009;49:2457–68.

[59] Gregori-Puigjane E, Mestres J. A ligand-based approach tomining the chemogenomic space of drugs. Comb ChemHighThroughput Screen 2008;11:669–76.

[60] Hu Y, Bajorath J. Combining horizontal and vertical sub-structure relationships in scaffold hierarchies for activityprediction. J Chem Inf Model 2011;51:248–57.

[61] Adamson GW, Bush JA. Method for relating the structure andproperties of chemical compounds. Nature 1974;248:406–7.

[62] PipelinePilot. San Diego: Accelrys; 2007.[63] Rollinger JM. Accessing target information by virtual parallel

screening—the impact on natural product research.Phytochem Lett 2009;2:53–8.

[64] KNIME, Knime.com, Zurich, Switzerland.[65] Inte:Ligand PharmacophoreDB, Inte:Ligand, Vienna, Austria.[66] Wolber G, Langer T. LigandScout: 3-D pharmacophores

derived from protein-bound ligands and their use as virtualscreening filters. J Chem Inf Model 2005;45:160–9.

[67] Sayle RA. So you think you understand tautomerism? JComput Aided Mol Des 2010;24:485–96.

[68] Liu X, Ouyang S, Yu B, Liu Y, Huang K, Gong J, et al.PharmMapper server: a web server for potential drug targetidentification using pharmacophore mapping approach.Nucleic Acids Res 2010;38:W609–14.

[69] Nettles JH, Jenkins JL, Bender A, Deng Z, Davies JW, Glick M.Bridging chemical and biological space: “target fishing”using 2D and 3D molecular descriptors. J Med Chem 2006;49:6802–10.

[70] Nettles JH, Jenkins JL, Williams C, Clark AM, Bender A, DengZ, et al. Flexible 3D pharmacophores as descriptors ofdynamic biological space. J Mol Graph Model 2007;26:622–33.

[71] Kinnings SL, Jackson RM. ReverseScreen3D: a structure-based ligand matching method to identify protein targets. JChem Inf Model 2011;51:624–34.

[72] Bender A, Mikhailov D, Glick M, Scheiber J, Davies JW,Cleaver S, et al. Use of ligand based models for protein

Page 19: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2572 J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

domains to predict novel molecular targets and applicationsto triage affinity chromatography data. J Proteome Res2009;8:2575–85.

[73] Prathipati P, Ma NL, Manjunatha UH, Bender A. Fishing thetarget of antitubercular compounds: in silico targetdeconvolution model development and validation.J Proteome Res 2009;8:2788–98.

[74] Strombergsson H, Daniluk P, Kryshtafovych A, Fidelis K,Wikberg JE, Kleywegt GJ, et al. Interaction model based onlocal protein substructures generalizes to the entirestructural enzyme–ligand space. J Chem Inf Model 2008;48:2278–88.

[75] vanWesten GJP, Wegner JK, IJzerman AP, Van Vlijmen HWT,Bender A. Proteochemometric modeling as a tool fordesigning selective compounds and extrapolating to noveltargets. Med Chem Commun 2010;2:16–30.

[76] Lapinsh M, Prusis P, Gutcaits A, Lundstedt T, Wikberg JES.Development of proteo-chemometrics: a novel technologyfor the analysis of drug–receptor interactions. BiochimBiophys Acta-Gen Subj 2001;1525:180–90.

[77] Faulon JL, Misra M, Martin S, Sale K, Sapra R. Genome scaleenzyme–metabolite and drug–target interaction predictionsusing the signature molecular descriptor. Bioinformatics2008;24:225–33.

[78] Rognan D. Structure-based approaches to target fishing andligand profiling. Mol Inf 2010;29:176–87.

[79] Chen X, Ung CY, Chen YZ. Can an in silico drug–target searchmethod be used to probe potential mechanisms ofmedicinalplant ingredients? Nat Prod Rep 2003;20:432–44.

[80] Gao Z, Li H, Zhang H, Liu X, Kang L, Luo X, et al. PDTD: aweb-accessible protein database for drug targetidentification. BMC Bioinformatics 2008;9:104.

[81] Kellenberger E, Rodrigo J, Muller P, Rognan D. Comparativeevaluation of eight docking tools for docking and virtualscreening accuracy. Proteins 2004;57:225–42.

[82] Macchiarulo A, Nobeli I, Thornton JM. Ligand selectivity andcompetition between enzymes in silico. Nat Biotechnol2004;22:1039–45.

[83] Warren GL, Andrews CW, Capelli A-M, Clarke B, LaLonde J,Lambert MH, et al. A critical assessment of docking programsand scoring functions. J Med Chem 2006;49:5912–31.

[84] Woods CJ, King MA, Essex JW. The configurational depen-dence of binding free energies: a Poisson–Boltzmann studyof Neuraminidase inhibitors. J Comput Aided Mol Des2001;15:129–44.

[85] Karplus M, McCammon JA. Molecular dynamics simulationsof biomolecules. Nat Struct Biol 2002;9:646–52.

[86] Kollman P. Free energy calculations: applications to chemicaland biochemical phenomena. Chem Rev 1993;93:2395–417.

[87] Shirts MR, Mobley DL, Chodera JD. Alchemical free energycalculations: ready for prime time? Ann Rep Comp Chem2007;3:41–59.

[88] Chipot C. Free energy calculations in biological systems. Howuseful are they in practice? Lect Notes Comput Sci Eng2006;49:185–211.

[89] Bender A, Scheiber J, Glick M, Davies JW, Azzaoui K, Hamon J,et al. Analysis of pharmacology data and the prediction ofadverse drug reactions and off-target effects from chemicalstructure. ChemMedChem 2007;2:861–73.

[90] Schneider G, Tanrikulu Y, Schneider P. Self-organizingmolecular fingerprints: a ligand-based view on drug-likechemical space and off-target prediction. Future Med Chem2009;1:213–8.

[91] Poroikov V, Akimov D, Shabelnikova E, Filimonov D. Top 200medicines: can new actions be discovered throughcomputer-aided prediction? SAR QSAR Environ Res 2001;12:327–44.

[92] Geronikaki AA, Dearden JC, Filimonov D, Galaeva I, GaribovaTL, Gloriozova T, et al. Design of new cognition enhancers:

from computer prediction to synthesis and biologicalevaluation. J Med Chem 2004;47:2870–6.

[93] Crisman TJ, Parker CN, Jenkins JL, Scheiber J, ThomaM, KangZB, et al. Understanding false positives in reporter geneassays: in silico chemogenomics approaches to prioritizecell-based HTS data. J Chem Inf Model 2007;47:1319–27.

[94] Young DW, Bender A, Hoyt J, McWhinnie E, Chirn GW, TaoCY, et al. Integrating high-content screening andligand–target prediction to identify mechanism of action.Nat Chem Biol 2008;4:59–68.

[95] Feng Y, Mitchison TJ, Bender A, Young DW, Tallarico JA.Multi-parameter phenotypic profiling: using cellular effectsto characterize small-molecule compounds. Nat Rev DrugDiscov 2009;8:567–78.

[96] Kummel A, Gabriel D, Parker CN, Bender A. Computationalmethods to support high-content screening: fromcompound selection and data analysis to postulating targethypotheses. Exp Op Drug Disc 2009;4:5–13.

[97] Roth BL, Lopez E, Beischel S, Westkaemper RB, Evans JM.Screening the receptorome to discover the molecular targetsfor plant-derived psychoactive compounds: a novelapproach for CNS drug discovery. Pharmacol Ther 2004;102:99–110.

[98] Vidal D, Mestres J. In silico receptorome screening ofantipsychotic drugs. Mol Inf 2010;29:543–51.

[99] Lagunin A, Filimonov D, Poroikov V. Multi-targeted naturalproducts evaluation based on biological activity predictionwith PASS. Curr Pharm Des 2010;16:1703–17.

[100] Sato T, Matsuo Y, Honma T, Yokoyama S. In silico functionalprofiling of small molecules and its applications. J MedChem 2008;51:7705–16.

[101] He Z, Zhang J, Shi XH, Hu LL, Kong X, Cai YD, et al. Predictingdrug–target interaction networks based on functionalgroups and biological features. PLoS One 2010;5:e9603.

[102] Bleakley K, Yamanishi Y. Supervised prediction ofdrug–target interactions using bipartite local models.Bioinformatics 2009;25:2397–403.

[103] Chen L, Qian Z, Fen K, Cai Y. Prediction of interactivenessbetween small molecules and enzymes by combining geneontology and compound similarity. J Comput Chem 2010;31:1766–76.

[104] Schuster D. 3D pharmacophores as tools for activityprofiling. Drug Disc Today: Technol 2010;7:e205–11.

[105] Steindl TM, Schuster D, Laggner C, Langer T. Parallelscreening: a novel concept in pharmacophore modeling andvirtual screening. J Chem Inf Model 2006;46:2146–57.

[106] Steindl TM, Schuster D, Laggner C, Chuang K, Hoffmann RD,Langer T. Parallel screening and activity profiling with HIVprotease inhibitor pharmacophoremodels. J Chem Inf Model2007;47:563–71.

[107] SchusterD, Laggner C, Steindl TM, LangerT.Development andvalidation of an in silico P450 profiler based on pharmacophoremodels. Curr Drug Discov Technol 2006;3:1–48.

[108] Waltenberger B, Schuster D, Paramapojn S, Gritsanapan W,Wolber G, Rollinger JM, et al. Predicting cyclooxygenaseinhibition by three-dimensional pharmacophoric profiling.Part II: identification of enzyme inhibitors from Prasaplai, aThai traditional medicine. Phytomedicine 2010;18:119–33.

[109] Ehrman TM, Barlow DJ, Hylands PJ. In silico search formulti-target anti-inflammatories in Chinese herbs andformulas. Bioorg Med Chem 2010;18:2204–18.

[110] Rollinger JM, Schuster D, Danzl B, Schwaiger S, Markt P,Schmidtke M, et al. In silico target fishing for rationalizedligand discovery exemplified on constituents of Rutagraveolens. Planta Med 2009;75:195–204.

[111] Muller P, Lena G, Boilard E, Bezzine S, Lambeau G, GuichardG, et al. In silico-guided target identification of ascaffold-focused library: 1,3,5-triazepan-2,6-diones as novelphospholipase A2 inhibitors. J Med Chem 2006;49:6768–78.

Page 20: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2573J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

[112] Chen X, ZhouH, Liu YB,Wang JF, Li H, Ung CY, et al. Databaseof traditional Chinese medicine and its application tostudies of mechanism and to prescription validation. Br JPharmacol 2006;149:1092–103.

[113] Chen YZ, Ung CY. Prediction of potential toxicity and sideeffect protein targets of a small molecule by a ligand–proteininverse docking approach. J Mol Graph Model 2001;20:199–218.

[114] Chen YZ, Zhi DG. Ligand–protein inverse docking and itspotential use in the computer search of protein targets of asmall molecule. Proteins 2001;43:217–26.

[115] Zahler S, Tietze S, Totzke F, Kubbutat M, Meijer L, VollmarAM, et al. Inverse in silico screening for identification ofkinase inhibitor targets. Chem Biol 2007;14:1207–14.

[116] MacDonald ML, Lamerdin J, Owens S, Keon BH, Bilter GK,Shang Z, et al. Identifying off-target effects and hiddenphenotypes of drugs in human cells. Nat Chem Biol 2006;2:329–37.

[117] Ji ZL, Wang Y, Yu L, Han LY, Zheng CJ, Chen YZ. In silicosearch of putative adverse drug reaction related proteins as apotential tool for facilitating drug adverse effect prediction.Toxicol Lett 2006;164:104–12.

[118] Olivero-Verbel J, Cabarcas-Montalvo M, Ortega-Zuniga C.Theoretical targets for TCDD: a bioinformatics approach.Chemosphere 2010;80:1160–6.

[119] Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ,et al. The Connectivity Map: using gene-expressionsignatures to connect small molecules, genes, and disease.Science 2006;313:1929–35.

[120] Li L, Zhou X, Ching WK, Wang P. Predicting enzyme targetsfor cancer drugs by profiling human metabolic reactions inNCI-60 cell lines. BMC Bioinformatics 2010;11:501.

[121] Crisman TJ, Jenkins JL, Parker CN, Hill WA, Bender A, Deng Z,et al. “Plate cherry picking”: a novel semi-sequentialscreening paradigm for cheaper, faster, information-richcompound selection. J Biomol Screen 2007;12:320–7.

[122] Boucher HW, Talbot GH, Bradley JS, Edwards JE, Gilbert D,Rice LB, et al. Bad bugs, no drugs: no ESKAPE! An update fromthe Infectious Diseases Society of America. Clin Infect Dis2009;48:1–12.

[123] Barb AW, Jiang L, Raetz CR, Zhou P. Structure of thedeacetylase LpxC bound to the antibiotic CHIR-090:time-dependent inhibition and specificity in ligand binding.Proc Natl Acad Sci U S A 2007;104:18433–8.

[124] Maggiora GM, Petke JD, Mestres J. A general analysis offield-based molecular similarity indices. J Math Chem2002;31:251–70.

[125] Bender A, Jenkins JL, Glick M, Deng Z, Nettles JH, Davies JW.“Bayes Affinity Fingerprints” improve retrieval rates invirtual screening and define orthogonal bioactivity space:when are multitarget drugs a feasible concept? J Chem InfModel 2006;46:2445–56.

[126] Maggiora GM. On outliers and activity cliffs—why QSARoften disappoints. J Chem Inf Model 2006;46:1535.

[127] Medina-Franco JL, Martinez-Mayorga K, Bender A, Marin RM,Giulianotti MA, Pinilla C, et al. Characterization of activitylandscapes using 2D and 3D similarity methods: consensusactivity cliffs. J Chem Inf Model 2009;49:477–91.

[128] Tiikkainen P, Markt P, Wolber G, Kirchmair J, Distinto S,Poso A, et al. Critical comparison of virtual screeningmethods against the MUV data set. J Chem Inf Model2009;49:2168–78.

[129] Kauvar LM, Higgins DL, Villar HO, Sportsman JR,Engqvist-Goldstein A, Bukar R, et al. Predicting ligandbinding to proteins by affinity fingerprinting. Chem Biol1995;2:107–18.

[130] Fliri AF, Loging WT, Thadeio PF, Volkmann RA. Biologicalspectra analysis: linking biological activity profiles tomolecular structure. Proc Natl Acad Sci U S A 2005;102:261–6.

[131] Fliri AF, Loging WT, Thadeio PF, Volkmann RA. Analysis ofdrug-induced effect patterns to link structure and sideeffects of medicines. Nat Chem Biol 2005;1:389–97.

[132] Plouffe D, Brinker A, McNamara C, Henson K, Kato N, KuhenK, et al. In silico activity profiling reveals the mechanism ofaction of antimalarials discovered in a high-throughputscreen. Proc Natl Acad Sci U S A 2008;105:9059–64.

[133] Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, CherryJM, et al. Gene ontology: tool for the unification of biology.The Gene Ontology Consortium. Nat Genet 2000;25:25–9.

[134] Finnin MS, Donigian JR, Cohen A, Richon VM, Rifkind RA,Marks PA, et al. Structures of a histone deacetylasehomologue bound to the TSA and SAHA inhibitors. Nature1999;401:188–93.

[135] Kijima M, Yoshida M, Sugita K, Horinouchi S, Beppu T.Trapoxin, an antitumor cyclic tetrapeptide, is an irreversibleinhibitor of mammalian histone deacetylase. J Biol Chem1993;268:22429–35.

[136] Mencher SK, Wang LG. Promiscuous drugs compared toselective drugs (promiscuity can be a virtue). BMC ClinPharmacol 2005;5:3.

[137] Wermuth CG. Multitargeted drugs: the end of the“one-target-one-disease” philosophy? Drug Discov Today2004;9:826–7.

[138] Wermuth CG. Selective optimization of side activities: theSOSA approach. Drug Discov Today 2006;11:160–4.

[139] Csermely P, Agoston V, Pongor S. The efficiency ofmulti-target drugs: the network approach might help drugdesign. Trends Pharmacol Sci 2005;26:178–82.

[140] Morphy R, Rankovic Z. Designed multiple ligands. Anemerging drug discovery paradigm. J Med Chem 2005;48:6523–43.

[141] Epplen JT, Buitkamp J, Bocker T, Epplen C. Indirect genediagnoses for complex (multifactorial) diseases—a review.Gene 1995;159:49–55.

[142] Kartal O, Ebenhoh O. Ground state robustness as anevolutionary design principle in signaling networks. PLoSOne 2009;4:e8001.

[143] Shovman O, Levy Y, Gilburd B, Shoenfeld Y.Antiinflammatory and immunomodulatory properties ofstatins. Immunol Res 2002;25:271–85.

[144] Jia J, Zhu F, Ma X, Cao Z, Li Y, Chen YZ. Mechanisms of drugcombinations: interaction and network perspectives. NatRev Drug Discov 2009;8:111–28.

[145] MorphyR,RankovicZ.Designingmultiple ligands—medicinalchemistry strategies and challenges. Curr Pharm Des 2009;15:587–600.

[146] Schrattenholz A, Groebe K, Soskic V. Systems biologyapproaches and tools for analysis of interactomes andmulti-target drugs. Methods Mol Biol 2010;662:29–58.

[147] Ma XH, Shi Z, Tan C, Jiang Y, Go ML, Low BC, et al. In-silicoapproaches to multi-target drug discovery: computer aidedmulti-target drug design, multi-target virtual screening.Pharm Res 2010;27:739–49.

[148] Przytycka TM, Singh M, Slonim DK. Toward the dynamicinteractome: it's about time. Brief Bioinform 2010;11:15–29.

[149] Good DM, Zubarev RA. Drug target identification fromprotein dynamics using quantitative pathway analysis.J Proteome Res. 2011;10:2679–83.

[150] Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M,Stothard P, et al. DrugBank: a comprehensive resource for insilico drug discovery and exploration. Nucleic Acids Res2006;34:D668–72.

[151] Mattingly CJ, Colby GT, Forrest JN, Boyer JL. The ComparativeToxicogenomics Database (CTD). Environ Health Perspect2003;111:793–5.

[152] Gunther S, Kuhn M, Dunkel M, Campillos M, Senger C,Petsalaki E, et al. SuperTarget and Matador: resources for

Page 21: From in silico target prediction to multi-target drug design: … · 2020. 6. 2. · Ligand-based methods for protein target prediction .....2557 3.1. In silico target ... that the

2574 J O U R N A L O F P R O T E O M I C S 7 4 ( 2 0 1 1 ) 2 5 5 4 – 2 5 7 4

exploring drug–target relationships. Nucleic Acids Res2008;36:D919–22.

[153] Chen X, Ji ZL, Chen YZ. TTD: Therapeutic Target Database.Nucleic Acids Res 2002;30:412–5.

[154] LimE, PonA,DjoumbouY,KnoxC, ShrivastavaS,GuoAC, et al.T3DB: a comprehensively annotated database of commontoxins and their targets. Nucleic Acids Res 2010;38:D781–6.

[155] Lagunin A, Zakharov A, Filimonov D, Poroikov V. QSARmodelling of rat acute toxicity on the basis of PASSprediction. Mol Inf 2011;30:241–50.

[156] Geronikaki A, Druzhilovsky D, Zakharov A, Poroikov V.Computer-aided prediction for medicinal chemistry via theInternet. SAR QSAR Environ Res 2008;19:27–38.

[157] Bond PJ, Faraldo-Gomez JD. Molecular mechanismof selective recruitment of SYK kinases by the membraneantigen-receptor complex. J Biol Chem 2011;286:25872–81.