Know More Before You Score: An Analysis of Structure-Based Virtual Screening Protocols


Upload: shannon-lane

Post on 02-Jan-2016


Page 1: Know More Before You Score: An Analysis of Structure-Based Virtual Screening Protocols

• Structure-Based Virtual Screening (SBVS) is a proven technique for lead discovery
• Still many areas for improvement
  • Efforts generally focussed on scoring function
  • Often with little consideration of the assumptions underpinning SBVS
• Here we consider a number of these processes in detail from the perspective of our primary SBVS tool (DOCK)
  • Ligand conformational search protocols
  • Varying site points definitions
  • Alteration of DOCK variables that directly affect sampling
• Determine their impact on hit enrichment and search speed
• Analyze implications for future research

Page 2

Ligand Flexibility Studies
Strategy

• SBVS CPU intensive
  • Conformational searching of ligand clearly important
  • Sampling limited to allow search completion in reasonable time frame
• Test required to compare different conformational sampling methods
  • Ability to reproduce bioactive conformation tested
• 145 ligands from a 1995 analysis of PDB complexes (Gschwend, UCSF, unpublished)
• 30 compound subset chosen for analysis - selection based on visual and numerical inspection of diversity in ligand flexibility and functionality
• Relatively small sample of molecules used, many peptidic in nature
  • Peptidic moieties are among the better parameterized systems, so this is in some ways a best case scenario

Page 3

Ligand Flexibility Studies
Procedure

• Multiple sampling techniques chosen:
  • Catalyst-best / Catalyst-fast / Confort / Omega / DOCK
• Variety of sampling levels
• Starting from Concord structure, conformers generated and superimposed onto PDB ligand conformation
• Conformation with lowest heavy atom RMS to the PDB ligand conformation used as quality measure
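The quality measure above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: it assumes the conformers have already been superimposed onto the reference, that atoms appear in the same order in every structure, and that each atom is given as a hypothetical `(element, x, y, z)` tuple. The function names are illustrative.

```python
import math

def heavy_atom_rmsd(reference, conformer):
    """RMS deviation over non-hydrogen atoms of two pre-superimposed structures.

    Each structure is a list of (element, x, y, z) tuples in the same atom order.
    """
    if len(reference) != len(conformer):
        raise ValueError("structures must have the same number of atoms")
    squared = []
    for (elem_ref, *xyz_ref), (elem_conf, *xyz_conf) in zip(reference, conformer):
        if elem_ref != elem_conf:
            raise ValueError("atom order differs between structures")
        if elem_ref == "H":
            continue  # hydrogens are excluded from the quality measure
        squared.append(sum((a - b) ** 2 for a, b in zip(xyz_ref, xyz_conf)))
    if not squared:
        raise ValueError("no heavy atoms to compare")
    return math.sqrt(sum(squared) / len(squared))

def best_conformer(reference, conformers):
    """Return (index, rmsd) of the conformer closest to the reference."""
    scored = [(heavy_atom_rmsd(reference, c), i) for i, c in enumerate(conformers)]
    rmsd, index = min(scored)
    return index, rmsd
```

Taking the minimum over the whole ensemble measures whether the sampling method can reach the bioactive conformation at all, which is a different question from whether a scoring function would pick it.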

Page 4

Ligand Flexibility Studies
Search Settings Employed

• Dock - conformation_cutoff_factor=3/5/10, clash_overlap=0.7 (0.7 times vdW radius for clash overlap), with customized rules for bond increment settings
• Confort - Rough (0.10 kcal) convergence, diverse conformer selection, boat ring search on - sampling at 5/10 confs per single bond + 500 max
• Catalyst - Best/Fast - Default settings - sampling at 5/10 confs per single bond + 100 max
• Omega - Defaults + RMS_CUTOFF=1.0, GP_ENERGY_WINDOW=5.0, sampling at 100 max
• In addition, Concord generated and Sybyl minimized ligand X-ray structures also analyzed as "controls"

Page 5

Ligand Flexibility Results
Overall Performance - RMS / Rank

[Figure: average internal rank (axis 0 to 14) and average RMS deviation (axis 0 to 1.8) for each search method. Data labels, in extraction order: 0.76, 0.81, 0.88, 0.92, 0.87, 0.97, 0.96, 0.99, 0.99, 1.00, 1.03, 1.13, 1.76. Method names were not preserved in the transcript.]

Page 6

Ligand Flexibility Results
Performance vs Flexibility

[Figure: average RMS deviation (axis 0 to 2.5) by ligand flexibility class: 3 to 5 single bonds (15), 6 to 8 single bonds (7), 9 to 14 single bonds (8).]
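Per-class averages of this kind could be produced with a simple grouping step. The sketch below is an assumption about how such a breakdown might be computed, not the original analysis. The class boundaries are copied from the chart legend, and the input is a hypothetical list of `(n_single_bonds, best_rmsd)` pairs, one per ligand.

```python
from collections import defaultdict

# Flexibility classes taken from the chart legend: (label, min bonds, max bonds).
FLEXIBILITY_CLASSES = [
    ("3 to 5 single bonds", 3, 5),
    ("6 to 8 single bonds", 6, 8),
    ("9 to 14 single bonds", 9, 14),
]

def flexibility_class(n_single_bonds):
    """Return the legend label for a bond count, or None if it falls outside all classes."""
    for label, low, high in FLEXIBILITY_CLASSES:
        if low <= n_single_bonds <= high:
            return label
    return None

def mean_rmsd_by_class(results):
    """Average the best RMSD per flexibility class.

    results: iterable of (n_single_bonds, best_rmsd) pairs, one per ligand.
    """
    grouped = defaultdict(list)
    for n_bonds, rmsd in results:
        label = flexibility_class(n_bonds)
        if label is not None:
            grouped[label].append(rmsd)
    return {label: sum(values) / len(values) for label, values in grouped.items()}
```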

Page 7

Ligand Flexibility Results
The Pain Gain Ratio

• Does extra noise introduced to scoring functions outweigh this improvement? Is it worth the extra CPU?

[Figure: average conformations / molecule (axis 0 to 100) and average RMS deviation (axis 0 to 1.8) by search type: CONF 500, BEST 100, FAST 100, CONF 10, DOCK 10, FAST 5, DOCK 5, DOCK 3. Data labels, in extraction order: 425; 0.81, 0.87, 0.88, 0.92, 0.96, 0.97, 1.03, 1.125.]

Page 8

Ligand Flexibility Results
Visual Analysis

• Even at lower RMS, deviation in hydrogen positions an issue
• As RMS rises (0.9) we begin to see more significant deviations in heavy atom positions - large enough to possibly prove troublesome to standard force fields

[Figure: two ligand overlays, labelled RMS=0.65 and RMS=0.90.]

Page 9

Ligand Flexibility Results
Visual Analysis

• As RMS rises further, hydrogen bond mapping begins to partially break down
• Significant deviation begins to be seen although general shape complementarity is still reasonable
• DOCKing tricky, pharmacophore searches possible with loose tolerances, although site point vector definitions (DISCO / Catalyst) a no no

[Figure: two ligand overlays, labelled RMS=2.19 and RMS=1.55.]

Page 10

Ligand Flexibility
Conclusions

• At current sampling levels used in virtual screening
  • Rough search techniques perform comparably to more exhaustive methods
  • Dock performs quite well, and Fast does slightly better than comparable Best run
• Results highlight the need for "forgiving" scoring functions and pharmacophore constraint tolerances (especially for flexible molecules)
  • Generating function directly from crystal structure data may not be optimum
  • Use the conformation closest to the biologically relevant structure with chosen sampling technique
• May be better to ignore more flexible molecules when possible (~>8 bonds)
• Analysis of more extensive data set might provide basis for determining if optimum sampling settings exist (Best/Omega/Confort)
  • Coarseness of poling values for example

Page 11

Structure-Based Search Protocols
An Analysis of DOCK

• Working within current DOCK paradigm, what search protocols provide optimum search criteria?
  • Site point definitions
  • Alteration of sampling variables
  • Different scoring grids
• Comparisons illustrated for 5 test systems with diverse active data sets
• Analysis based on ranking within list that includes ~10000 "noise" compounds
  • "Random" selection within bounds of size and flexibility distribution seen in in-house database

Page 12

Structure-Based Search Protocols
DOCK variables

• Contains many variables that affect performance
• Ligand sampling within the site being the primary variant

nodes 3/4
distance_tolerance 0.5/1.0
distance_minimum 3.0
bump_filter 4
conformation_cutoff_factor 5
clash_overlap 0.7
maximum_orientations 500/5000
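One way to organise runs over the alternatives listed above is to enumerate every combination of the varied settings while holding the others fixed. The sketch below is illustrative only. The slide does not say whether every combination was actually run, and `protocol_grid` and `format_settings` are hypothetical helpers whose output is not a validated DOCK input file.

```python
from itertools import product

# Values listed on the slide; "a/b" entries are the alternatives that were varied.
FIXED = {
    "distance_minimum": 3.0,
    "bump_filter": 4,
    "conformation_cutoff_factor": 5,
    "clash_overlap": 0.7,
}
VARIED = {
    "nodes": [3, 4],
    "distance_tolerance": [0.5, 1.0],
    "maximum_orientations": [500, 5000],
}

def protocol_grid(fixed=FIXED, varied=VARIED):
    """Yield one settings dict per combination of the varied parameters."""
    names = list(varied)
    for values in product(*(varied[name] for name in names)):
        settings = dict(fixed)
        settings.update(zip(names, values))
        yield settings

def format_settings(settings):
    """Render one combination as 'name value' lines (layout is illustrative only)."""
    return "\n".join(f"{name} {value}" for name, value in sorted(settings.items()))
```

With two alternatives for each of three variables, the full grid has eight protocols, which makes the CPU cost of an exhaustive comparison easy to estimate before launching anything.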

Page 13

Structure-Based Search Protocols
DOCK and pharmacophoric constraints

• It is possible to assign fairly sophisticated pharmacophoric (henceforth also known as chemical) definitions

name acid
# deprotonated carboxyl
definition O.co2 ( C )
# tetrazole
definition N.pl3 ( H ) ( N.2 ( N.2 ( N.2 ( C.2 ) ) ) )
definition N.pl3 ( H ) ( N.2 ( N.2 ( C.2 ( N.2 ) ) ) )
definition N.2 ( N.2 ( N.2 ( C.2 ( N.pl3 ( H ) ) ) ) )
definition N.2 ( N.2 ( C.2 ( N.pl3 ( H ) ( N.2 ) ) ) )
definition N.2 ( C.2 ( N.2 ( N.pl3 ( H ) ( N.2 ) ) ) )
definition N.2 ( N.2 ( C.2 ( N.2 ( N.pl3 ( H ) ) ) ) )
definition N.2 ( N.pl3 ( H ) ( N.2 ( N.2 ( C.2 ) ) ) )
# acyl sulphonamide
definition N.am ( S ( 2 O.2 ) ) ( C.2 ( O.2 ) )
definition O.2 ( C.2 ( N.am ( H ) ( S ( 2 O.2 ) ) ) )
definition O.2 ( S ( O.2 ) ( N.am ( H ) ( C.2 ( O.2 ) ) ) )

Current types:
heavy atom
donor
acceptor
hydrophobe
aromatic
aromatic_hydrophobic
acid
base
donor_and_acceptor
special (e.g. metal chelator)
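The nested notation in these definitions can be read as a small tree: an atom type followed by zero or more parenthesised neighbours, each of which may carry its own neighbours. The sketch below parses a definition line into that tree. It reflects one reading of the notation as shown on the slide and is not the DOCK parser. In particular, treating a leading integer such as `2 O.2` as a neighbour count is an assumption, and `parse_definition` is a hypothetical name.

```python
import re

# A token is a single parenthesis or any run of characters without spaces or parentheses.
TOKEN = re.compile(r"\(|\)|[^\s()]+")

def parse_definition(text):
    """Parse e.g. 'N.am ( S ( 2 O.2 ) ) ( C.2 ( O.2 ) )' into a nested dict tree."""
    tokens = TOKEN.findall(text)
    if tokens and tokens[0] == "definition":
        tokens = tokens[1:]  # drop the leading keyword if present
    pos = 0

    def parse_atom():
        nonlocal pos
        count = 1
        if pos < len(tokens) and tokens[pos].isdigit():
            count = int(tokens[pos])  # assumed meaning: number of such neighbours
            pos += 1
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("expected an atom type")
        node = {"type": tokens[pos], "count": count, "neighbours": []}
        pos += 1
        while pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            node["neighbours"].append(parse_atom())
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
        return node

    root = parse_atom()
    if pos != len(tokens):
        raise ValueError("unbalanced parentheses")
    return root
```

Because the parser raises on unbalanced parentheses, it doubles as a quick check for transcription slips in hand-edited definition files.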

Page 14

Structure-Based Search Protocols
Site Points Used in Kinase Search

[Figure: kinase binding site with site point regions labelled.]
Region 1 ( + 4): acceptor / donor
Region 2: Hydrophobic + 2 donors
Region 3: Hydrophobic / Any heavy atom

Page 15

Structure-Based Search Protocols
Test Sets and Site Points Used

• Sphgen used to generate site points for "generic" DOCK searches
• Pharmacophore points derived from a mixture of non-data set bound ligands and in-house programs that process GRID maps and Connolly surfaces (plus plenty of human intervention)
• Active data sets broken down into chemotypes to prevent the problem of common analogue bias - an under-appreciated issue in all validations

Target: 2 serine proteases
  Active chemotype definitions: P1 substituent / P1-P4 linker substituent
  Pharmacophore points / critical regions: P1 (base / hydrophobe) + P4 (hydrophobe) pockets

Target: 2 fatty acid binding proteins
  Active chemotype definitions: Core linking acid moiety to remaining substituents
  Pharmacophore points / critical regions: Acid binding pocket

Target: Kinase
  Active chemotype definitions: Moiety mimicking adenine / main core of molecules
  Pharmacophore points / critical regions: Adenine binding pocket (donor/acceptor) [+ rear hydrophobic pocket]

Page 16

Results - fatty acid binding protein 1
No. of hits after 7 chemotypes located by at least one search (500 compounds processed from 28 actives / 8 chemotypes)

• Missing chemotype a citrazinate - not covered in chemical definitions - easy to fix - another advantage over electrostatics

[Figure: compounds (axis 0 to 20) and chemotypes (axis 0 to 8) found by each search type.]

Page 17

Results - Overall
Compounds processed for 50% Chemotype Coverage for All Systems

[Figure: compounds processed (axis 0 to 2000) by search type; series: best hit rate, mean hit rate, worst hit rate.]

• Rigid conformer screens perform quite well in generic search mode
  • One system contains predominantly rigid chemotypes, two others require a predominantly extended conformation for binding
  • On addition of critical and chemical constraints, inability of rigid search to adapt to more exacting requirements severely compromises results
• Generic searches with addition of conformational flexibility show little improvement relative to rigid search
  • signal to noise issues
• Addition of critical region constraint alone worsens results
  • 500 orientations per conformer too few for search - leads to premature termination of docking analysis for many ligands
• Adding chemical in addition to critical constraints provides best balance for sampling parameters
  • still requires reasonable tolerances and forgiving scoring function for optimum results
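The metric in the slide title, compounds processed to reach 50% chemotype coverage, can be computed from a ranked hit list once each active has a chemotype label. The following is a minimal sketch under assumed inputs, not the code behind the chart: `ranked_ids` is a hypothetical list of compound identifiers sorted best score first (actives and noise together), and `chemotype_of` is a hypothetical mapping from active identifier to chemotype.

```python
import math

def compounds_for_coverage(ranked_ids, chemotype_of, fraction=0.5):
    """Number of top-ranked compounds needed to hit `fraction` of the chemotypes.

    ranked_ids: compound ids, best score first (actives and noise together).
    chemotype_of: dict mapping each active id to its chemotype label.
    Returns None if the target coverage is never reached.
    """
    total = len(set(chemotype_of.values()))
    needed = math.ceil(fraction * total)
    if needed == 0:
        return 0
    found = set()
    for processed, compound in enumerate(ranked_ids, start=1):
        chemotype = chemotype_of.get(compound)
        if chemotype is not None:
            found.add(chemotype)  # further analogues of a found chemotype add nothing
            if len(found) >= needed:
                return processed
    return None
```

Counting chemotypes rather than individual actives is what guards against common analogue bias: a search that ranks twenty close analogues highly still only scores one chemotype.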

Page 18

Results
Sample Hit Rate Comparisons

• Kinase sites tend to be highly mobile
  • Forgiving DOCK scoring function more appropriate
• Fatty acid active site deep and fairly rigid
  • Prometheus at least comparable performance to DOCK even with more simplistic constraints

[Figure: Kinase - chemotypes hit (axis 0 to 12) vs compounds processed (100 to 800); series: Prometheus unconstrained, Prometheus constrained, DOCK constrained, DOCK unconstrained.]

[Figure: Fatty acid binding protein 1 - chemotypes hit (axis 0 to 8) vs compounds processed; series: Prometheus constrained, DOCK constrained.]

Page 19

Results
Sample Hit Rate Comparisons

• Illustrates how addition of constraints can allow performance of simplistic scoring functions to surpass those deemed more sophisticated

[Figure: Serine protease 1 - compounds hit (axis 0 to 18) vs compounds processed (100 to 500); series: DOCK constrained, ICM.]

Page 20

Results
Sample Hit Rate Comparisons

• Removing highly flexible molecules from the search reduces the noise at the top of the hit list
  • In a database of 250000, the top 100 becomes top 2500
  • Could be crucial when only small data sets can be assayed
  • Smaller molecules generally make better leads

[Figure: Average Hit Rates Using Different Flexibility Constraints - chemotypes hit (axis 0 to 8) vs compounds processed (100 to 500); series: Max. 15 bonds, Max. 8 bonds.]
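The scaling remark on this slide appears to follow from holding the fraction of the list constant: 100 out of a test list of roughly 10000 compounds is 1%, and 1% of 250000 is 2500. The sketch below shows that arithmetic alongside a simple flexibility filter. The test-list size of 10000 is an assumption taken from the "~10000 noise compounds" figure quoted earlier in the deck, and both function names are hypothetical.

```python
def scaled_cutoff(rank_cutoff, test_set_size, database_size):
    """Scale a rank cutoff from a test list to a larger database at equal fraction."""
    return round(rank_cutoff * database_size / test_set_size)

def drop_flexible(compounds, max_bonds=8):
    """Keep compounds with at most `max_bonds` rotatable single bonds.

    compounds: iterable of (compound_id, n_rotatable_bonds) pairs.
    """
    return [cid for cid, n_bonds in compounds if n_bonds <= max_bonds]

# Top 100 of an assumed ~10000-compound test list, scaled to a 250000-compound database.
print(scaled_cutoff(100, 10000, 250000))
```

Filtering before docking rather than after also saves the CPU that would otherwise go into sampling the most expensive molecules.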

Page 21

Conclusions
The hypothesis hypothesis

• Sampling choices have a profound effect on SBVS results
• For maximum impact on current methodology, scoring functions should either
  • Be designed/utilized with these limitations in mind
    • Forgiving / targeted at less flexible molecules
  • Improve results by such a high degree that additional sampling (and CPU) is warranted
• In the meantime, utility of pharmacophoric hypotheses {critical region(s) with pharmacophoric constraints} is clear
  • Better results faster
  • Less sensitivity to model coarseness
  • Allows constraints exploiting known structural biology
• Key to optimum use is balancing constraints and tolerances to ensure sufficient sampling
  • benchmarking with known ligands one way to do this

Page 22

Acknowledgements

Thank you to my BMS CADD colleagues