SimBioSys Inc. © 2004 http://www.simbiosys.ca/ Slide #1
Enrichment and cross-validation studies of the eHiTS high throughput
screening software package.
Darryl Reid, Zsolt Zsoldos, Aniko Simon, and A. Peter Johnson
SimBioSys Inc., © 2004
Contents:
● Introduction: eHiTS overview, exhaustive search, scoring function
● Validation: Can eHiTS reproduce crystal structures?
● Cross-validation: Finding a suitable representative receptor
● Enrichment Study: Virtual high-throughput screening, finding the diamonds in the rough
http://www.simbiosys.ca/
Slide #2
Introduction
● Brief overview of eHiTS
● Validation study with DHFR complexes
  ● Prove docking ability / accuracy
● Cross-validation study
  ● Show receptor site compatibility
● Enrichment study
  ● Show applicability for virtual high-throughput screening
Slide #3
eHiTS - Overview
eHiTS features an exhaustive systematic flexible docking algorithm
Slide #4
eHiTS - Search
● Ligand is divided into rigid fragments and connecting flexible chains
● All rigid fragments are docked independently
● Graph matching
● Flexible chain fitting
● Local energy minimisation
Slide #5
eHiTS - Scoring
● Empirical-based scoring
● Many components: hydrogen bonding, hydrophobicity, electrostatic potential, van der Waals contact energy, metal ion interactions, etc.
● All parameters are configurable
● Chemical properties mapped to Connolly surface
● Flag compatibility matrix score for receptor-ligand contacts
Slide #6
Experiment Objectives
● Show ability to reproduce crystal structures
● Show that eHiTS can select active ligands of human DHFR from a drug database
● Illustrate the ease of use of eHiTS
  ● No PDB preparation, no ligand preparation
● Show eHiTS can be used in HTS applications to discover active ligands
Slide #7
Dihydrofolate Reductase (DHFR)
● Plays an essential role in the building of DNA
● “Juggles” two molecules in this reaction
  ● Folate (purple) and NADPH (green)
● The first enzyme targeted for cancer chemotherapy
Oct. 2002 PDB Molecule of the Month: http://www.rcsb.org/pdb/molecules/pdb34_3.html
Slide #8
DHFR – Binding site
● The drug methotrexate is designed to mimic folate, blocking the enzyme's action
● Note the interaction between folate and NADPH, which is essential for the enzyme's function
Slide #9
“Actives” Selection
● Searched for DHFR complexes in the PDB
  ● Obtained 88 complexes, all sources
● Upon quick visual inspection, eliminated 17 complexes
  ● Contained no ligand in the binding site
  ● Contained multiple ligands in the binding site
● Selected 71 DHFR complexes for study
  ● Including 19 human complexes
Slide #10
19 Human DHFR complexes
Complex  Ligand  Ligand formula           # atoms  CoF  CoF formula
1dhf     FOL     2(C19 H17 N7 O6 --)      49
1dlr     MXA     C17 H19 N5 O2            43       NDP  C21 H27 N7 O17 P3
1dls     MTX     C20 H22 N8 O5            55       NDP  C21 H27 N7 O17 P3
1drf     FOL     C19 H17 N7 O6 --         49
1hfp     MOT     C20 H22 N6 O6            54       NAP  C21 H28 N7 O17 P3
1hfq     MOT     C20 H22 N6 O6            54       NAP  C21 H28 N7 O17 P3
1hfr     MOT     C20 H22 N6 O6            54       NAP  C21 H28 N7 O17 P3
1kms     LIH     C18 H17 N7               42       NDP  C21 H30 N7 O17 P3
1kmv     LII     C18 H19 N5 O2            44       NDP  C21 H30 N7 O17 P3
1mvs     DTM     C18 H22 N6 O3            49
1mvt     DTM     C18 H22 N6 O3            49
1ohj     COP     C27 H27 N9 O6            69       NDP  C21 H30 N7 O17 P3
1ohk     COP     C27 H27 N9 O6            69       NDP  C21 H30 N7 O17 P3
1pd8     CO4     C19 H24 N6 O3            52       NDP  C21 H30 N7 O17 P3
1pd9     CO4     C19 H24 N6 O3            52
1s3u     TQD     C19 H39 N5 O3            66
1s3v     TQD     C19 H39 N5 O3            66
1s3w     TQT     C17 H33 N5               55       NAP  C21 H28 N7 O17 P3
2dhf     DZF     2(C20 H18 N6 O6 --)      50
Slide #11
Validation
● Each DHFR ligand was removed from the protein and docked back into its binding site
  ● eHiTS was allowed to do this split automatically
● Results were then judged by evaluating the RMSD between the crystal structure binding position and the computed docking pose
● Standard (default) parameters for eHiTS were used in all the runs
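The RMSD criterion used to judge each docking can be sketched as follows. This is a minimal illustration, not the eHiTS implementation: it assumes both poses list the same atoms in the same order, sit in the same receptor coordinate frame (so no superposition is applied), and it ignores ligand symmetry. The function name `rmsd` and the sample coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two poses.

    Each pose is a list of (x, y, z) tuples; atom i in coords_a must
    correspond to atom i in coords_b (assumption of this sketch).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical two-atom ligand: the docked pose is shifted by 1.0 along z.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(crystal_pose, docked_pose))
```

Because the ligand is docked back into its own receptor, comparing coordinates directly (without fitting one pose onto the other) is what makes the value a measure of placement error rather than of shape similarity.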
Slide #12
Validation – All Sources
RMSD      Top-ranked  Closest
< 0.5     3.23%       17.74%
< 1.0     22.58%      51.61%
< 1.5     59.68%      69.35%
< 2.0     67.74%      85.48%
< 2.5     83.87%      91.94%
< 3.0     88.71%      91.94%
Ave RMSD  1.94        1.41
[Figure: Top-Ranked and Closest RMSD Comparison. Percent of structures (y-axis, 0% to 100%) versus RMSD cutoff (x-axis, <0.5 to <3.0) for the Top-ranked and Closest poses.]
❑ 71 PDBs
❑ 29 Unique Ligands
Slide #13
Validation - Human
RMSD      Top-ranked  Closest
< 0.5     10.53%      21.05%
< 1.0     21.05%      63.16%
< 1.5     68.42%      73.68%
< 2.0     73.68%      84.21%
< 2.5     84.21%      94.74%
< 3.0     89.47%      94.74%
Ave RMSD  1.98        1.18
[Figure: Top-Ranked and Closest RMSD Comparison - Human DHFR. Percent of structures (y-axis, 0% to 100%) versus RMSD cutoff (x-axis, <0.5 to <3.0) for the Top-ranked and Closest poses.]
❑ 19 PDBs
❑ 12 Unique Ligands
Slide #14
Results – The Good
1dyi Complex – x-ray ligand in white
Top-Rank: score -139.6, 0.85 RMS
Closest: score -105.77, 0.76 RMS
Slide #15
Results – The Bad
1ly4 Complex – x-ray ligand in white
Top-Rank: score -55.11, 2.23 RMS
Closest: score -50.32, 0.89 RMS
Slide #16
Results – The Ugly
1rc4 Complex – x-ray ligand in white
Top-Rank: score -7.87, 4.90 RMS
Closest: score 43.22, 4.38 RMS
Slide #17
Validation - Summary
● eHiTS reproduced the crystal structure position of DHFR ligands accurately (RMS < 2.0) about 85% of the time when the closest generated pose is considered
● About 68% of the time, eHiTS' highest-ranking (best-scoring) pose had an RMS < 2.0
  ● This number improves to 74% for human DHFR ligands
● This shows that eHiTS is able to predict docking poses for DHFR ligands
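The distinction between the "top-ranked" and "closest" statistics can be sketched as below. The (score, RMS) pairs are the values shown on Slides #14 to #16 as read from the transcript; everything else (function names, treating the lowest score as the best rank, using only two poses per complex) is an assumption of this sketch. With only three of the 71 complexes, the printed percentages are illustrative and are not the study's results.

```python
def top_ranked_and_closest(poses):
    """poses: list of (score, rms) pairs for one complex.

    Assumes a lower (more negative) score ranks higher.
    Returns (rms of the best-scoring pose, smallest rms of any pose).
    """
    top_rms = min(poses, key=lambda p: p[0])[1]
    closest_rms = min(p[1] for p in poses)
    return top_rms, closest_rms

def percent_below(rms_values, cutoff=2.0):
    """Percentage of complexes whose rms is under the cutoff."""
    rms_values = list(rms_values)
    return 100.0 * sum(r < cutoff for r in rms_values) / len(rms_values)

# (score, rms) pairs taken from Slides #14-#16.
poses_by_complex = {
    "1dyi": [(-139.6, 0.85), (-105.77, 0.76)],
    "1ly4": [(-55.11, 2.23), (-50.32, 0.89)],
    "1rc4": [(-7.87, 4.90), (43.22, 4.38)],
}

results = {name: top_ranked_and_closest(p) for name, p in poses_by_complex.items()}
top = [r[0] for r in results.values()]
closest = [r[1] for r in results.values()]
print("top-ranked under 2.0:", percent_below(top))
print("closest under 2.0:   ", percent_below(closest))
```

The gap between the two percentages reflects scoring rather than search: a good pose was generated (closest) but was not always ranked first (top-ranked).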
Slide #18
Cross-validation
● Each ligand is docked against each receptor, resulting in a matrix of dockings
● A receptor that docks many ligands well is a good candidate for enrichment studies
● Tests were run using standard (default) parameters, with no preprocessing of the PDB data (eHiTS did all processing automatically)
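One way to read such a docking matrix is to summarise each receptor row by how many ligands it docked and by its average score. The sketch below assumes the matrix is held as a nested dictionary with `None` meaning "no dock"; the receptor and ligand names and all scores are hypothetical, and the ranking rule shown (most ligands docked) is only one plausible selection criterion, not necessarily the one applied in the study.

```python
from statistics import mean

def summarise_receptors(scores):
    """scores: {receptor: {ligand: score or None}}, None meaning no dock.

    Returns {receptor: {"n_docked": int, "mean_score": float or None}}.
    """
    summary = {}
    for receptor, row in scores.items():
        docked = [s for s in row.values() if s is not None]
        summary[receptor] = {
            "n_docked": len(docked),
            "mean_score": mean(docked) if docked else None,
        }
    return summary

def most_permissive(summary):
    """Receptor that docked the largest number of ligands."""
    return max(summary, key=lambda r: summary[r]["n_docked"])

# Hypothetical 2 x 3 cross-docking matrix.
matrix = {
    "recA": {"lig1": -50.0, "lig2": -20.0, "lig3": 10.0},
    "recB": {"lig1": -90.0, "lig2": None, "lig3": None},
}

summary = summarise_receptors(matrix)
print(summary)
print(most_permissive(summary))
```

In this toy matrix `recB` gives the best single score but accepts only one ligand, while `recA` accepts all three with middling scores, which is the kind of trade-off a representative receptor has to balance.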
Slide #19
Cross-Validation
[Figure: Color map of cross-validation matrix of 71 DHFR complexes. Green = negative score (good), red = positive score (bad), grey = no dock. Proteins are listed to the right, ligands across the top.]
Slide #20
Cross-Validation Human
● Looking horizontally across the matrix, one can judge how well the receptor site will accept different ligands
● As a representative sample (looking at both human and all sources), we chose 1DLS for our enrichment study
● 1DLS docks almost every ligand and gives average scores for ligands
[Figure: Color map of cross-validation matrix of 19 human DHFR complexes. Green = negative score (good), red = positive score (bad), grey = no dock. Proteins are listed to the right, ligands across the top.]
Slide #21
Enrichment Study
● The object of virtual screening is to select a set of ligands “enriched” with actives, relative to the entire database
● 1DLS used as receptor site
● Two groups of ligands were chosen for enrichment tests
  ● 21000 random ligands from the MDDR database of “drug-like” ligands
  ● 16000 MDDR ligands of comparable size to the DHFR ligands (actives) found in the PDB, 40-60 atoms in size
● Enrichment factor is the ratio between the % actives in the sample portion and the % actives in the entire database
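The enrichment factor definition above can be written out as a short function. This is a sketch under stated assumptions: the database is given as a best-first ranked list of booleans, and the portion size is rounded down to a whole number of ligands (the slides do not say how the portion is rounded). The function name and list layout are hypothetical.

```python
def enrichment_factor(ranked_is_active, portion):
    """EF = (% actives in the top portion) / (% actives in the whole database).

    ranked_is_active: booleans ordered from best to worst score.
    portion: fraction of the database sampled, e.g. 0.01 for 1%.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    n_top = int(n_total * portion)  # assumption: portion size rounded down
    if n_top == 0 or n_actives == 0:
        raise ValueError("portion or active set is empty")
    hits = sum(ranked_is_active[:n_top])
    return (hits / n_top) / (n_actives / n_total)

# Idealised case: 19 actives ranked ahead of every other ligand in a
# database of 16588 (the sizes reported for the human 40-60 atom set).
ideal_ranking = [True] * 19 + [False] * (16588 - 19)
print(round(enrichment_factor(ideal_ranking, 0.01), 2))
print(round(enrichment_factor(ideal_ranking, 0.03), 2))
```

Under those assumptions the idealised ranking gives values that agree with the 1% and 3% enrichment factors reported on Slide #25, which would be consistent with all 19 human actives falling inside the top 1% of that set. Note also that the largest possible EF at a given portion is roughly 1 / portion, so values near 100 at 1% are close to the ceiling.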
Slide #22
Enrichment – All Sources
21000 Random MDDR Ligands
Portion size  EF
1.00%         29.91
3.00%         11.94
10.00%        5.08
❑ Total # ligands: 21239
❑ # Ligands docked: 12133
❑ # actives: 71
❑ # actives docked: 67
[Figure: Enrichment Results for Screening 21000 MDDR Ligands. Percent of actives in portion (y-axis, 0% to 100%) versus percent database sampled (x-axis, 0% to 60%), with Scored and Random series.]
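The points of an enrichment plot like the one on this slide can be generated from a ranked list, as sketched below. The input layout (booleans ordered best score first) and the function name are assumptions, and the four-ligand ranking is a toy example rather than study data.

```python
def enrichment_curve(ranked_is_active):
    """Return (percent database sampled, percent actives found) per rank.

    ranked_is_active: booleans ordered from best to worst score.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_actives == 0:
        raise ValueError("no actives in the ranked list")
    points = []
    found = 0
    for rank, is_active in enumerate(ranked_is_active, start=1):
        found += is_active
        points.append((100.0 * rank / n_total, 100.0 * found / n_actives))
    return points

# Toy ranking of four ligands, two of them active.
curve = enrichment_curve([True, False, True, False])
print(curve)
```

A random selection is expected to follow the diagonal (x% of the database recovers about x% of the actives), so the further the scored series rises above that diagonal, the better the enrichment.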
Slide #23
Enrichment – Human Ligands
21000 Random MDDR Ligands
Portion size  EF
1.00%         52.86
3.00%         22.83
10.00%        7.37
❑ Total # ligands: 21191
❑ # Ligands docked: 12085
❑ # actives: 19
❑ # actives docked: 19
[Figure: Enrichment Results of Screening Human Ligands / 21000 MDDR Ligands. Percentage of actives in portion (y-axis, 0 to 1.2) versus percentage of database sampled (x-axis, 0% to 60%), with Scored and Random series.]
Slide #24
Enrichment – All Sources
Selected 16000 MDDR Ligands, 40-60 atoms
Portion size  EF
1.00%         94.23
3.00%         33.34
❑ Total # ligands: 16636
❑ # Ligands docked: 641
❑ # actives: 71
❑ # actives docked: 67
[Figure: Enrichment Results for Screening 16000 MDDR selected ligands. Percentage of actives in portion (y-axis, 0% to 100%) versus percentage database sampled (x-axis, 0% to 4.5%), with Scored and Random series.]
Slide #25
Enrichment – Human ligands
Selected 16000 MDDR Ligands, 40-60 atoms
Portion size  EF
1.00%         100.53
3.00%         33.38
❑ Total # ligands: 16588
❑ # Ligands docked: 593
❑ # actives: 19
❑ # actives docked: 19
[Figure: Enrichment Results for Screening Human / 40-60. Percentage of actives in portion (y-axis, 0% to 120%) versus percentage database sampled (x-axis, 0% to 4%), with Scored and Random series.]
Slide #26
Conclusions
● eHiTS can accurately reproduce crystal structure poses
● Cross-validation studies showed that 1DLS is a representative structure for the DHFR family, especially for human ligands
● eHiTS gives very good enrichment results on our given dataset, especially considering:
  ● Our “actives” are hypothetical (some may not be active against 1DLS)
  ● Our “decoys” could have activity towards 1DLS
Slide #27
Conclusions
● eHiTS proved suitable for virtual high-throughput screening
● Docking times averaged ~5 min / ligand with standard parameters and ~30 sec / ligand for enrichment studies using the “fast” parameter sets
● 21000 ligands were screened in under 12 hours on a 160-CPU cluster
● Good enrichment factors show the effectiveness of the screening
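The throughput figures above can be cross-checked with simple arithmetic. The sketch below assumes perfect parallel scaling with no queueing or I/O overhead, which a real cluster run would not achieve; the function name is hypothetical. The slides do not state which parameter set was used for the 12-hour run, so both per-ligand times are evaluated.

```python
def wall_clock_hours(n_ligands, seconds_per_ligand, n_cpus):
    """Idealised wall-clock time, assuming the work divides evenly over CPUs."""
    return n_ligands * seconds_per_ligand / n_cpus / 3600.0

standard = wall_clock_hours(21000, 5 * 60, 160)  # ~5 min per ligand
fast = wall_clock_hours(21000, 30, 160)          # ~30 sec per ligand
print(f"standard parameters: {standard:.2f} h")
print(f"fast parameters:     {fast:.2f} h")
```

The standard-parameter estimate comes out at roughly 10.9 hours, which is compatible with the "under 12 hours" figure; the fast-parameter estimate is about 1.1 hours.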
Slide #28
Acknowledgments
● Zsolt Zsoldos, SimBioSys Inc. CEO
● Aniko Simon, Bashir Sadjad, Beihong Wu, Constantin Tanurkov, James Law, Sing Yoong Khew, Irina Szabo, Zsolt Szabo, David Fung
● Dr. Peter Johnson, Leeds University
http://www.simbiosys.ca