Page 1: SimBioSys Inc.© 2004  Slide #1 Enrichment and cross-validation studies of the eHiTS high throughput screening software package

SimBioSys Inc.© 2004http://www.simbiosys.ca/ Slide #1

Enrichment and cross-validation studies of the eHiTS high throughput screening software package.

Darryl Reid, Zsolt Zsoldos, Aniko Simon, and A. Peter Johnson

SimBioSys Inc., © 2004

Contents:

● Introduction: eHiTS overview, exhaustive search, scoring function

● Validation: Can eHiTS reproduce crystal structures?

● Cross-validation: Finding a suitable representative receptor

● Enrichment Study: Virtual High-throughput screening, finding the diamonds in the rough

http://www.simbiosys.ca/

Slide #2

Introduction

Brief overview of eHiTS

Validation study with DHFR complexes

    Prove docking ability / accuracy

Cross-validation study

    Show receptor site compatibility

Enrichment study

    Show applicability for virtual high-throughput screening

Slide #3

eHiTS - Overview

eHiTS features an exhaustive systematic flexible docking algorithm

Slide #4

eHiTS - Search

Ligand is divided into rigid fragments and connecting flexible chains

All rigid fragments are docked independently

Graph matching

Flexible chain fitting

Local energy minimisation
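The first step above, dividing a ligand into rigid fragments, can be sketched as cutting a molecular graph at its rotatable bonds and collecting the connected pieces. This is only an illustrative sketch: the function name, the bond-list input format and the toy molecule are assumptions, and the slides do not describe how eHiTS itself performs the split.

```python
from collections import defaultdict


def rigid_fragments(bonds, rotatable):
    """Split a molecular graph into rigid pieces by cutting rotatable bonds.

    bonds: iterable of (atom_i, atom_j) index pairs.
    rotatable: set of frozenset({atom_i, atom_j}) marking rotatable bonds.
    Returns a list of sorted atom-index lists, one per connected component.
    """
    adjacency = defaultdict(set)
    atoms = set()
    for i, j in bonds:
        atoms.update((i, j))
        # Only non-rotatable bonds hold a rigid fragment together.
        if frozenset((i, j)) not in rotatable:
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen, fragments = set(), []
    for start in sorted(atoms):
        if start in seen:
            continue
        # Depth-first search over the remaining (rigid) bonds.
        stack, component = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            component.append(atom)
            for neighbour in adjacency[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        fragments.append(sorted(component))
    return fragments


# Hypothetical toy ligand: ring 0-1-2, linker atom 3, rigid tail 4-5-6.
toy_bonds = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 6)]
toy_rotatable = {frozenset((2, 3)), frozenset((3, 4))}
fragments = rigid_fragments(toy_bonds, toy_rotatable)
```

In this toy case the single-atom piece plays the role of a connecting flexible chain between the two rigid fragments.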

Slide #5

eHiTS - Scoring

Empirical-based scoring

Many components: hydrogen bonding, hydrophobicity, electrostatic potential, van der Waals contact energy, metal ion interactions, etc.

All parameters are configurable

Chemical properties mapped to Connolly surface

Flag compatibility matrix score for receptor-ligand contacts
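As a rough illustration of what a configurable, multi-component empirical score looks like in general, the sketch below combines per-term contributions as a weighted sum. The term names, weights and values are hypothetical, and this is the generic form of an empirical scoring function, not the actual eHiTS scoring function.

```python
def empirical_score(terms, weights):
    """Weighted sum of interaction terms; more negative means more favourable."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight configured for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical weights and per-pose term values (illustrative only).
weights = {"hbond": 1.0, "hydrophobic": 0.5, "electrostatic": 0.8,
           "vdw": 0.3, "metal": 1.2}
pose_terms = {"hbond": -4.0, "hydrophobic": -6.0, "electrostatic": -1.5,
              "vdw": -10.0, "metal": 0.0}
score = empirical_score(pose_terms, weights)
```

Keeping the weights in a plain dictionary mirrors the idea that all parameters are configurable: changing a weight changes the ranking without touching the search.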

Slide #6

Experiment Objectives

Show ability to reproduce crystal structures

Show that eHiTS can select active ligands of human DHFR from a drug database

Illustrate the ease of use of eHiTS

    No PDB preparation, no ligand preparation

Show eHiTS can be used in HTS applications to discover active ligands

Slide #7

Dihydrofolate Reductase (DHFR)

Plays an essential role in the building of DNA

“juggles” two molecules in this reaction

Folate (purple) and NADPH (green)

The first enzyme targeted for cancer chemotherapy

Oct. 2002 PDB Molecule of the Month: http://www.rcsb.org/pdb/molecules/pdb34_3.html

Slide #8

DHFR – Binding site

The drug methotrexate is designed to mimic folate, blocking the enzyme's action

Note the interaction between folate and NADPH; this is essential for the enzyme's function

Slide #9

“Actives” Selection

Searched for DHFR complexes in the PDB

    Obtained 88 complexes, all sources

Upon quick visual inspection, eliminated 17 complexes

    Contained no ligand in the binding site

    Contained multiple ligands in the binding site

Selected 71 DHFR complexes for study

    Including 19 human complexes

Slide #10

19 Human DHFR Complexes

Complex  Ligand  Formula               # atoms  CoF  Formula
1dhf     FOL     2(C19 H17 N7 O6 --)   49
1dlr     MXA     C17 H19 N5 O2         43       NDP  C21 H27 N7 O17 P3
1dls     MTX     C20 H22 N8 O5         55       NDP  C21 H27 N7 O17 P3
1drf     FOL     C19 H17 N7 O6 --      49
1hfp     MOT     C20 H22 N6 O6         54       NAP  C21 H28 N7 O17 P3
1hfq     MOT     C20 H22 N6 O6         54       NAP  C21 H28 N7 O17 P3
1hfr     MOT     C20 H22 N6 O6         54       NAP  C21 H28 N7 O17 P3
1kms     LIH     C18 H17 N7            42       NDP  C21 H30 N7 O17 P3
1kmv     LII     C18 H19 N5 O2         44       NDP  C21 H30 N7 O17 P3
1mvs     DTM     C18 H22 N6 O3         49
1mvt     DTM     C18 H22 N6 O3         49
1ohj     COP     C27 H27 N9 O6         69       NDP  C21 H30 N7 O17 P3
1ohk     COP     C27 H27 N9 O6         69       NDP  C21 H30 N7 O17 P3
1pd8     CO4     C19 H24 N6 O3         52       NDP  C21 H30 N7 O17 P3
1pd9     CO4     C19 H24 N6 O3         52
1s3u     TQD     C19 H39 N5 O3         66
1s3v     TQD     C19 H39 N5 O3         66
1s3w     TQT     C17 H33 N5            55       NAP  C21 H28 N7 O17 P3
2dhf     DZF     2(C20 H18 N6 O6 --)   50

Slide #11

Validation

Each DHFR ligand was removed from the protein and docked back into its binding site

eHiTS was allowed to do this split automatically

Results were then judged by evaluating the RMSD between the crystal structure binding position and the computed docking pose

Standard (default) parameters for eHiTS were used in all the runs
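The RMSD criterion used for judging a redocked pose can be sketched as below. This is a minimal version under stated assumptions: both poses list the same atoms in the same order, coordinates are already in the receptor frame (so no superposition is applied), and symmetry-equivalent atoms are not handled. The function name and toy coordinates are illustrative.

```python
import math


def pose_rmsd(reference, docked):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples.

    Atoms are assumed to be in the same order in both poses. No fitting is
    done, because a docked pose is compared in place against the crystal pose.
    """
    if len(reference) != len(docked) or not reference:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for ref_atom, dock_atom in zip(reference, docked)
        for a, b in zip(ref_atom, dock_atom)
    )
    return math.sqrt(squared / len(reference))


# Toy example: every atom displaced by 1.0 along x gives an RMSD of 1.0.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in crystal]
rmsd = pose_rmsd(crystal, shifted)
```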

Slide #12

Validation – All Sources

RMSD      Top-ranked  Closest
< 0.5     3.23%       17.74%
< 1.0     22.58%      51.61%
< 1.5     59.68%      69.35%
< 2.0     67.74%      85.48%
< 2.5     83.87%      91.94%
< 3.0     88.71%      91.94%
Ave RMSD  1.94        1.41

[Figure: Top-Ranked and Closest RMSD Comparison. Percent of structures (0-100%) below each RMSD threshold (<0.5 to <3.0) for the Top-ranked and Closest poses.]

❑ 71 PDBs

❑ 29 Unique Ligands

Slide #13

Validation - Human

RMSD      Top-ranked  Closest
< 0.5     10.53%      21.05%
< 1.0     21.05%      63.16%
< 1.5     68.42%      73.68%
< 2.0     73.68%      84.21%
< 2.5     84.21%      94.74%
< 3.0     89.47%      94.74%
Ave RMSD  1.98        1.18

[Figure: Top-Ranked and Closest RMSD Comparison - Human DHFR. Percent of structures (0-100%) below each RMSD threshold (<0.5 to <3.0) for the Top-ranked and Closest poses.]

❑ 19 PDBs

❑ 12 Unique Ligands

Slide #14

Results – The Good

1dyi Complex – x-ray ligand in white

Top-Rank: score -139.6, 0.85 RMS

Closest: score -105.77, 0.76 RMS

Slide #15

Results – The Bad

1ly4 Complex – x-ray ligand in white

Top-Rank: score -55.11, 2.23 RMS

Closest: score -50.32, 0.89 RMS

Slide #16

Results – The Ugly

1rc4 Complex – x-ray ligand in white

Top-Rank: score -7.87, 4.90 RMS

Closest: score 43.22, 4.38 RMS
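The difference between the "Top-Rank" and "Closest" poses in the three examples (1dyi, 1ly4, 1rc4) can be made concrete with a short sketch. The score and RMS pairs below are a reading of the figures quoted on those slides; the helper names and the dictionary layout are assumptions for illustration. "Top-ranked" is taken to mean the most negative score and "closest" the lowest RMSD to the crystal pose.

```python
def top_ranked(poses):
    """Pose with the best (most negative) score."""
    return min(poses, key=lambda pose: pose["score"])


def closest(poses):
    """Pose with the lowest RMSD to the crystal structure."""
    return min(poses, key=lambda pose: pose["rmsd"])


def success_rate(rmsds, threshold=2.0):
    """Fraction of complexes whose RMSD falls below the threshold."""
    return sum(r < threshold for r in rmsds) / len(rmsds)


# Score / RMS pairs as read from the three example slides.
examples = {
    "1dyi": [{"score": -139.6, "rmsd": 0.85}, {"score": -105.77, "rmsd": 0.76}],
    "1ly4": [{"score": -55.11, "rmsd": 2.23}, {"score": -50.32, "rmsd": 0.89}],
    "1rc4": [{"score": -7.87, "rmsd": 4.90}, {"score": 43.22, "rmsd": 4.38}],
}
top_rate = success_rate([top_ranked(p)["rmsd"] for p in examples.values()])
closest_rate = success_rate([closest(p)["rmsd"] for p in examples.values()])
```

For these three complexes alone, one of three top-ranked poses and two of three closest poses fall under 2.0, which is the same kind of gap the validation tables show between the two columns.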

Slide #17

Validation - Summary

eHiTS was able to reproduce accurately (RMSD < 2.0) the crystal structure position of DHFR ligands 85% of the time (closest pose)

67% of the time, eHiTS' highest-ranking (best-scoring) pose had an RMSD < 2.0

This number improves to 74% for human DHFR ligands

This shows that eHiTS is able to predict docking poses for DHFR ligands

Slide #18

Cross-validation

Each ligand is docked against each receptor resulting in a matrix of dockings

A receptor that docks many ligands well is a good candidate for enrichment studies

Tests were run using standard (default) parameters, with no preprocessing of the PDB data (eHiTS did all processing automatically)
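A cross-docking matrix of this kind can be summarised per receptor as sketched below. The data layout (receptor -> ligand -> score, with None for "no dock"), the function names and the toy values are all hypothetical, and ranking by number of ligands docked is an assumed selection rule, not necessarily the one the authors applied.

```python
def summarise_receptors(matrix):
    """For each receptor, count docked ligands and average their scores.

    matrix[receptor][ligand] is a score, or None when the ligand did not dock.
    """
    summary = {}
    for receptor, row in matrix.items():
        scores = [s for s in row.values() if s is not None]
        mean = sum(scores) / len(scores) if scores else None
        summary[receptor] = {"docked": len(scores), "mean_score": mean}
    return summary


def most_accepting(matrix):
    """Receptor that docks the most ligands; ties go to the lower mean score."""
    summary = summarise_receptors(matrix)

    def sort_key(receptor):
        mean = summary[receptor]["mean_score"]
        return (-summary[receptor]["docked"],
                mean if mean is not None else float("inf"))

    return min(summary, key=sort_key)


# Hypothetical 3x3 cross-docking matrix (illustrative values only).
toy = {
    "recA": {"lig1": -80.0, "lig2": None, "lig3": 12.0},
    "recB": {"lig1": -60.0, "lig2": -45.0, "lig3": -30.0},
    "recC": {"lig1": -95.0, "lig2": None, "lig3": None},
}
best = most_accepting(toy)
```

Note that recC has the single best score in the toy data but accepts only one ligand, so a receptor chosen for screening breadth would be recB.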

Slide #19

Cross-Validation

[Figure: Color map of the cross-validation matrix of 71 DHFR complexes. Green = negative score (good), red = positive score (bad), grey = no dock. Proteins are listed to the right, ligands across the top.]

Slide #20

Cross-Validation - Human

Looking horizontally across the matrix, one can judge how well the receptor site will accept different ligands

As a representative sample (looking at both human and all sources), we chose 1DLS for our enrichment study

1DLS docks almost every ligand and gives average scores for the ligands

[Figure: Color map of the cross-validation matrix of 19 human DHFR complexes. Green = negative score (good), red = positive score (bad), grey = no dock. Proteins are listed to the right, ligands across the top.]

Slide #21

Enrichment Study

The object of virtual screening is to select a set of ligands “enriched” with actives, relative to the entire database

1DLS used as receptor site

Two groups of ligands were chosen for enrichment tests

    21000 random ligands from the MDDR database of “drug-like” ligands

    16000 MDDR ligands of comparable size (40-60 atoms) to the DHFR ligands (actives) found in the PDB

Enrichment factor is the ratio between the % actives in sample portion and % actives in entire database
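The enrichment factor defined above translates directly into code. The sketch below is a plain transcription of that ratio; the function name and argument names are assumptions, and the counts in the example are illustrative because the slides report only the resulting EF values, not the number of actives found in each portion.

```python
def enrichment_factor(actives_in_portion, portion_size, total_actives, total_size):
    """EF = (% actives in the top-scored portion) / (% actives in the whole database)."""
    if portion_size <= 0 or total_size <= 0 or total_actives <= 0:
        raise ValueError("portion size, database size and active count must be positive")
    portion_rate = actives_in_portion / portion_size
    database_rate = total_actives / total_size
    return portion_rate / database_rate


# Illustrative counts: 10 of 19 actives in the top 211 of 21191 ligands.
ef = enrichment_factor(actives_in_portion=10, portion_size=211,
                       total_actives=19, total_size=21191)

# Upper bound for the same portion: all 19 actives in the top 211.
max_ef = enrichment_factor(19, 211, 19, 21191)
```

With these illustrative counts the function returns about 52.86, and the upper bound shows why EF values for a 1% portion cannot exceed roughly 100. An EF of 1 corresponds to random selection.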

Slide #22

Enrichment – All Sources

21000 Random MDDR Ligands

Portion size  EF
1.00%         29.91
3.00%         11.94
10.00%        5.08

❑ Total # ligands: 21239

❑ # Ligands docked: 12133

❑ # actives: 71

❑ # actives docked: 67

[Figure: Enrichment Results for Screening 21000 MDDR Ligands. Percent of actives in portion (0-100%) versus percent database sampled (0-60%), Scored and Random curves.]

Slide #23

Enrichment – Human Ligands

21000 Random MDDR Ligands

Portion size  EF
1.00%         52.86
3.00%         22.83
10.00%        7.37

❑ Total # ligands: 21191

❑ # Ligands docked: 12085

❑ # actives: 19

❑ # actives docked: 19

[Figure: Enrichment Results of Screening Human Ligands / 21000 MDDR Ligands. Percentage of actives in portion (axis 0-1.2) versus percentage of database sampled (0-60%), Scored and Random curves.]

Slide #24

Enrichment – All Sources

Selected 16000 MDDR Ligands, 40-60 atoms

Portion size  EF
1.00%         94.23
3.00%         33.34

❑ Total # ligands: 16636

❑ # Ligands docked: 641

❑ # actives: 71

❑ # actives docked: 67

[Figure: Enrichment Results for Screening 16000 MDDR selected ligands. Percentage of actives in portion (0-100%) versus percentage database sampled (0-4.5%), Scored and Random curves.]

Slide #25

Enrichment – Human ligands

Selected 16000 MDDR Ligands, 40-60 atoms

Portion size  EF
1.00%         100.53
3.00%         33.38

❑ Total # ligands: 16588

❑ # Ligands docked: 593

❑ # actives: 19

❑ # actives docked: 19

[Figure: Enrichment Results for Screening Human / 40-60. Percentage of actives in portion (0-120%) versus percentage database sampled (0-4%), Scored and Random curves.]

Slide #26

Conclusions

eHiTS can accurately reproduce crystal structure poses

Cross-validation studies showed that 1DLS is a representative structure for the DHFR family, especially for human ligands

eHiTS gives very good enrichment results on our dataset, especially considering:

    Our “actives” are hypothetical (some may not be active against 1DLS)

    Our “decoys” could have activity towards 1DLS

Slide #27

Conclusions

eHiTS proved suitable for virtual high throughput screening

Docking times averaged ~5 min / ligand with standard parameters and ~30 sec / ligand for the enrichment studies using “fast” parameter sets

21000 ligands were screened in under 12 hours on a 160-CPU cluster

Good enrichment factors show the effectiveness of the screening
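The timing figures above can be sanity-checked with a back-of-the-envelope estimate. The sketch assumes perfect load balancing across CPUs and no I/O or scheduling overhead, so it gives a lower bound on wall-clock time rather than a prediction; the function name is illustrative.

```python
def ideal_wall_clock_hours(n_ligands, seconds_per_ligand, n_cpus):
    """Wall-clock estimate assuming perfect load balancing and no overhead."""
    return n_ligands * seconds_per_ligand / n_cpus / 3600.0


# "Fast" parameter set, about 30 s per ligand.
fast_hours = ideal_wall_clock_hours(21000, 30, 160)

# Standard parameters, about 5 min per ligand.
standard_hours = ideal_wall_clock_hours(21000, 5 * 60, 160)
```

Under these idealised assumptions the fast setting needs roughly 1.1 hours and the standard setting roughly 10.9 hours, both within the quoted 12-hour figure.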

Slide #28

Acknowledgments

Zsolt Zsoldos, SimBioSys Inc. CEO

Aniko Simon, Bashir Sadjad, Beihong Wu, Constantin Tanurkov, James Law, Sing Yoong Khew, Irina Szabo, Zsolt Szabo, David Fung.

Dr. Peter Johnson, Leeds University

http://www.simbiosys.ca