julia salas cs379a 1-24-06

Julia Salas

CS379a

1-24-06

Aim of the Study

• To survey the docking and scoring algorithms available today

• Evaluate protocols for three tasks:

1. Prediction of the conformation of ligand bound to protein target

2. Virtual screening of database to identify leads

3. Prediction of binding affinities

General Methods

• Investigate several docking programs using a variety of different target types

• Use a large set of “closely related compounds” (compound set) for each target type

Target Types/Targets Used

• Target Types: 7 protein classes represented

• Targets: 8 proteins of interest to GSK

• Variety: Diversity of mechanisms, binding site shape, binding site chemical environment

Goal: Represent a typical pharmaceutical compound collection

Compound Sets Used (Ligands)

• Compound/Ligand Sets: 1303 compounds

– 150-200 “closely related” compounds

– Compounds have experimentally determined affinities

– Affinities of compounds in a single set span a minimum of 4 orders of magnitude

– Each set has shown biological activity towards target protein

– Each set has a maximum of 20% inactive and 20% extremely active compounds

– Each set has published (2-54) cocrystal structures with the target protein


Docking and Scoring Algorithms

Docking Algorithms

• Evaluated 10 programs with different algorithms and scoring functions:

– 19 protocols total

Procedure

• Each method evaluated by an expert, with no time restrictions or other constraints

• Evaluators did not have cocrystal structures, only ligand structure and protein active site residues

Same ligand starting structure:

• Optimized to a (local) minimum

• “Reasonable” bond distances/angles

• Correct atom hybridization

• 4 structures provided (differ in ionization)

• SMILES (text-based) structure description

Analysis of Docking Programs and Scoring Functions

• 19 protocols evaluated on three tasks:

1. Prediction of the conformation of ligand bound to protein target

2. Virtual screening of database to identify leads

3. Prediction of binding affinities

Prediction of Ligand Conformation Bound to Protein Target

• Compare predictions to (136) cocrystal structures using:

1. rmsd for heavy atoms

2. Volume overlap Tanimoto similarity index (Tvol)

• Two standards for success: rmsd within

– 2 Å (correct orientation): black bars

– 4 Å (within binding site): gray bars

• Can evaluate both the scoring function and the overall methods

Tvol definitions:

– IX, ID = volume overlap integrals for the crystal and docked structures

– OX,D = volume overlap between crystal and docked pose

– 0 ≤ Tvol ≤ 1
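The two pose-comparison measures can be sketched in a few lines of Python. This is an illustration only, not the evaluators' code. The Tvol formula itself did not survive in the slide text, so the sketch assumes the standard Tanimoto form, Tvol = OX,D / (IX + ID − OX,D), which matches the definitions above and stays between 0 and 1. The function names, the "≤" reading of the 2 Å and 4 Å cutoffs, and the assumption that atoms are already matched one-to-one in the same coordinate frame are all hypothetical choices made for the example.

```python
import math


def heavy_atom_rmsd(crystal, docked):
    """RMSD over matched heavy-atom coordinates (lists of (x, y, z) tuples).

    Assumes both lists use the same atom order and the same coordinate
    frame; no superposition is performed.
    """
    if len(crystal) != len(docked) or not crystal:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(crystal, docked)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(crystal))


def volume_tanimoto(i_x, i_d, o_xd):
    """Assumed standard Tanimoto form: O_XD / (I_X + I_D - O_XD)."""
    return o_xd / (i_x + i_d - o_xd)


def classify_pose(rmsd):
    """Apply the two success standards (cutoffs taken as inclusive, an assumption)."""
    if rmsd <= 2.0:
        return "correct orientation"
    if rmsd <= 4.0:
        return "within binding site"
    return "miss"


if __name__ == "__main__":
    xtal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    r = heavy_atom_rmsd(xtal, pose)
    print(r, classify_pose(r))
    print(volume_tanimoto(10.0, 10.0, 5.0))
```

Identical volumes that overlap completely give Tvol = 1, and poses with no overlap give Tvol = 0, which is the range stated above.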

Prediction of Ligand Conformation Bound to Target: Conclusions

The good…

• Docking programs could generate crystal conformations

• For all targets except HCVP, at least one program could dock ≥40% of ligands within 2 Å

– 90% of ligands could be docked within 4 Å, with 100% docked in the correct location

The bad…

• Program with best performance changes from target to target

• Scoring functions led to consistently incorrect predictions

• HCVP had very weak predictions

Virtual Screening of Database to Identify Leads

• Ability to identify the active compounds

1. Enrichment: How quickly did the protocol identify the active compounds vs. random chance?

• Success: Identify at least 50% of the active compounds within the top 10% of the score-ordered list (halfway between random and max)

2. Lead Identification: Cost analysis…how many compounds do you need to screen to find at least one active compound from each class?

• All active compound classes ID’d within top 10%

• Percent actives vs. percent compounds screened measured
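Both screening measures reduce to simple operations on a score-ordered list, as the following Python sketch shows. It is an illustration under stated assumptions, not the study's actual analysis: the function names are hypothetical, lower scores are assumed to rank better (this differs between docking programs), and inactive compounds are marked with a class label of None.

```python
def fraction_actives_found(scores, is_active, top_fraction=0.10, lower_is_better=True):
    """Fraction of all actives that appear in the top slice of the ranked list."""
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("need at least one active compound")
    ranked = sorted(
        zip(scores, is_active),
        key=lambda pair: pair[0],
        reverse=not lower_is_better,
    )
    n_top = max(1, round(top_fraction * len(ranked)))
    found = sum(1 for _, active in ranked[:n_top] if active)
    return found / total_actives


def screened_until_all_classes(ranked_classes):
    """Number of ranked compounds screened before every active class is seen.

    ranked_classes is the score-ordered list of class labels, with None for
    inactives. Returns None if the list contains no actives.
    """
    wanted = {c for c in ranked_classes if c is not None}
    seen = set()
    for count, label in enumerate(ranked_classes, start=1):
        if label is not None:
            seen.add(label)
        if wanted and seen == wanted:
            return count
    return None


if __name__ == "__main__":
    scores = list(range(20))                 # already in rank order
    actives = [i in (0, 1, 10, 15) for i in range(20)]
    hit_rate = fraction_actives_found(scores, actives)
    print(hit_rate, hit_rate >= 0.5)         # success criterion from the slide
    print(screened_until_all_classes(["A", None, "A", "B", None]))
```

Under random ordering, the top 10% of the list would be expected to hold about 10% of the actives, so the 50% threshold is well above chance.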

Prediction of Binding Affinities

• Calculated docking scores compared to measured affinity

• Docking scores were autoscaled and then compared

• Conclusions:

– No statistically significant correlation between scoring function and measured affinity
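Autoscaling normally means mean-centering each set of values and dividing by its standard deviation, which puts scores from different programs on a common, unitless scale. The Python sketch below shows that step followed by a Pearson correlation against measured affinity. The slides do not say which correlation statistic was used, so Pearson's r is an assumption here, and the function names are hypothetical.

```python
from statistics import mean, stdev


def autoscale(values):
    """Mean-center and scale to unit (sample) standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("values have zero variance")
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson correlation computed from autoscaled values."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    zx, zy = autoscale(x), autoscale(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


if __name__ == "__main__":
    docking_scores = [-9.1, -7.4, -8.0, -6.2]   # made-up illustrative numbers
    measured_pki = [8.5, 6.0, 7.1, 5.2]         # made-up illustrative numbers
    print(pearson_r(docking_scores, measured_pki))
```

Pearson's r does not change under autoscaling, so the scaling matters mainly when scores from different programs are plotted or pooled together.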

Conclusions and Discussion Questions

• Docking programs were able to generate poses that resemble cocrystal structures

• Largest difficulties were in determining the small molecule structure, not placing ligand in binding site

• Scoring functions were not successful in predicting the best structures

• Active compounds could be identified in a pool of decoys

• Docking scores could not be correlated to affinity

Question 1: What factors may have contributed to the failure of these programs to predict small molecule conformation?

Question 2: The failure of the programs to predict HCVP structures was attributed to the enzyme’s large active site. Why? Additionally, should flexibility/dynamics be considered?

Question 3: Compound classes were defined by similar backbone structure. Although all compounds in a class had measured affinities, can we assume they all have the same binding mode?
