A genetic algorithm for structure based de-novo design

Scott C.-H. Pegg, Jose J. Haresco & Irwin D. Kuntz

February 21, 2006

Method and applications

• Goal
– Use the genetic algorithm in ADAPT to search for novel small molecules for combinatorial library generation

• Method
– Initial generation
– Fitness function
– Breeding the next generation

• Applications
– Cathepsin D: small chemical space, ligand unknown
– Dihydrofolate reductase: larger chemical space, ligand known
– HIV-1 RT: reproduce known structures

• Questions

Goal and algorithm choice

Goal is to “develop new ligands using information from the three dimensional (3D) structure of a protein target without the prior knowledge of other ligands”

Challenge is that the location of bio-active drugs in complete chemical space is sparse, non-contiguous and difficult to predict a priori

Strategies already tried
- Find fragments that fit in some part of the active site and link multiple fragments together
- Find a fragment that fits in some part of the active site and grow it in a particular direction

Genetic algorithm is better
- Good for searching a large part of chemical space quickly
- Good for finding an adequate, not necessarily the best, solution
- Works even when fitness/scoring functions are not known exactly
- Works with “whole” molecule properties (ADME)
- Generates an ensemble of solutions as leads

[Figure: Basic genetic algorithm]

Important steps

[Figure: Basic genetic algorithm. Stages shown: initial generation; fitness pressure (example scores -40 and -35, labeled "more fit"); breeding (equal/unequal and single/multiple crossover, mutation)]

Start with an acyclic graph of at most 16 fragments with at most 8 connections, in SMILES notation
- Generate a diverse set by picking a random fragment and adding random fragments at random positions
- Generate a user-defined set by randomly swapping at most 2 fragments of a user-defined graph
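The diverse-set option can be sketched as below. This is a minimal illustration and not ADAPT's code: the names `random_fragment_tree` and `initial_population`, the `(nodes, edges)` representation, and the fragment strings are all assumptions made here, and the sketch ignores the connection limit, attachment chemistry, and SMILES assembly.

```python
import random

MAX_FRAGMENTS = 16  # upper bound on fragments per compound, taken from the slide


def random_fragment_tree(fragment_pool, n_fragments, rng):
    """Build a random acyclic graph (a tree) of fragments.

    Returns (nodes, edges): nodes is a list of fragment labels and edges is a
    list of (parent_index, child_index) pairs. Hypothetical representation.
    """
    if not 1 <= n_fragments <= MAX_FRAGMENTS:
        raise ValueError("n_fragments must be between 1 and MAX_FRAGMENTS")
    nodes = [rng.choice(fragment_pool)]          # pick a random starting fragment
    edges = []
    while len(nodes) < n_fragments:
        parent = rng.randrange(len(nodes))       # random attachment position
        nodes.append(rng.choice(fragment_pool))  # random new fragment
        edges.append((parent, len(nodes) - 1))   # new node only links backwards
    return nodes, edges


def initial_population(fragment_pool, size, rng, min_frags=2, max_frags=MAX_FRAGMENTS):
    """Create `size` random compounds with a random number of fragments each."""
    return [
        random_fragment_tree(fragment_pool, rng.randint(min_frags, max_frags), rng)
        for _ in range(size)
    ]


if __name__ == "__main__":
    pool = ["c1ccccc1", "C(=O)O", "N", "O"]      # placeholder fragment SMILES
    for nodes, edges in initial_population(pool, 3, random.Random(0)):
        print(len(nodes), "fragments,", len(edges), "connections")
```

Because every new fragment attaches to exactly one existing fragment, the graph stays acyclic by construction.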

Evaluate the fitness value of each compound using the DOCK 4.0 program (6-12 van der Waals and 1/r electrostatic terms), Daylight’s CLOGP program, molecular weight, number of rotatable bonds, and number of hydrogen bond donors/acceptors
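The slide lists the terms but not how they are combined. One plausible shape, shown purely as a sketch, is a docking score plus penalties for properties that fall outside acceptable ranges. The `Descriptors` class, the ranges, and the weights below are hypothetical and are not taken from ADAPT; lower values are treated as more fit, as with DOCK energy scores.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed values for one compound (how they are computed is out of scope)."""
    dock_score: float       # docking energy score; more negative is better
    clogp: float
    mol_weight: float
    rotatable_bonds: int
    hbond_donors: int
    hbond_acceptors: int


# Hypothetical (low, high, weight) per property. NOT from the slides.
RANGES = {
    "clogp": (-1.0, 5.0, 1.0),
    "mol_weight": (150.0, 500.0, 0.05),
    "rotatable_bonds": (0, 10, 1.0),
    "hbond_donors": (0, 5, 1.0),
    "hbond_acceptors": (0, 10, 1.0),
}


def range_penalty(value, low, high):
    """Distance by which `value` falls outside [low, high]; 0 inside the range."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def fitness(d):
    """Docking score plus weighted out-of-range penalties (lower is more fit)."""
    score = d.dock_score
    for name, (low, high, weight) in RANGES.items():
        score += weight * range_penalty(getattr(d, name), low, high)
    return score


if __name__ == "__main__":
    print(fitness(Descriptors(-40.0, 2.0, 350.0, 4, 2, 5)))
    print(fitness(Descriptors(-40.0, 7.0, 600.0, 4, 2, 5)))
```

Keeping the whole-molecule terms as penalties means a compound that docks well but violates the property ranges is pushed down the ranking rather than discarded outright.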

Select the best-scoring compounds as parents for the next generation, which may or may not include the parents

Crossover between parents happens by randomly swapping nodes of equal or unequal sizes generated from random walks

Mutation of a daughter occurs with a user-defined mutation probability and changes either the identity or the connectivity of a fragment
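The two breeding operators can be sketched on a simple fragment tree. This is an illustration under assumptions, not ADAPT's implementation: a compound is represented here as a nested list `[fragment, [children]]`, the swapped region is the subtree reached by a random walk from the root, and `stop_prob` and the 50/50 split between mutation types are made-up parameters.

```python
import copy
import random


def random_walk_path(tree, rng, stop_prob=0.3):
    """Walk down from the root, stopping at random; return the child-index path."""
    path, node = [], tree
    while node[1] and rng.random() > stop_prob:
        i = rng.randrange(len(node[1]))
        path.append(i)
        node = node[1][i]
    return path


def get_subtree(tree, path):
    node = tree
    for i in path:
        node = node[1][i]
    return node


def set_subtree(tree, path, new):
    """Replace the subtree at `path` with `new`; an empty path replaces the root."""
    if not path:
        return new
    get_subtree(tree, path[:-1])[1][path[-1]] = new
    return tree


def crossover(a, b, rng):
    """Swap randomly chosen subtrees (possibly of unequal size) between copies."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    pa, pb = random_walk_path(a, rng), random_walk_path(b, rng)
    sa, sb = get_subtree(a, pa), get_subtree(b, pb)
    return set_subtree(a, pa, sb), set_subtree(b, pb, sa)


def all_paths(tree, prefix=()):
    yield prefix
    for i, child in enumerate(tree[1]):
        yield from all_paths(child, prefix + (i,))


def mutate(tree, fragment_pool, rng, p_mutate=0.1):
    """With probability p_mutate, change one fragment's identity or connectivity."""
    tree = copy.deepcopy(tree)
    if rng.random() >= p_mutate:
        return tree
    paths = list(all_paths(tree))
    non_root = [p for p in paths if p]
    if non_root and rng.random() < 0.5:
        # Connectivity: detach a subtree and reattach it to a remaining node.
        path = rng.choice(non_root)
        moved = get_subtree(tree, path[:-1])[1].pop(path[-1])
        target = get_subtree(tree, rng.choice(list(all_paths(tree))))
        target[1].append(moved)
    else:
        # Identity: replace the fragment label at a random node.
        get_subtree(tree, rng.choice(paths))[0] = rng.choice(fragment_pool)
    return tree
```

Both operators work on deep copies, so parents survive unchanged and can be carried into the next generation if desired.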

The new generation is created, diversity is optionally added, and the process is cycled until the fitness goal is reached.
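Put together, the cycle looks roughly like the loop below. It is a generic sketch under assumptions: `evolve` and its parameters are names invented here, lower fitness is treated as more fit, and the demonstration uses toy bit-string individuals rather than molecules.

```python
import random


def evolve(population, fitness, breed, new_random, rng, *,
           n_parents=10, keep_parents=True, fitness_goal=None,
           max_generations=50, diversity_every=None, n_diverse=5):
    """Generic GA loop; returns the final population sorted best-first."""
    size = len(population)
    for generation in range(max_generations):
        ranked = sorted(population, key=fitness)
        if fitness_goal is not None and fitness(ranked[0]) <= fitness_goal:
            break                                   # fitness goal reached
        parents = ranked[:n_parents]                # best-scoring compounds
        next_gen = list(parents) if keep_parents else []
        while len(next_gen) < size:
            mother, father = rng.sample(parents, 2)
            next_gen.append(breed(mother, father, rng))
        if diversity_every and (generation + 1) % diversity_every == 0:
            # Optional diversity: replace the tail with fresh random individuals.
            next_gen[-n_diverse:] = [new_random(rng) for _ in range(n_diverse)]
        population = next_gen
    return sorted(population, key=fitness)


# Toy stand-ins so the loop can be run without any chemistry.
def toy_random(rng, n=12):
    return [rng.randint(0, 1) for _ in range(n)]


def toy_fitness(individual):
    return -sum(individual)                         # more ones = more fit


def toy_breed(mother, father, rng, p_mut=0.05):
    cut = rng.randrange(1, len(mother))             # single-point crossover
    child = mother[:cut] + father[cut:]
    return [1 - g if rng.random() < p_mut else g for g in child]


if __name__ == "__main__":
    rng = random.Random(0)
    start = [toy_random(rng) for _ in range(30)]
    final = evolve(start, toy_fitness, toy_breed, toy_random, rng,
                   fitness_goal=-12, max_generations=200)
    print("best fitness:", toy_fitness(final[0]))
```

With `keep_parents=True` and no diversity injection the best score can never get worse from one generation to the next, which is the usual argument for retaining parents.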

Applications

Cathepsin D
• Compare results of ADAPT applied to a combinatorial library with experimental binding constant data on the library
- Able to select fragments consistently present in the best inhibitors tested experimentally
- Unable to directly produce known inhibitors due to differences between the DOCK score function and the binding constant surface

Dihydrofolate reductase (DHFR)
• Study the effect of seeding with a known ligand (methotrexate in this case) and of adding diversity to longer runs in a larger chemical space search (10^8 compounds)
- Able to evolve compounds with the motif of known ligands
- Able to do so faster when seeded with a known bioactive ligand
- Able to do so more efficiently in one long run with added diversity than in multiple short runs

HIV-1 reverse transcriptase
• Rediscover specific structural themes of ligands that bind to this active site
- Able to reproduce four known inhibitors in “butterfly-like” shape (out of 26)
- Able to reproduce a PETT variant inhibitor like MSC-127, which was experimentally discovered by testing 750 PETT variants

Cathepsin D Setup
Experimentally studied ligands: 10 x 10 x 10 = 1000
Size of potential chemical space: 25 fragments at each of 3 sites = 25^3 = 15625

Performed 10 runs of 50 generations each

Cathepsin D Results
Experimentally studied ligands: 10 x 10 x 10 = 1000
Size of potential chemical space: 25 fragments at each of 3 sites = 25^3 = 15625
Size of library generated by ADAPT: 8 x 7 x 7 = 392

4/7 inhibitors with 100 nM and 0/23 inhibitors with 330 nM activity found in the ADAPT library

Experimental data only exists for 24/392 compounds in the ADAPT library

A DOCK-only fitness function does not accurately map the binding constant surface
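The library sizes quoted for Cathepsin D are simple products and can be checked directly. The variable names below are illustrative only; note that 8 x 7 x 7 works out to 392, the denominator used in the 24/392 line.

```python
from math import prod

tested_library = prod([10, 10, 10])   # experimentally studied ligands
potential_space = 25 ** 3             # 25 fragments at each of 3 sites
adapt_library = prod([8, 7, 7])       # library generated by ADAPT

print(tested_library, potential_space, adapt_library)
print(f"tested fraction of the potential space: {tested_library / potential_space:.1%}")
print(f"ADAPT library with experimental data:   {24 / adapt_library:.1%}")
```

Only a small fraction of the ADAPT library overlaps with the tested compounds, which limits how directly the run can be validated against the binding data.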

DHFR Setup
17 fragments from methotrexate + others = 32 total fragments
3-13 fragments allowed per compound
Size of possible chemical space = 3.5 x 10^8 unique compounds

1 set of 10 runs to 30 generations, methotrexate seeded
1 set of 10 runs to 30 generations, unseeded
1 set of 10 runs to 100 generations, unseeded

1 run of 1000 generations with diversity added every 200 generations
5 runs of 200 generations

DHFR Results

94% of solutions in the seeded results were better than the seed
0% of solutions in the 30-generation unseeded results were better
28% of solutions in the 100-generation unseeded results were better

96/98 structures in the seeded runs contained the pteridine fragment
21/100 structures in the 30-generation and 56/100 structures in the 100-generation unseeded runs contained the pteridine fragment

The fitness score for the single 1000-generation run was better than for the five 200-generation runs.

HIV-1 RT Setup
HIV-1 RT model bound to HEPT in butterfly shape, characterized by sphgen & GRID
65 fragments from 26 inhibitors, with 4-12 fragments per compound, for 5 x 10^12 compounds in potential chemical space
5250 compounds generated from 10 runs
5 inhibitors superimposed on top of each other fashioned the butterfly shape
Ligands with 50% of atoms in both wings count as butterfly-like
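The slide does not spell out the butterfly-like test, so the sketch below is one reading of it, stated as an assumption: a ligand counts when at least half of its atoms fall inside the wing regions and both wings are occupied. Modeling each wing as a set of spheres, and the names `in_region` and `is_butterfly_like`, are choices made here for illustration.

```python
import math


def in_region(atom, spheres):
    """True if the atom (x, y, z) lies inside any (center, radius) sphere."""
    return any(math.dist(atom, center) <= radius for center, radius in spheres)


def is_butterfly_like(atoms, wing1, wing2, min_fraction=0.5):
    """Assumed criterion: both wings occupied and >= min_fraction of atoms in wings."""
    if not atoms:
        return False
    in_wing1 = [in_region(a, wing1) for a in atoms]
    in_wing2 = [in_region(a, wing2) for a in atoms]
    in_either = sum(1 for w1, w2 in zip(in_wing1, in_wing2) if w1 or w2)
    return any(in_wing1) and any(in_wing2) and in_either / len(atoms) >= min_fraction


if __name__ == "__main__":
    wing1 = [((-3.0, 0.0, 0.0), 2.0)]   # placeholder wing geometry
    wing2 = [((3.0, 0.0, 0.0), 2.0)]
    ligand = [(-3.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, -5.0, 0.0)]
    print(is_butterfly_like(ligand, wing1, wing2))
```

In practice the wing regions would come from the superimposed inhibitors; the two spheres above are placeholders.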

HIV-1 RT Results
4/26 known inhibitors found in butterfly-like shape in the ADAPT library

Efavirenz (Sustiva™), pyrrolobenzodiazepinone, PETT, and diaryl sulfone-like scaffolds were found among the butterfly-like compounds.

Despite the lack of a structural motif in the initial, unseeded populations, the ADAPT program was able to reproduce a geometric constraint, the ‘butterfly’ motif of known NNIs, using a molecular docking fitness function, which is not the best choice of fitness function

Questions

• What are the time gains/costs of using this technique instead of just some screening technique?

• How do you decide what to set the parameters to?

• How do you test the method/parameter set without a known set of ligands to form the fragment library from?