Smashing Molecules: How Molecular Fragments Allow Us to Explore Large Chemical Spaces. Rajarshi Guha & Trung Nguyen, NIH Center for Translational Therapeutics. Chemaxon UGM, September 2011


Page 1: Smashing Molecules

Smashing Molecules: How Molecular Fragments Allow Us to Explore Large Chemical Spaces

Rajarshi Guha & Trung Nguyen, NIH Center for Translational Therapeutics

Chemaxon UGM, September 2011

Page 2: Smashing Molecules

Outline

• Fragments as the building blocks of chemistry

• Fragments and SAR

• Fragments and activity profiles

Page 3: Smashing Molecules

Big Data for Some Problems

• Halevy et al. discuss the effectiveness of extremely large datasets

• Their application focuses on machine translation; see the Google n-gram corpus

• They suggest that such extremely large datasets are useful because they effectively encompass all commonly used n-grams (phrases)

• The domain is relatively constrained

Halevy et al, IEEE Intelligent Systems, 2009, 24, 8-12

Page 4: Smashing Molecules

Google Scale in Chemistry?

• What would be the equivalent of an n-gram corpus in chemistry?
  – Fragments
  – A more direct analogy can be made using LINGOs

• It is possible to generate arbitrarily large (virtual) compound and fragment collections

• But would such a collection span all of “commonly used” chemistry?
  – Depending on the initial compound set, yes
  – But we’re also interested in going beyond such a “commonly used” set

Fink T, Reymond JL, J Chem Inf Model, 2007, 47, 342
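To make the LINGO analogy concrete, here is a minimal sketch of how a SMILES string can be cut into overlapping 4-character substrings (LINGOs) and counted across a collection, in the same way n-grams are counted across a text corpus. The function names are hypothetical, and normalizing ring-closure digits to "0" is an assumed preprocessing step, not something the slide specifies.

```python
from collections import Counter


def normalize_ring_closures(smiles: str) -> str:
    """Replace ring-closure digits outside bracket atoms with '0' (assumed LINGO-style preprocessing)."""
    out, in_bracket = [], False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        # Digits inside [...] are isotopes, H counts or charges and are left untouched.
        out.append("0" if ch.isdigit() and not in_bracket else ch)
    return "".join(out)


def lingos(smiles: str, q: int = 4) -> Counter:
    """Multiset of overlapping q-character substrings of a SMILES string."""
    s = normalize_ring_closures(smiles)
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def lingo_corpus(smiles_list, q: int = 4) -> Counter:
    """Number of molecules in which each LINGO occurs (like document frequency for n-grams)."""
    corpus = Counter()
    for smi in smiles_list:
        corpus.update(set(lingos(smi, q)))
    return corpus
```

For example, `lingo_corpus(smiles_list).most_common(20)` would list the most widespread LINGOs in a collection, the chemical counterpart of the most common phrases in an n-gram corpus. SMILES should be canonicalized first so that one molecule always yields the same LINGOs.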

Page 5: Smashing Molecules

Fragment Diversity

• Consider a set of bioactives such as the LOPAC collection (1,280 compounds)

• Using exhaustive fragmentation, we get 2,460 unique fragments

• On the MLSMR (~372K compounds), we get 164,583 fragments

[Figure: distribution of fragment frequencies; x-axis: log fragment frequency; y-axis: percent of total]
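Once each molecule has been fragmented, the fragment counts on this slide reduce to simple bookkeeping. The sketch below assumes a hypothetical `mol_fragments` mapping from molecule id to its generated fragments (for example canonical SMILES). It counts how many molecules contain each unique fragment and bins those frequencies on a log10 scale, as a percentage of all unique fragments. The bin width and the choice of "percent of unique fragments" as the y-axis are assumptions.

```python
import math
from collections import Counter


def fragment_frequencies(mol_fragments):
    """mol_fragments maps a molecule id to the fragments generated from it.

    Returns how many molecules each unique fragment occurs in.
    """
    freq = Counter()
    for frags in mol_fragments.values():
        freq.update(set(frags))  # count a fragment at most once per molecule
    return freq


def log_frequency_histogram(freq, bin_width=0.5):
    """Percent of unique fragments per log10(frequency) bin, keyed by the bin's lower edge."""
    bins = Counter(math.floor(math.log10(n) / bin_width) for n in freq.values())
    total = len(freq)
    return {b * bin_width: 100.0 * c / total for b, c in sorted(bins.items())}
```

`len(fragment_frequencies(...))` gives the number of unique fragments (the 2,460 and 164,583 figures above are of this kind), while the histogram shows how strongly the distribution is dominated by rare fragments.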

Page 6: Smashing Molecules

Fragment Diversity

• Distribution of MLSMR fragments in BCUT space

[Figure: scatter plots of the first two principal components (PC 1 vs. PC 2) of BCUT descriptors. Panels: all fragments; fragments occurring in 5 to 50 molecules]

Page 7: Smashing Molecules

What Do We Do with Fragments?

• Assuming we obtain fragments from a large enough collection, what do we do with them?
  – Learning from fragments: QSARs, generative models
  – Use fragments as filters, as an alternative to clustering
  – Explore chemotypes and activity
  – Scaffold-level promiscuity

White, D and Wilson, RC, J Chem Inf Model, 2010, 50, 1257-1274

Page 8: Smashing Molecules

Scaffold Activity Diagrams

• Network-oriented view of fragment (scaffold) collections
  – Similar in idea to Scaffold Hunter etc.
  – Not purely hierarchical

• Color by arbitrary properties

• Quickly assess the utility of a scaffold

• Try it online

Page 9: Smashing Molecules

What Makes a Good Scaffold?

• What makes a good scaffold?
  – Size, complexity, …
  – Do the members represent an SAR or not?
  – Intuition and experience also play a role

Page 10: Smashing Molecules

Scaffold QSAR

[Figure: observed vs. predicted activity for a scaffold QSAR model; both axes span -8 to 0]

• Evaluate topological and physicochemical descriptors for the R-groups

• Fit a PLS or ridge regression model

• Characterize the SAR landscape
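The ridge option in this workflow can be sketched in plain Python. Because a scaffold usually has far fewer members than R-group descriptors, the sketch uses the dual (kernel) form of ridge regression, which only requires solving an n x n system. The descriptor matrix `X` (one row per scaffold member) and the penalty `lam` are assumed inputs; PLS, the other option named on the slide, is not shown.

```python
def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square and non-singular)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def ridge_fit(X, y, lam=1.0):
    """Ridge regression in dual form: solve (Xc Xc^T + lam I) alpha = yc, then w = Xc^T alpha.

    This gives the same weights as the usual p x p ridge solution but needs only an n x n solve,
    which suits scaffolds with few members and many R-group descriptors.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centering leaves the intercept unpenalized
    yc = [v - y_mean for v in y]
    K = [[sum(u * v for u, v in zip(Xc[i], Xc[k])) + (lam if i == k else 0.0)
          for k in range(n)] for i in range(n)]
    alpha = _solve(K, yc)
    w = [sum(alpha[i] * Xc[i][j] for i in range(n)) for j in range(p)]
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return w, intercept


def ridge_predict(w, intercept, X):
    """Predicted activities for rows of X."""
    return [sum(wj * xj for wj, xj in zip(w, row)) + intercept for row in X]
```

With only 5 to 10 members per scaffold, `lam` would normally be chosen by leave-one-out cross-validation rather than fixed in advance.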

Page 11: Smashing Molecules

Scaffold QSAR - Drawbacks

• Many scaffolds have few (5 to 10) members

• Invariably, there are more features than observations

• If the number of R-groups is large, the feature matrix can be very sparse
  – Less of a problem for combinatorial libraries

• A linear fit may not be the best approach to correlating R-groups with activities
  – Difficult to choose a model type a priori

Page 12: Smashing Molecules

Fragment Activity Profiles

• Using scaffolds in HTS triage usually leads to two questions
  – What is known about the chemical series with respect to the intended target?
  – What compound classes are known to modulate the intended target, and how similar are they to the series in question?

• We’re interested in exploring summaries of activity, grouped by scaffolds and targets

Page 13: Smashing Molecules

Fragment Activity Profiles

• We use ChEMBL (08) as the source of bioactivity data across multiple targets

• Preprocess the database
  – Generate scaffolds (exhaustive enumeration of combinations of SSSRs)
  – Normalize activity data so that the activity of a molecule can be compared across different assays
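The scaffold-generation step ("exhaustive enumeration of combinations of SSSRs") can be illustrated with a small sketch that works on rings already perceived as sets of atom indices. It yields the atoms of every combination of rings that forms a connected group. By default, two rings count as connected when they share an atom (fused or spiro). Handling rings joined by linkers through a caller-supplied adjacency map is an assumption, and this sketch does not add the linker atoms themselves. The enumeration is exponential in the number of rings, which is manageable for typical drug-like molecules.

```python
from itertools import combinations


def ring_adjacency(rings):
    """Two rings are treated as adjacent when they share at least one atom (fused or spiro)."""
    n = len(rings)
    return {i: {j for j in range(n) if j != i and rings[i] & rings[j]} for i in range(n)}


def is_connected(subset, adj):
    """Depth-first search restricted to the rings in `subset`."""
    subset = set(subset)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()] & subset:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == subset


def enumerate_ring_combinations(rings, adj=None):
    """Yield the atom set of every connected combination of SSSR rings."""
    adj = adj if adj is not None else ring_adjacency(rings)
    for k in range(1, len(rings) + 1):
        for subset in combinations(range(len(rings)), k):
            if is_connected(subset, adj):
                yield frozenset().union(*(rings[i] for i in subset))
```

In a real pipeline, each yielded atom set would then be converted back into a substructure (for example a canonical SMILES) and stored in the fragment tables.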

Page 14: Smashing Molecules

Database Setup

• Preprocessing steps are available as a Java servlet
  – http://tripod.nih.gov/files/chembl-servlets.zip

• Requires ChEMBL installed in Oracle; we add some extra tables
  – Fragment structures and computed properties
  – Aggregated assay activity summary

• Only consider assays with IC50s in nM and uncensored data, more than 5 observations and a MAD > 0
  – (Robust) z-scored activities
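The assay filter and robust z-scoring can be sketched as follows. The input layout (`assays` mapping an assay id to (molecule id, IC50) pairs that are already restricted to uncensored nM values) is hypothetical. The 1.4826 scale factor, which makes the MAD comparable to a standard deviation for normally distributed data, is an assumption. The slides do not say whether IC50s were log-transformed before scaling, so the raw values are used here.

```python
import statistics


def robust_z_scores(values, scale=1.4826):
    """(x - median) / (scale * MAD) for every value in one assay."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(v - med) / (scale * mad) for v in values]


def normalize_assays(assays, min_obs=5):
    """assays maps assay id -> list of (molecule id, IC50 in nM), uncensored values only.

    Keeps assays with more than `min_obs` observations and a non-zero MAD and
    returns {(assay id, molecule id): robust z-score}.
    """
    out = {}
    for assay_id, rows in assays.items():
        if len(rows) <= min_obs:
            continue
        try:
            zs = robust_z_scores([value for _, value in rows])
        except ValueError:
            continue  # MAD == 0: more than half of the values equal the median
        for (mol_id, _), z in zip(rows, zs):
            out[(assay_id, mol_id)] = z
    return out
```

Using the median and MAD instead of the mean and SD keeps a few extreme IC50s from compressing the scores of every other molecule in the assay.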

Page 15: Smashing Molecules

Some Fragment Statistics

• Considered a Z-score range of -40 to 15

• There were 12,887 molecules lying outside this range

[Figure: two histograms. Panels: percentage of assays vs. log(number of molecules); number of compounds vs. Z-score]

Page 16: Smashing Molecules

Some Fragment Statistics

• Next, identify fragments with 8 to 20 atoms that occur in 100 to 900 molecules

• This gives us 1,746 fragments

[Figure: histogram of fragments by parent-molecule count; x-axis: number of molecules; y-axis: percentage of fragments]

Page 17: Smashing Molecules

Some Fragment Statistics

• We can query the fragment tables to get activity summaries for individual fragments

• For these examples we consider the full range of Z-scores

[Figure: Z-score histograms (percent of total) for nine individual fragments, labelled 778, 2723, 4058, 5390, 5486, 13485, 40169, 64473 and 115654; each panel is annotated with a value N (between 1,280 and 2,641)]
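A per-fragment activity summary of this kind is essentially one join between a fragment-to-parent table and a table of z-scored activities. The original setup uses extra tables inside an Oracle ChEMBL install. The sketch below instead uses Python's built-in `sqlite3` so that it is self-contained, and the table and column names (`fragment_parent`, `z_activity`, `molregno`, `z`) are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE fragment_parent (frag_id INTEGER, molregno INTEGER);
CREATE TABLE z_activity (molregno INTEGER, assay_id INTEGER, z REAL);
"""

SUMMARY_SQL = """
SELECT fp.frag_id,
       COUNT(DISTINCT fp.molregno) AS n_parents,
       COUNT(*) AS n_measurements,
       AVG(za.z), MIN(za.z), MAX(za.z)
FROM fragment_parent fp
JOIN z_activity za ON za.molregno = fp.molregno
WHERE fp.frag_id = ?
GROUP BY fp.frag_id
"""


def create_tables(conn):
    """Create the two hypothetical tables used by the queries below."""
    conn.executescript(SCHEMA)


def fragment_summary(conn, frag_id):
    """(frag_id, parents, measurements, mean z, min z, max z), or None if the fragment has no data."""
    return conn.execute(SUMMARY_SQL, (frag_id,)).fetchone()


def fragment_z_scores(conn, frag_id):
    """All z-scores of the fragment's parent molecules, e.g. to draw a histogram."""
    rows = conn.execute(
        "SELECT za.z FROM fragment_parent fp "
        "JOIN z_activity za ON za.molregno = fp.molregno "
        "WHERE fp.frag_id = ? ORDER BY za.z",
        (frag_id,),
    )
    return [r[0] for r in rows]
```

Indexing `fragment_parent(frag_id)` and `z_activity(molregno)` is what keeps such per-fragment queries interactive on a database the size of ChEMBL.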

Page 18: Smashing Molecules

Exploring Activity Profiles

[Screenshot callouts: fragments from ChEMBL; activity distributions of parent molecules across all targets; Z-scores for individual molecules against a specific target]

Page 19: Smashing Molecules

Exploring Activity Profiles

• The user can draw a molecule and fragment it on the fly

• The generated fragments are used to create activity histograms

Page 20: Smashing Molecules

Target Selection

• Employs the ChEMBL target hierarchy

• Can select target families or individual targets
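Selecting a target family amounts to expanding a node of the hierarchy into all the individual targets beneath it. A minimal sketch, assuming the hierarchy has been loaded into a plain parent-to-children dictionary. The family names used in any example are hypothetical and are not the actual ChEMBL classification labels.

```python
def targets_under(hierarchy, node):
    """All leaf targets below `node`; `hierarchy` maps a family name to its children."""
    children = hierarchy.get(node, [])
    if not children:
        return [node]  # an individual target
    leaves = []
    for child in children:
        leaves.extend(targets_under(hierarchy, child))
    return leaves


def select_targets(hierarchy, selections):
    """Expand a mix of family names and individual targets into a sorted list of targets."""
    chosen = set()
    for node in selections:
        chosen.update(targets_under(hierarchy, node))
    return sorted(chosen)
```

Because families and individual targets go through the same function, a user can mix both in one selection, and the expanded list can filter the activity queries for the histograms.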

Page 21: Smashing Molecules

Similar Fragments with Similar Profiles?

• Consider 658 fragments with > 10 atoms that occur in 500 to 1200 molecules

• Overall, the fragments tend to be dissimilar
  – The 95th percentile of pairwise Tanimoto similarity is just 0.50

• Even so, 1,873 pairs exhibit Tc > 0.8

[Figure: histogram of pairwise Tanimoto similarity (0 to 1); y-axis: percentage of pairs]
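The statistics on this slide (a high percentile of all pairwise similarities and the number of pairs above a cutoff) can be reproduced with a short sketch. Here fingerprints are represented as sets of on-bit indices. Which fingerprint was actually used, and which percentile definition, are not stated on the slide, so both are assumptions.

```python
import statistics
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints represented as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_similarity_summary(fps, threshold=0.8, pct=95):
    """Return (pct-th percentile of pairwise Tc, number of pairs with Tc > threshold, number of pairs)."""
    sims = [tanimoto(fps[i], fps[j]) for i, j in combinations(range(len(fps)), 2)]
    if len(sims) < 2:
        raise ValueError("need at least three fingerprints")
    # 'inclusive' interpolates within the observed range; other percentile definitions differ slightly.
    percentile = statistics.quantiles(sims, n=100, method="inclusive")[pct - 1]
    n_similar = sum(s > threshold for s in sims)
    return percentile, n_similar, len(sims)
```

For 658 fragments this compares 216,153 pairs, which is still practical with the all-pairs loop above.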

Page 22: Smashing Molecules

Comparing Activity Profiles

• Compare activity profiles with the K-S statistic

• Color corresponds to the p-value of the K-S test

• No obvious correlation between fragment similarity and activity profile similarity

• Probably not rigorous when a scaffold has few parent molecules

[Figure: K-S statistic (0 to 0.6) vs. Tanimoto similarity (0.80 to 1.00) for fragment pairs, colored by K-S p-value (0 to 1)]
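For reference, the two-sample Kolmogorov-Smirnov comparison of two activity distributions can be sketched in a few lines. The p-value uses the standard asymptotic approximation (effective sample size plus the Kolmogorov series), which is exactly why it is unreliable when a scaffold has few parent molecules or the z-scores contain many ties. `scipy.stats.ks_2samp` offers an established implementation with exact small-sample options.

```python
import bisect
import math


def kolmogorov_q(lam, terms=100):
    """Asymptotic tail probability Q_KS(lam) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # Q_KS is indistinguishable from 1 here and the series converges slowly
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def ks_2samp(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D and an approximate p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:  # the largest ECDF gap occurs at one of the pooled observations
        gap = abs(bisect.bisect_right(xs, v) / n - bisect.bisect_right(ys, v) / m)
        d = max(d, gap)
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, kolmogorov_q(lam)
```

Calling `ks_2samp(z_scores_a, z_scores_b)` on the parent-molecule z-scores of two fragments gives one point of the scatter plot: D on the y-axis and the p-value as the color.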

Page 23: Smashing Molecules

Exploring Profiles for Fragment Pairs

• Compare activity distributions across all targets in a pairwise fashion

• Can also generate a comparison for a single target, but this requires data for all the fragments

Page 24: Smashing Molecules

Looking for Selective Fragments

• Interesting to visually explore fragment pairs

• Can become tedious, especially in a database as big as ChEMBL

• Can we automate this type of analysis?
  – Identify fragment pairs with very different activity distributions?
  – Identify fragments with a preference for a certain target (class)?

Page 25: Smashing Molecules

Targetwise Activity Profiles

[Figure: bar chart of mean Z-score by target class for the fragment labelled 4056459, each bar annotated with the number of parent molecules tested. Target classes include receptor families (e.g. adrenergic, dopamine, histamine, opioid and serotonin receptors), kinase groups (Agc, Camk, Cmgc, Tk), nuclear receptors (NR1H3, NR3A1, NR3A2, NR3C3) and a series of CYP isoforms (e.g. CYP_1A2, CYP_2C9, CYP_2D6, CYP_3A4)]

• Evaluate the mean activity of parent molecules within a target class

• Count the number of parent molecules tested against the target

• The selectivity of 1-phenylimidazole for CYP450 has been noted

Wilkinson et al, Biochem Pharmacol, 1983, 32, 997-1003
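The two quantities behind this kind of chart, the mean activity per target class and the number of parent molecules tested, can be computed with a small group-by. This is a minimal sketch, assuming the fragment's parent-molecule activities arrive as hypothetical (target class, molecule id, z-score) records. Averaging each molecule's scores within a class before averaging across molecules is an assumed design choice. The "preferred target" helper assumes that a lower z-score means higher potency (as would be the case if raw IC50s were z-scored); that sign convention should be checked before relying on it.

```python
from collections import defaultdict


def targetwise_profile(records):
    """records: (target class, molecule id, z-score) for the parent molecules of one fragment.

    Returns {target class: (mean z-score, number of parent molecules tested)}. Each molecule's
    z-scores within a class are averaged first so heavily tested molecules do not dominate.
    """
    per_mol = defaultdict(list)
    for target, mol_id, z in records:
        per_mol[(target, mol_id)].append(z)
    per_target = defaultdict(list)
    for (target, _), zs in per_mol.items():
        per_target[target].append(sum(zs) / len(zs))
    return {t: (sum(v) / len(v), len(v)) for t, v in per_target.items()}


def lowest_mean_target(profile, min_count=5):
    """Target class with the lowest mean z-score among classes with at least `min_count` molecules.

    Assumes lower z means more active; verify the sign convention of the normalized data.
    """
    eligible = {t: mean for t, (mean, n) in profile.items() if n >= min_count}
    return min(eligible, key=eligible.get) if eligible else None
```

The `min_count` guard matters because many bars in such a chart rest on only one or two parent molecules.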

Page 26: Smashing Molecules

Targetwise Activity Profiles

• Identified benzylpyrrolidine as a fragment with a preference for a specific target class

• But benzylpyrrolidines have been reported as dopamine agonists

[Figure: bar chart of mean Z-score (about -8 to 2) by target class for the fragment labelled 4055899, each bar annotated with the number of parent molecules tested; target classes include dopamine and serotonin receptors, CYP_2D6, CYP_3A4 and a range of other receptor families]

Page 27: Smashing Molecules

Fragment or Scaffold?

• I’ve been using fragment & scaffold interchangeably, which is not always correct

• Chemists have an intuitive idea of what a scaffold is

• Can we encode the idea of scaffold-like or fragment-like?

• We use the concept of a signal-to-noise ratio

$$\mathrm{SNR} = \frac{\mu}{\sigma}$$

where μ is the size of the fragment and σ is the SD of the number of atoms not in the fragment, considered over the parent molecules
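Computing the SNR for one fragment only needs its atom count and the atom counts of its parent molecules. The sketch below follows the definition SNR = μ/σ. Whether heavy atoms or all atoms were counted, and whether the sample or population SD was used, are not stated, so the sample SD here is an assumption. A fragment whose parents all carry the same number of extra atoms gets an infinite SNR.

```python
import math
import statistics


def fragment_snr(fragment_atoms, parent_atoms):
    """SNR = mu / sigma for one fragment.

    mu: number of atoms in the fragment.
    sigma: SD, over the parent molecules, of the number of atoms not in the fragment.
    Uses the sample SD (an assumption); needs at least two parent molecules.
    """
    outside = [n - fragment_atoms for n in parent_atoms]
    sigma = statistics.stdev(outside)
    return math.inf if sigma == 0 else fragment_atoms / sigma
```

A large fragment whose parents add a fairly constant number of atoms scores high, which matches the intuition that a scaffold carries most of its parent molecules' structure.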

Page 28: Smashing Molecules

Fragment or Scaffold

• Partial distribution of SNR values for fragments with atom count > 8 & < 20

[Figure: histogram of SNR values (0 to 6); y-axis: percentage of fragments]

Page 29: Smashing Molecules

Fragment or Scaffold

• Large SNRs are associated with Murcko-like fragments

• A useful SNR cutoff is an open question

[Figure: example fragments with SNR = 12.09, 9.10 and 8.50, and with SNR = 0.36, 0.43 and 0.83]

Page 30: Smashing Molecules

Activity Profiles & SNR

• Given a fragment, evaluate the SD of the number of atoms in the parent molecules that are not part of the fragment

• Label each parent molecule using this SD
  – If the number of atoms not in the fragment > SD, not core-like
  – Otherwise, core-like

• Visualize the activity distributions of the parent molecules, grouped by the label
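The labelling rule can be sketched directly. Parent molecules are given as a hypothetical mapping from molecule id to atom count, and activities as (molecule id, z-score) pairs. As with the SNR itself, using the sample SD is an assumption.

```python
import statistics


def label_parents(fragment_atoms, parents):
    """parents: {molecule id: atom count}. Returns {molecule id: 'core-like' | 'not core-like'}."""
    extra = {mol: n - fragment_atoms for mol, n in parents.items()}
    sd = statistics.stdev(list(extra.values()))  # SD of the atoms outside the fragment
    return {mol: ("not core-like" if e > sd else "core-like") for mol, e in extra.items()}


def group_activities(labels, z_scores):
    """Split (molecule id, z-score) pairs by label, ignoring molecules that were not labelled."""
    groups = {"core-like": [], "not core-like": []}
    for mol, z in z_scores:
        if mol in labels:
            groups[labels[mol]].append(z)
    return groups
```

The two lists returned by `group_activities` are what the paired core-like / not core-like histograms compare for each fragment.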

 

Page 31: Smashing Molecules

Activity Profiles & SNR

[Figure: Z-score histograms (percentage of total) of parent molecules split into core-like and not core-like, for the fragments labelled 20967, 44591, 801 and 68604, grouped into high-SNR and low-SNR examples]

Page 32: Smashing Molecules

Downloads

• Scaffold activity networks

• Fragment Activity Profiler
  – SQL & servlet sources
  – Client sources
  – Online version

Page 33: Smashing Molecules