structure-activity relationships and networks: a generalized approach to exploring structure-activity...

Structure-Activity Relationships and Networks: A Generalized Approach to Exploring Structure-Activity Landscapes. Rajarshi Guha, NIH Chemical Genomics Center / NIH Center for Translational Therapeutics. March 29, 2011

Upload: rguha

Post on 10-May-2015


TRANSCRIPT

Page 1: Structure-Activity Relationships and Networks: A Generalized Approach to Exploring Structure-Activity Landscapes

Structure-Activity Relationships and Networks: A Generalized Approach to Exploring Structure-Activity Landscapes

Rajarshi Guha
NIH Chemical Genomics Center / NIH Center for Translational Therapeutics

March 29, 2011

Page 2

NIH Chemical Genomics Center

• Founded 2004 as part of NIH Roadmap Molecular Libraries Initiative
  – NCGC staffed with 90+ scientists: biologists, chemists, informaticians, engineers
  – Post-doc program
• Mission
  – MLPCN (screening & chemical synthesis; compound repository; PubChem database; funding for assay, library and technology development)
    • Complements individual investigator-initiated research programs
    • Enables "pharma-level" HTS and early chemical optimization
  – Develop new chemical probes for basic research and leads for therapeutic development, particularly for rare/neglected diseases
  – New paradigms & applications of HTS for chemical biology / chemical genomics
• All NCGC projects are collaborations with a target or disease expert; currently >200 collaborations with investigators worldwide
  – 75% NIH extramural, 10% NIH intramural, 15% Foundations/Research Consortia/Pharma/Biotech

Page 3

NCGC Project Diversity

[Figure: three panels: (A) Disease areas, (B) Target types, (C) Detection methods]

Page 4

qHTS: High Throughput Dose Response

• 1536-well plates, inter-plate dilution series
• Assay volumes 2–5 μL
• Assay concentration ranges over 4 logs (high: ~100 μM)
• Informatics pipeline: automated curve fitting and classification; 300K samples
• Automated concentration-response data collection, ~1 CRC/sec

[Figure: panels A, B, C]

Page 5

Background

• Cheminformatics methods
  – QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
• More recently
  – RNAi screening, high content imaging
• Extensive use of machine learning
• All tied together with software development
  – User-facing GUI tools
  – Low level programmatic libraries
• Believer & practitioner of Open Source

Page 6

Outline

• Structure-activity relationships
• Characterizing activity cliffs
• Working with the structure-activity landscape

Page 7

Structure Activity Relationships

• Similar molecules will have similar activities
• Small changes in structure will lead to small changes in activity
• One implication is that SARs are additive
• This is the basis for QSAR modeling

Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358

Page 8

Exceptions Are Easy to Find

[Figure: four closely related chemical structures with Ki = 39.0 nM, Ki = 1.8 nM, Ki = 10.0 nM and Ki = 1.0 nM]

Tran, J.A. et al., Bioorg. Med. Chem. Lett., 2007, 15, 5166–5176

Page 9

Structure Activity Landscapes

• Rugged gorges or rolling hills?
  – Small structural changes associated with large activity changes represent steep slopes in the landscape
  – But traditionally, QSAR assumes gentle slopes
  – Machine learning is not very good for special cases

Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535

Page 10

Structure Activity Landscapes

Page 11

Characterizing the Landscape

• A cliff can be numerically characterized
• Structure Activity Landscape Index (SALI)

$$\mathrm{SALI}_{i,j} = \frac{|A_i - A_j|}{1 - \mathrm{sim}(i,j)}$$

• Cliffs are characterized by elements of the matrix with very large values

Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
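The SALI definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sali_matrix`, the toy activities and the toy similarity matrix are all hypothetical, and the handling of pairs with similarity 1.0 (reported as infinite) is one possible convention rather than anything prescribed by the slides.

```python
import math

def sali_matrix(activities, similarity):
    """Return the symmetric matrix of |A_i - A_j| / (1 - sim(i, j))."""
    n = len(activities)
    sali = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = abs(activities[i] - activities[j])
            denom = 1.0 - similarity[i][j]
            if denom > 0.0:
                value = diff / denom
            else:
                # Identical representations: the ratio is undefined, so flag
                # a nonzero activity gap as an infinitely steep cliff.
                value = math.inf if diff > 0.0 else 0.0
            sali[i][j] = sali[j][i] = value
    return sali

# Hypothetical toy data: pIC50-like activities and a Tanimoto-like similarity matrix.
activities = [6.0, 8.0, 6.5]
similarity = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.6],
    [0.5, 0.6, 1.0],
]
sali = sali_matrix(activities, similarity)
for row in sali:
    print(["%.2f" % v for v in row])
```

In this toy example the pair (0, 1) is both very similar and two log units apart, so it gets by far the largest SALI value and would be the cliff.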

Page 12

Visualizing the SALI Matrix

Page 13

Fingerprints

• Lots of types of fingerprints
• Each bit indicates the presence or absence of a structural feature
• Length can vary from 166 to 4096 bits or more
• Fingerprints usually compared using the Tanimoto metric

1 0 1 1 0 0 0 1 0
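As a small illustration of the Tanimoto comparison mentioned above, the sketch below treats a fingerprint as a list of 0/1 bits. The function name `tanimoto`, the second fingerprint `fp2`, and the convention of returning 1.0 for two all-zero fingerprints are assumptions made for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: bits set in both / bits set in either."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Assumed convention: two empty fingerprints count as identical.
    return both / either if either else 1.0

fp1 = [1, 0, 1, 1, 0, 0, 0, 1, 0]  # the bit string shown on the slide
fp2 = [1, 0, 0, 1, 0, 1, 0, 1, 0]  # hypothetical second fingerprint
print(tanimoto(fp1, fp2))
```

Here three bits are set in both fingerprints and five in at least one, so the coefficient is 3/5.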

Page 14

Varying Fingerprint Methods

• Shorter fingerprints will lead to more "similar" pairs
• Requires a higher cutoff to focus on significant cliffs

[Figure: three density plots of pairwise Tanimoto similarity (x-axis: Tanimoto Similarity, y-axis: Density) for BCI 1052 bit, MACCS 166 bit and CDK 1024 bit fingerprints]

Page 15

Varying the Similarity Metric

Page 16

Different Activity Representations

• Using the Hill parameters from a dose-response curve represents richer data than a single IC50

[Figure: dose-response curve, Activity versus Concentration, annotated with S0, SInf, the 50% level and AC50]

$$P = \{S_0,\ S_{\mathrm{inf}},\ AC_{50},\ H\}$$

$$\mathrm{SALI}_{i,j} = \frac{d(P_i, P_j)}{1 - \mathrm{sim}(i,j)}$$
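A rough sketch of this curve-based variant follows. The four-parameter Hill form is the standard one; everything else is an assumption for illustration: the slides do not say which distance d is used, so the example uses a plain Euclidean distance over (S0, Sinf, log10 AC50, H), and the function names and parameter values are hypothetical.

```python
import math

def hill_response(conc, s0, s_inf, ac50, h):
    """Four-parameter Hill curve evaluated at concentration conc (> 0)."""
    return s0 + (s_inf - s0) / (1.0 + (ac50 / conc) ** h)

def param_distance(p, q):
    """Assumed choice of d: Euclidean distance between parameter tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def sali_from_curves(p, q, sim):
    """Curve-based SALI: d(P_i, P_j) / (1 - sim(i, j)), for sim < 1."""
    return param_distance(p, q) / (1.0 - sim)

# Hypothetical parameter tuples: (S0, Sinf, log10 AC50, H).
p_i = (0.0, -100.0, -6.0, 1.0)
p_j = (0.0, -40.0, -5.0, 1.0)

print(hill_response(1e-6, 0.0, -100.0, 1e-6, 1.0))  # response at conc == AC50
print(sali_from_curves(p_i, p_j, sim=0.5))
```

Because the parameters live on very different scales (efficacy in percent, potency in log units), a real analysis would probably standardize them before computing a distance; the unscaled version here is kept only for brevity.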

Page 17

Visualizing SALI Values

• Alternatives?
  – A heatmap is an easy to understand visualization
  – Coupled with brushing, can be a handy tool
  – A more flexible approach is to consider a network view of the matrix
• The SALI graph
  – Compounds are nodes
  – Nodes i,j are connected if SALI(i,j) > X
  – Only display connected nodes
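The three rules for the SALI graph translate directly into code. The sketch below is illustrative only: the function name `sali_graph` and the toy SALI matrix are hypothetical, and orienting each edge from the less active to the more active compound is an assumed convention, chosen so that an edge carries an ordering.

```python
def sali_graph(activities, sali, cutoff):
    """Return (nodes, edges) for all pairs with SALI(i, j) > cutoff."""
    n = len(activities)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] > cutoff:
                # Assumed convention: edge points from lower to higher activity.
                lo, hi = (i, j) if activities[i] <= activities[j] else (j, i)
                edges.append((lo, hi))
    # Only compounds that take part in at least one edge are displayed.
    nodes = sorted({k for edge in edges for k in edge})
    return nodes, edges

# Hypothetical toy data for four compounds.
activities = [6.0, 8.0, 6.5, 6.4]
sali = [
    [0.0, 20.0, 1.0, 0.5],
    [20.0, 0.0, 3.75, 4.0],
    [1.0, 3.75, 0.0, 0.2],
    [0.5, 4.0, 0.2, 0.0],
]
nodes_low, edges_low = sali_graph(activities, sali, cutoff=3.0)
nodes_high, edges_high = sali_graph(activities, sali, cutoff=10.0)
print(nodes_low, edges_low)
print(nodes_high, edges_high)
```

Raising the cutoff from 3.0 to 10.0 drops two of the three edges and two of the four nodes, leaving only the steepest cliff.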

Page 18

Visualizing SALI Values

• The SALI graph
  – Compounds are nodes
  – Nodes i,j are connected if SALI(i,j) > X
  – Only display connected nodes

[Figure: example SALI graph with numbered compound nodes]

Page 19

Varying the Cutoff

• The cutoff controls the complexity of the graph
• Higher cutoffs will highlight the most significant activity cliffs

[Figure: SALI graphs of the same dataset at Cutoff = 90%, Cutoff = 50% and Cutoff = 20%; more numbered nodes appear as the cutoff is lowered]

Page 20

Better Visualization - SALIViewer

http://sali.rguha.net

Page 21

What Can We Do With SALIs?

• SALI characterizes cliffs & non-cliffs
• For a given molecular representation, SALIs give us an idea of the smoothness of the SAR landscape
• Models try to encode this landscape
• Use the landscape to guide descriptor or model selection

Page 22

Descriptor Space Smoothness

• Edge count of the SALI graph for varying cutoffs
• Measures smoothness of the descriptor space
• Can reduce this to a single number (AUC)
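One way to read the edge-count curve and its AUC is sketched below. It assumes, since the plot axis runs from 0.0 to 1.0, that the cutoff is expressed as a fraction of the largest SALI value in the dataset; that scaling, the function names and the toy values are assumptions, and the area is computed with the ordinary trapezoid rule.

```python
def edge_count_curve(sali_values, n_steps=11):
    """Count pairs whose SALI exceeds each cutoff (a fraction of the maximum)."""
    top = max(sali_values)
    cutoffs = [k / (n_steps - 1) for k in range(n_steps)]
    counts = [sum(1 for v in sali_values if v > c * top) for c in cutoffs]
    return cutoffs, counts

def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve through the points (xs, ys)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )

# Hypothetical SALI values for the six pairs of a four-compound dataset.
sali_values = [20.0, 1.0, 0.5, 3.75, 4.0, 0.2]
cutoffs, counts = edge_count_curve(sali_values, n_steps=5)
auc = trapezoid_auc(cutoffs, counts)
print(list(zip(cutoffs, counts)), auc)
```

A curve that collapses quickly, as here, means a few pairs dominate the landscape; a slowly decaying curve gives a larger area.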

[Figure: Number of Edges in SALI Graph (0 to 400) versus SALI Cutoff (0.0 to 1.0), with SALI graphs shown at three cutoffs. Node labels in the largest graph: astemizole, cisapride, E-4031, pimozide, dofetilide, sertindole, haloperidol, droperidol, thioridazine, terfenadine, verapamil, domperidone, loratadine, halofantrine, mizolastine, bepridil, azimilide, chlorpromazine, mibefradil, imipramine, granisetron, dolasetron, perhexiline, amitriptyline, diltiazem, sparfloxacin, grepafloxacin, sildenafil, moxifloxacin, gatifloxacin. The smallest graph retains astemizole, grepafloxacin, sildenafil, moxifloxacin and gatifloxacin.]

Page 23

Other Examples

• Instead of fingerprints, we use molecular descriptors
• SALI denominator now uses Euclidean distance
• 2D & 3D random descriptor sets
  – None are really good
  – Too rough, or
  – Too flat

[Figure: two panels, 2D and 3D, each showing Number of Edges in SALI Graph (0 to 400) versus SALI Cutoff (0.0 to 1.0)]

Page 24

Feature Selection Using SALI

• Surprisingly, exhaustive search of 66,000 4-descriptor combinations did not yield semi-smoothly decreasing curves
• Not entirely clear what type of curve is desirable

Page 25

SALI Graphs & Predictive Models

• The graph view allows us to view SARs and identify trends easily
• The aim of a QSAR model is to encode SARs
• Traditionally, we consider the quality of a model in terms of RMSE or R²
• But in general, we're not as interested in RMSEs as we are in whether the model predicted something as more active than something else
  – What we want to have is the correct ordering
  – We assume the model is statistically significant

Page 26

Measuring Model Quality

• A QSAR model should easily encode the "rolling hills"
• A good model captures the most significant cliffs
• Can be formalized as

   How many of the edge orderings of a SALI graph does the model predict correctly?

• Define S(X), representing the number of edges correctly predicted for a SALI network at a threshold X
• Repeat for varying X and obtain the SALI curve
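The idea of scoring edge orderings can be sketched as follows. The slide describes S(X) as the number of correctly predicted edges; because the SALI curves plotted later range from -1 to 1, this sketch assumes a normalized form (correct minus incorrect, divided by the number of edges). That normalization, the function name `ordering_score` and all data are assumptions for illustration.

```python
def ordering_score(edges, predicted):
    """Score in [-1, 1]: +1 per correctly ordered edge, -1 per reversed edge."""
    if not edges:
        return 0.0
    score = 0
    for lo, hi in edges:  # observed data: compound hi is more active than lo
        if predicted[hi] > predicted[lo]:
            score += 1
        elif predicted[hi] < predicted[lo]:
            score -= 1
    return score / len(edges)

# Hypothetical SALI graph edges at two cutoffs, as (less active, more active).
edges_by_cutoff = {
    0.2: [(0, 1), (2, 1), (3, 1)],
    0.8: [(0, 1)],
}
predicted = [6.2, 7.5, 7.8, 6.0]  # made-up model predictions per compound

curve = {x: ordering_score(e, predicted) for x, e in sorted(edges_by_cutoff.items())}
print(curve)
```

In this toy case the model reverses one of the three low-cutoff edges but gets the single steepest cliff right, so the curve rises with the cutoff.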

Page 27

SALI Curves

[Figure: two SALI curve plots of S(X) (-1.0 to 1.0) versus X (0.0 to 1.0). The first compares 3-descriptor, 5-descriptor and scrambled 3-descriptor models; the second is annotated SCI = 0.12]

Page 28

Model Search Using the SCI

• We've used the SALI to retrospectively analyze models
• Can we use SALI to develop models?
  – Identify a model that captures the cliffs
• Tricky
  – Cliffs are fundamentally outliers
  – Optimizing for good SALI values implies overfitting
  – Need to trade off between SALI & generalizability

Page 29

The Objective Function

• S0 is a measure of the model's ability to summarize the dataset (analogous to RMSE)
• S100 measures the model's ability to capture cliffs
• Ideally, the curve starts high and stays high

[Figure: S(X) (0.6 to 1.0) versus SALI Cutoff (0.0 to 1.0), with S0 and S100 marked]

$$F = \frac{1}{S_{100}}$$

$$F = \frac{1}{S_0} + \frac{S_{100} - S_0}{2}$$

$$F = \frac{1}{\mathrm{SCI}}$$
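For concreteness, the three candidate objective functions can be written out as below. The function name and the input values are hypothetical, and the sketch does not attempt to say how the optimizer uses F, which the slide leaves unstated. Note that each form is undefined when its denominator is zero.

```python
def objective_values(s0, s100, sci):
    """Evaluate the three objective functions listed on the slide."""
    return {
        "1/S100": 1.0 / s100,
        "1/S0 + (S100 - S0)/2": 1.0 / s0 + (s100 - s0) / 2.0,
        "1/SCI": 1.0 / sci,
    }

# Hypothetical curve summary values for one candidate model.
values = objective_values(s0=0.8, s100=0.9, sci=0.5)
for name, value in values.items():
    print(name, round(value, 4))
```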

Page 30

SALI Based Model Selection

• Considered the BZR dataset from Sutherland et al
• Identified "best" models using a GA to select from a pool of 2D descriptors
• While SALI based optimization can lead to a "better" curve, it doesn't give the best model

[Figure: two plots of S(X) (-0.5 to 0.5) versus SALI Cutoff, one over 0.0 to 1.0 and one over 0.00 to 0.08, comparing models selected with the RMSE, SCI and S(100) objective functions]

Sutherland, J. et al, J. Chem. Inf. Comput. Sci., 2003, 43, 1906-1915

Page 31

SALI Based Model Selection

• 107 aryl azoles as ER-β agonists
• Used a GA and 2D descriptors to identify models
• In this case, a SALI based objective function was able to identify the best model
• Interestingly, SCI does not seem to perform very well

[Figure: two plots of S(X) (-0.5 to 0.5) versus SALI Cutoff, one over 0.0 to 1.0 and one over 0.00 to 0.08, comparing models selected with the RMSE, SCI and S(0) + D/2 objective functions]

Malamas, M.S. et al, J Med Chem, 2004, 47, 5021-5040

Page 32

SALI Based Model Selection

• The size of the solution space explored depends on the SALI objective function

[Figure: RMSE by objective function for two datasets. BZR: RMSE, S(100) and SCI objective functions, RMSE axis 0.90 to 1.15. ER-β: 1/S(0) + D/2, RMSE and SCI objective functions, RMSE axis 0.55 to 0.65]

Page 33

Predicting the Landscape

• Rather than predicting activity directly, we can try to predict the SAR landscape
• Implies that we attempt to directly predict cliffs
  – Observations are now pairs of molecules
• A more complex problem
  – Choice of features is trickier
  – Still face the problem of cliffs as outliers
  – Somewhat similar to predicting activity differences

Scheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122

Page 34

Predicting Cliffs

• Dependent variables are pairwise SALI values, calculated using fingerprints
• Independent variables are molecular descriptors, but considered pairwise
  – Absolute difference of descriptor pairs, or
  – Geometric mean of descriptor pairs
  – …
• Develop a model to correlate pairwise descriptors to pairwise SALI values
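Building the pairwise feature table can be sketched as follows. The function names and the toy descriptor values are hypothetical; the geometric mean is taken as the square root of the product, which assumes non-negative descriptor values.

```python
import math
from itertools import combinations

def abs_diff(a, b):
    return abs(a - b)

def geo_mean(a, b):
    return math.sqrt(a * b)  # assumes both descriptor values are non-negative

def pairwise_features(descriptors, aggregate):
    """Map each unordered pair (i, j) to its aggregated descriptor vector."""
    rows = {}
    for i, j in combinations(range(len(descriptors)), 2):
        rows[(i, j)] = [aggregate(a, b) for a, b in zip(descriptors[i], descriptors[j])]
    return rows

# Hypothetical descriptor matrix: three molecules, two descriptors each.
descriptors = [[1.0, 4.0], [4.0, 9.0], [9.0, 1.0]]
diff_rows = pairwise_features(descriptors, abs_diff)
mean_rows = pairwise_features(descriptors, geo_mean)
print(diff_rows)
print(mean_rows)

# n molecules give n * (n - 1) / 2 pairs, for example 435 pairs from 30 molecules.
n_pairs_30 = len(list(combinations(range(30), 2)))
print(n_pairs_30)
```

Each row would then be matched with the SALI value of the same pair before fitting a regression model.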

Page 35

A Test Case

• We first consider the Cavalli CoMFA dataset of 30 molecules with pIC50s
• Evaluate topological and physicochemical descriptors
• Developed random forest models
  – On the original observed values (30 observations)
  – On the SALI values (435 observations)

Cavalli, A. et al, J Med Chem, 2002, 45, 3844-3853

Page 36

Double Counting Structures?

• The dependent and independent variables both encode structure
• But pretty low correlations between individual pairwise descriptors and the SALI values

[Figure: histograms of R² (0.00 to 0.15) between individual pairwise descriptors and SALI values, as Percent of Total (0 to 60), in two panels: AbsDiff and GeoMean]

Page 37

Model Summaries

• All models explain similar % of variance of their respective datasets
• Using geometric mean as the descriptor aggregation function seems to perform best
• SALI models are more robust due to larger size of the dataset

[Figure: three predicted versus observed scatter plots]
  – Original pIC50 (axes 4 to 9): RMSE = 0.97
  – SALI, AbsDiff (axes 0 to 6): RMSE = 1.10
  – SALI, GeoMean (axes 0 to 6): RMSE = 1.04

Page 38

Test Case 2

• Considered the Holloway docking dataset, 32 molecules with pIC50s and E_inter
• Similar strategy as before
• Need to transform SALI values
• Descriptors show minimal correlation

[Figure: two histograms as Percent of Total: raw SALI values (0 to 120) and log10(SALI) values (-1 to 2)]

Holloway, M.K. et al, J Med Chem, 1995, 38, 305-317

Page 39

Model Summaries

• The SALI models perform much worse in terms of % of variance explained
• Descriptor aggregation method does not seem to have much effect
• The SALI models appear to perform decently on the cliffs, but miss the most significant ones

[Figure: three predicted versus observed scatter plots]
  – Original pIC50 (axes 5 to 10): RMSE = 1.05
  – log10(SALI), AbsDiff (axes -1 to 2): RMSE = 0.48
  – log10(SALI), GeoMean (axes -1 to 2): RMSE = 0.48

Page 40

Model Summaries

• With untransformed SALI values, models perform similarly in terms of % of variance explained
• The most significant cliffs correspond to stereoisomers

[Figure: three predicted versus observed scatter plots]
  – Original pIC50 (axes 5 to 10): RMSE = 1.05
  – SALI, AbsDiff (axes 20 to 100): RMSE = 9.76
  – SALI, GeoMean (axes 20 to 100): RMSE = 10.01

Page 41

Model Caveats

• Models based on SALI values are dependent on there being an SAR in the original activity data
• Scrambling results for these models are poorer than the original models but aren't as random as expected

[Figure: Predicted SALI versus Observed SALI scatter plot, both axes 0 to 6]

Page 42

SALI in Bulk

• Much of this material is exploratory
• So we're interested in trends across many assays
• ChEMBL is an excellent source for activity cliffs
• Assay selection
  – Human target, binding assay
  – High confidence (score = 9)
  – Number of compounds between 75 & 300
  – Only consider non-NA activity values
  – Censored data is considered the same as exact data
  – 31 assays
• We identify datasets with activity cliffs by the skewness of the dependent variable
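The skewness screen can be illustrated with the moment-based sample skewness, m3 / m2^(3/2). The slides do not state which skewness estimator or threshold was used, so the estimator, the function name and the toy activity lists below are assumptions.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical pIC50 lists: one symmetric, one with a single very potent compound.
symmetric = [5.0, 6.0, 7.0]
long_tail = [5.0, 5.0, 5.0, 5.0, 9.0]

print(skewness(symmetric))
print(skewness(long_tail))
```

A strongly skewed activity distribution, like the second list, suggests that a few compounds differ sharply from the bulk, which is where cliffs are likely to be found.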

Page 43

SALI in Bulk

• Used pIC50s and CDK hashed fingerprints
• Lots of material to explore here

Page 44

SALI in Bulk

• But fingerprint quality is important
• Assay 379744 has 83 cliffs of infinite height since 83 pairs of molecules have Tc = 1.0
• Probably should simply ignore such "identical" molecules
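Ignoring "identical" pairs amounts to filtering on the similarity before dividing. The following sketch is one hypothetical way to do that; the function name, the tolerance used by `math.isclose` and the toy data are assumptions.

```python
import math

def finite_sali_pairs(activities, similarity):
    """Compute SALI per pair, skipping pairs whose similarity is 1.0."""
    kept, skipped = {}, []
    n = len(activities)
    for i in range(n):
        for j in range(i + 1, n):
            if math.isclose(similarity[i][j], 1.0):
                # Fingerprints cannot tell these molecules apart: no finite SALI.
                skipped.append((i, j))
                continue
            gap = abs(activities[i] - activities[j])
            kept[(i, j)] = gap / (1.0 - similarity[i][j])
    return kept, skipped

# Hypothetical data: compounds 0 and 1 share a fingerprint (Tc = 1.0).
activities = [6.0, 8.0, 6.5]
similarity = [
    [1.0, 1.0, 0.5],
    [1.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
kept, skipped = finite_sali_pairs(activities, similarity)
print(kept, skipped)
```

Reporting the skipped pairs separately keeps them available for inspection, since they may be stereoisomers or other cases the fingerprint does not resolve.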

Page 45

Conclusions

• SALI is the first step in characterizing the SAR landscape
• Allows us to directly analyze the landscape, as opposed to individual molecules
• Being able to predict the landscape could serve as a useful way to extend an SAR landscape

Page 46

Acknowledgements

• John Van Drie
• Gerry Maggiora
• Mic Lajiness
• Jurgen Bajorath
• Ajit Jadhav
• Trung Ngyuen

Page 47

Page 48

ER-β Dataset

• 107 molecules, censored data taken as exact
• A few big cliffs
• The best linear model performs decently

[Figure: histogram of pIC50 (-4.0 to -0.5) by Frequency (0 to 25), and a Predicted pIC50 versus Observed pIC50 scatter plot (both axes -4 to -1)]

Page 49

SALI Curves from DRCs

• No difference in major cliffs
• Some of the minor cliffs are highlighted using the DRC instead of IC50

Page 50

Clustering in the Holloway Dataset

[Figure: dendrogram from hierarchical clustering, hclust (*, "complete"); y-axis Height, 0.5 to 1.0. Leaf order: 17, 14, 23, 25, 26, 18, 9, 27, 16, 19, 1, 10, 32, 6, 29, 8, 33, 12, 30, 11, 4, 22, 5, 28, 2, 7, 13, 3, 31, 24, 20, 15, 21]