gdb and the chemical spacebigchem.eu/sites/default/files/school2_reymond.pdf1 gdb and the chemical...
TRANSCRIPT
1
GDB and the Chemical Space
The Chemical Space Project J.-L. Reymond, Acc. Chem. Res. 2015, 48, 722-730
1. GDB-17 and FDB-17
2. Visualization
3. Virtual Screening
4. Extension to larger molecules
Jean-Louis Reymond
19 April 2017, BigChem BCN Meeting
gdb.unibe.ch
2
Molecules
166,443,860,262
Hydrocarbons
5,422,153
Graphs
114,304,569,097
Skeletons
1,330,958,530
GDB-17
Rod Sphere
Disc
T. Fink et al. Angew. Chem. Int. Ed. 2005, 44, 1504-1508,
J. Chem. Inf. Model. 2007, 47, 342-353 (GDB-11)
L. C. Blum, J.-L. Reymond, J. Am. Chem. Soc. 2009, 131, 8732-3 (GDB-13);
L. Ruddigkeit et al., J. Chem. Inf. Model. 2012, 52, 2864-2875 (GDB-17)
Ueber die analytischen Figuren, welche in der Mathematik
Bäume genannt werden und ihre Anwendung auf die Theorie
chemischer Verbindungen. E. Cayley, Chem. Ber. 1875, 8, 1056-
1059. Practical Graph Isomorphism. B. D. McKay, Congressus
Numerantium 1981, 30, 45-87.
ChEMBL-17
Fragment Database FDB-17
3Fragment Database FDB-17. R. Visini, M. Awale, J.-L. Reymond., J. Chem. Inf. Model. 2017, doi:10.1021/acs.jcim.7b00020
GDB-17: Molecules
166,443,860,262
Hydrocarbons
5,422,153
2) Unsaturations
Graphs
114,304,569,097
1) Ring strain
Skeletons
1,330,958,530
3) Heteroatoms
4.6 G Subset: Fragments
4,606,078,406
4) Limit complexity
5) Even sampling
FDB-17
10,101,204
Limit Complexity
4
Table 1. Filtering criteria to reduce GDB-17 to its 4.6G fragment subset.
Scaffolds FG Density Problematic/Superfluous FGs
3 rings 5 Nitrogen + Oxygen atoms No aldehydes No aromatic ring > 6 atoms
2 small (3- or 4-
membered) rings
1 positive charge at neutral pH No epoxides, aziridines 1 CN (cyanide)
2 quaternary centers 1 negative charge at neutral pH No O-(C=O)-O (carbonate) No non-aromatic C=C
4 stereocenters 3 H-bond acceptor atoms No O-C=N (imidate) No CC (triple bonds)
3 rotatable bonds 2 H-bond donor atoms No NO2 (nitro) No halogens
Even Sampling
5
a) b)
c) d)
1000
10000
100000
1000000
10000000
100000000
1000000000
0 50 100 150
Num
be
r o
f co
mp
ound
s
Bin by size
1000000
10000000
100000000
1000000000
10000000000
0 1 2 3 4
Num
be
r o
f co
mp
ound
s
Stereocenters
1000000
10000000
100000000
1000000000
10000000000
≤1 2 3 4 ≥5
Num
be
r o
f co
mp
ound
s
Heteroatoms
1000000
10000000
100000000
1000000000
10000000000
≤11 12 13 14 15 16 17
Num
be
r o
f co
mp
ound
s
Heavy atom count
4.6G fragment subset
FDB-17
value triplets (HAC, heteroatoms, stereocenters)
Property Profiles
6
0
10
20
30
40
50
60
0 1 2 3 4 5
% o
f d
ata
se
t
Hydrogen bond donor
0
10
20
30
40
50
60
70
6 10 14
% o
f d
ata
se
t
Heavy atom count
0
5
10
15
20
25
30
35
40
45
80 180 280
% o
f d
ata
se
t
Mass in dalton
0
1
2
3
4
5
-4 -3 -2 -1 0 1 2 3 4 5
% o
f d
ata
se
t
Clog P
0
10
20
30
40
50
60
0 1 2 3 4 5 6
% o
f d
ata
se
t
Ring count
0
20
40
60
80
0 2 4 6 8
% o
f d
ata
se
t
Stereo centers
0
10
20
30
40
50
0 2 4 6 8 10
% o
f d
ata
se
t
O+N count
0
5
10
15
20
25
30
35
0.0 0.2 0.4 0.6 0.8 1.0
% o
f d
ata
se
t
fsp3 ratio
0
5
10
15
20
25
30
35
0 1 2 3 4 5 6 7
% o
f d
ata
se
t
Rotable bond count
a) b) c)
e)
d)
g)f)
h) j)i) k)
0
10
20
30
40
50
60
70
he
tero
aro
ma
tic
aro
ma
tic
he
tero
cyclic
ca
rbo
cyclic
acyclic
% o
f D
ata
se
t
GDB-17
4.6G fragment subset
FDB-17
commercial fragments
0
10
20
30
40
50
60
70
0 1 2 3 4 5 6 7 8
% o
f d
ata
se
t
Hydrogen bond acceptor
Nearest Neighbor Searches
7
fencamfamine rimantadine levetiracetamgabapentin
a)
b)
c)
d)
0
500
1000
1500
2000
2500
3000
3500
4000
0.5 1 1.5 2
num
ber
of com
pounds
ROCS score
GDB-17 MQN
4.6G frag MQN
FDB-17 MQN
FDB-17 Xfp
FDB-17 Random
Commercial
0
100
200
300
400
500
600
700
800
900
0 0.2 0.4 0.6 0.8 1
num
ber
of com
pounds
TSfp
0
500
1000
1500
2000
2500
3000
3500
4000
4500
0.5 1 1.5 2
num
ber
of com
pounds
ROCS score
0
10
20
30
40
50
60
70
80
90
0 0.2 0.4 0.6 0.8 1
num
ber
of com
pounds
TSfp
0
500
1000
1500
2000
2500
3000
3500
4000
0.5 1 1.5 2
num
ber
of com
pounds
ROCS score
0
100
200
300
400
500
600
700
0 0.2 0.4 0.6 0.8 1
num
ber
of com
pounds
TSfp
0
500
1000
1500
2000
2500
3000
3500
4000
4500
0.5 1 1.5 2
num
ber
of com
pounds
ROCS score
0
10
20
30
40
50
60
70
0 0.2 0.4 0.6 0.8 1
num
ber
of com
pounds
TSfp
Understanding Molecular Diversity
88
1. GDB-17 and FDB-17
2. Visualization
3. Virtual Screening
4. Extension to larger molecules
9
Property 1
Property 2
Property 3
Property 4
Property 5
Property 6
PCA
PC1
PC2
n-dimensional Space
search
2D (or 3D) projection
Visualise
10
Molecular Quantum Numbers
Polar groupsH-Bond donor atoms 3 1H-Bond donor sites 3 1H-Bond acceptor atoms 3 4H-Bond acceptor sites 3 7Positive charges 1 0Negative charges 0 1
BondsAcyclic single bonds 3 8Acyclic double bonds 0 3Acyclic triple bonds 0 0Cyclic single bonds 18 11Cyclic double bonds 4 3Cyclic triple bonds 0 0Rotatable bonds 0 4
TopologyAcyclic monovalent nodes 3 6Acyclic divalent nodes 0 2Acyclic trivalent nodes 0 2Acyclic tetravalent nodes 0 0Cyclic divalent nodes 8 6Cyclic trivalent nodes 9 6Cyclic tetravalent nodes 1 13-Membered rings 0 04-Membered rings 0 15-Membered rings 1 16-Membered rings 4 17-Membered rings 0 08-Membered rings 0 09-Membered rings 0 0 10 membered rings 0 0Atoms shared by fused rings 7 2Bonds shared by fused rings 6 1
AtomsCarbon 17 16Fluorine 0 0Chlorine 0 0Bromine 0 0Iodine 0 0Sulphur 0 1Phosphor 0 0Acyclic nitrogen 0 1Cyclic nitrogen 1 1Acyclic oxygen 2 4Cyclic oxygen 1 0Heavy atom count 21 23
HO
O
HO
NMe
H
H
Morphine
N
S
CO2H
HN
O
O
Penicillin G
R. van Deursen et al. J. Chem. Inf. Model. 2010, 50, 1924-1934
GDB-17
11Visualization and Virtual Screening of the Chemical Universe Database GDB-17. L. Ruddigkeit, L. C. Blum, J.-L. Reymond, J.
Chem. Inf. Model. 2013, 53, 56-65.
PubChem Ro5 + Oligomers
12
Oligosaccharides
DNA
Diamondoids
GraphenesSmall Molecules
Acyclic Alkanes
Carbon Fraction
(Polarity)
Peptides
13
Oligosaccharides
Diamondoids
Graphenes
Small Molecules
Acyclic Alkanes
Fraction Cyclic Atoms
(Rigidity)
DNA
Peptides
Ro3
Ro5
14
> N-dimensional fingerprints:
– APfp: 20 counts for atom pairs (shape)
– MQN: 42 molecular quantum numbers
– SMIfp: 34 counts for SMILES characters
– Xfp: 55 counts for atom pairs with properties (pharmacophore)
– Sfp: 1024-bit binary substructure fp
– ECfp4: 1024-bit binary extended connectivity fp
> Tools at gdb.unibe.ch
– Web-browsers for DrugBank, ChEMBL, ZINC, PubChem, GDB-11, GDB-13,
GDB-17
– Mapplets (downloadable Java applications, ca. 100 Mb)
– Similarity Mapplets (tailored mapplets by e-mail)
– WebDrugCS (3D-viewer, platform independent, DrugBank)
– WebMolCS (3D-viewer, platform independent, up to 5000 molecules from
user)
Visualisation and subsets of the chemical universe database GDB-13 for virtual screening. L. C. Blum, R. van Deursen, J.-
L. Reymond, J. Comput. Aided Mol. Des. 2011, 25, 637-647
A multi-fingerprint browser for the ZINC database. M. Awale, J.-L. Reymond, Nucleic Acids Res. 2014, 42, 234-239
The MQN-Mapplet: Visualization of Chemical Space with Interactive Maps of DrugBank, ChEMBL, PubChem, GDB-11 and
GDB-13. M. Awale, R. van Deursen, J. L. Reymond, J. Chem. Inf. Model. 2013, 53, 509-518
Similarity Mapplet: Interactive Visualization of the Directory of Useful Decoys and ChEMBL in High Dimensional Chemical
Spaces. Awale M, Reymond JL, J. Chem. Inf. Model., 2015, 55, 1509-1516. 14
15
MQN-Mapplet
The MQN-Mapplet: Visualization of Chemical Space with Interactive Maps of DrugBank, ChEMBL, PubChem, GDB-11 and
GDB-13. M. Awale, R. van Deursen, J. L. Reymond, J. Chem. Inf. Model. 2013, 53, 509-518
16
a) b)
c) d)
Similarity Maps
17Similarity Mapplet: Interactive Visualization of the Directory of Useful Decoys and ChEMBL in High Dimensional
Chemical Spaces. Awale M, Reymond JL, J. Chem. Inf. Model., 2015, 55, 1509-1516
Similarity Maps
18Similarity Mapplet: Interactive Visualization of the Directory of Useful Decoys and ChEMBL in High Dimensional
Chemical Spaces. Awale M, Reymond JL, J. Chem. Inf. Model., 2015, 55, 1509-1516
18
Adenosine A2a receptor - aa2ar (Sfp: 78 %, 7 %)
dis
tan
ce
co
rre
lati
on
co
eff
.
0.4
0.5
0.6
0.7
0.8
0.9
1
1 2 3 4 5 6 7 8 9 10 11 12
cu
mu
lati
ve
va
ria
nc
e
PC
0
20
40
60
80
100
1 2 3 4 5 6 7 8 9 10 11 12
a) b)
PC
0.4
0.5
0.6
0.7
0.8
0.9
1
1 2 3 4 5 6 7 8 9101112
MQN
SMIfp
APfp
Xfp
Sfp
c)
0
20
40
60
80
100M
…
A…
S…
% of DB covered by single occupancybins in original fingerprint-space
% of DB covered by single occupancypoints in 3D-space
% of DB covered by single occupancypixels in 2D-space
% o
f d
ata
ba
se
(D
B)
0
20
40
60
80
100
MQN SMIfp APfp Xfp Sfp
Ori
3D
2D
DrugBank in 3D Chemical Spaces
Web-based 3D-visualization of the DrugBank chemical space Awale M and Reymond JL, J. Cheminform., 2016,
doi:10.1186/s13321-016-0138-2.
a b
c dc
WebMolCS
21WebMolCS: a Web-Based Interface for Visualizing Molecules in 3D Chemical Spaces. Awale M, Reymond JL, J.
Chem. Inf. Model., 2017, doi:10.1021/acs.jcim.6b00690
Does this work?
22
1. GDB-17 and FDB-17
2. Visualization
3. Virtual Screening
4. Extension to larger molecules
23
GDB-17
166.4 G
xLOS
3D pharmacophore + shape
Sorted VS Hit list
Reference
Ligand(s)
MQN-neighbors
Xfp-neighbors
1.0 G
20 M
Docking
Synthesis/Testing
3D-Shape and pharmacophore (xLOS)
24
𝐱𝐋𝐎𝐒 𝐬𝐜𝐨𝐫𝐞 = 𝒆−𝒅𝒊𝒋
𝟐𝒏𝒅𝒊=𝟏
𝒎𝒅𝒋=𝟏
𝒎𝒅 + 𝒏𝒅
+ 𝒆−𝒅𝒊𝒋
𝟐𝒏𝒂𝒊=𝟏
𝒎𝒂𝒋=𝟏
𝒎𝒂 + 𝒏𝒂
+2 × 𝒆−𝒅𝒊𝒋
𝟐𝒏𝒉𝒊=𝟏
𝒎𝒉𝒋=𝟏
𝒎𝒉 + 𝒏𝒉
0.00
0.20
0.40
0.60
0.80
1.00
0 0.5 1 1.5 2 2.5 3
ov
erl
ap
sco
re (
e(-
dij2))
Atom-Atom Distance dij, Å
dij
Atom i
Atom j
Seed (n atoms)
Query (m atoms)
Optimization of TRPV6 Calcium Channel Inhibitors Using a 3D Ligand-Based Virtual Screening Method. C Simonin,
M Awale et al., Angew. Chem., Int. Ed. Engl. 2015, 54, 14748-14752.
0
200
400
600
800
1000
1200
1400
1600
top scoring 1000 cpdsfrom 5 referencemolecules
MW
no
. o
f cp
ds
1,3,5
4
2
p2 (y)
p3 (z)
p1 (x)
25
xLOS
TS
fp
no
. o
f c
pd
s
0
50000
100000
150000
200000
250000
300000
350000
0.00
0.20
0.40
0.60
0.80
1.00
0.00 0.40 0.80 1.20 1.60
xLOS vs Sfp for tested cpdsxLOS vs Sfp for hit cpdsxLOS score histogram
9
10
Table S1: In vitro pharmacology profile of cis-22a on several ion channels.a
Channel % Inh. @ 10 µM
TRP channels
TRPV1 (agonist effect) (h) -15.5 ± 0.7c
TRPV1 (antagonist effect) (h) 29.6 ± 8.6
TRPV3 (antagonist effect) (h) -11.5 ± 2.0
TRPV5 (antagonist effect) (r) 79.9 ± 1.4 (IC50 = 2.4 M)
TRPM8 (agonist effect) (h) -7.4 ± 1.4c
TRPM8 (antagonist effect) (h) 21.2 ± 7.5
Voltage-gated ion channels
Ca2+
channel L-type, dihydropyridine (r) -1.6 ± 20.5
Ca2+
channel L-type, diltiazem (r) 18.4 ± 13.0
Ca2+
channel L-type, verapamil (r) 31.9 ± 3.0
Ca2+
channels, N-type (r) 4.4 ± 8.1
K+ channel [Kv] (r) -15.9 ± 3.8
hERG (h) 82.3 ± 4.0d
Na+ channels, site 2 (r) 52.1 ± 4.2
Ligand-gated ion channels
5-HT3 (h) -1.3 ± 5.9
GABA, central benzodiazepine (r)b -10.7 ± 8.7
Glutamate, NMDA (r) 8.2 ± 1.5
Store-operated Ca2+
channels -10.1 ± 6.0
GPCR
Adrenergic α1A (h) 94 ± 5
Adrenergic α2A (h) 31.2 ± 4.9
Adrenergic β1 (h)b 14.3 ± 3.1
Adrenergic β2 (h)b 11.5 ± 0.6
Cannabinoid CB1 (h ) b -19.2 ± 4.2
Cannabinoid CB2 (h) b
-1.2 ± 7.5
Dopamine D1 (h) 63.9 ± 0.2
Dopamine D2S (h) b 90.3 ± 0.6
Dopamine D3 (h) 50.4 ± 2.1
Dopamine D4.4 (h) 93.8 ± 0.5
Muscarinic M1 (h) 74.3 ± 2.7
Muscarinic M2 (h) 87.2 ± 2.7
Opiate µ (h) b 98.1 ± 0.6
Serotonine 5-HT1A (h) 88.5 ± 4.5
Serotonine 5-HT2A (h) 56.2 ± 2.8
Serotonine 5-HT1B (r) 28.4 ± 4
Serotonine 5-HT2B (h) 23.0 ± 4.0
T47D
LNCaP
SKOV3
HEK u
ntran
sfec
ted
HEK
cl.1
5
0.0
0.5
1.0
1.5
TRPV6 mRNA
fold
mR
NA
ex
pre
ss
ion
(re
l. to
T4
7D
)
ND 0.000271
70
80
90
100
110
5000400030002000100050030010050
% g
row
th
conc. (nM)
cis-22a
T47D
SKOV3
70
80
90
100
110
5000400030002000100050010050
% g
row
th
conc. (nM)
trans-22a
T47DSKOV3
Optimization of TRPV6 Calcium Channel Inhibitors Using a 3D Ligand-Based Virtual Screening Method. C Simonin,
M Awale et al., Angew. Chem., Int. Ed. Engl. 2015, 54, 14748-14752.
26
HRD rotated
K162
E181
Q185-R255
D274-R255
A213
F275R137
R220
HRD
K162
E181
D274-H254
A213
F275
Q185-W277
L139
L263
V147
K162
G142
A160
R195
L194L196
A273
L210
F275
A213
R220
L139
G216
A213
L263
R137
V147
K162G142
A160
F275
R195
L194L196
A273
L210
Y212
Kinase
Inhibitors
1) xLOS
2) Biochemical screen
3) Structure-based optimization
Aurora A
IC50 = 2 nM
Discovery of a Selective Aurora A Kinase Inhibitor by Virtual Screening. Falco Kilchmann et al., J. Med. Chem., 2016,
doi:10.1021/acs.jmedchem.6b00709
27
KD (nM)
- 9 795 ± 129
+ 9 -
DNA Aurora A TPX2 Merge
Con
tro
l9
(10
µM
)
1
(0.2
5µ
M)
5 μm
9 (µM) 9 rac-92
28
New Molecules
29
1. GDB-17 and FDB-17
2. Visualization
3. Virtual Screening
4. Extension to larger molecules
rod sphere
disk
www.cheminfo.org/pdbexplorer
30Jin X, Awale M, Zasso M, Kostro D, Patiny L, Reymond JL: PDB-Explorer: a web-based interactive map of the protein data
bank in shape space. BMC Bioinformatics 2015, 16, 339.
31
gdb.unibe.ch
Xian Jin
Mahendra
Awale
Ivan Dibonaventura
Tamis
Darbre Céline Simonin
Bee-Ha
Gan
Ricardo Visini
Runze He
Daniel Probst
Josep Arus
Luc Patiny, EPFL
Daniel Bertrand, HiQscreen
Matthias Hediger, UniBE