Molecular path for ligand search
Tao Lu, Yuan Yuan Qiao *, Pan Wen Shen
College of Chemistry, Nankai University, Tianjin 300071, China
Received 28 February 2011
Abstract
A ligand is a small molecule that binds to several residues of a receptor. We adapt the concept of molecular path to search
effectively for ligands together with their contacting residues. Additionally, we allow wild-type (wildcard) definitions on atoms and bonds of molecular paths for fuzzy
structural matching. We choose hydrogen bond interactions to characterize the binding mode of a ligand by several
suitable molecular paths and use them to query PDBe for deposited ligands that interact with their residues in the same way.
The expression of a molecular path and the format of database entries are described with examples. Our molecular path provides a new
approach to exploring ligand–receptor interactions and offers a structural framework reference for new ligand design.
© 2011 Yuan Yuan Qiao. Published by Elsevier B.V. on behalf of Chinese Chemical Society. All rights reserved.
Keywords: Ligand; Residue; Path; Database
Substructure handling is fundamental to chemoinformatics research, including structural homomorphism
analysis, maximal common substructure matching, similarity or diversity comparison, fragmentation or fingerprints for
feature screening, and structure-activity relationship modeling. In 1997, Xiao et al. reported the use of a
multilayer code for global and substructure search [1]. In 2007, Siegel and Vieth examined fragments
as drugs containing other drugs (DCOD) and drugs in other drugs (DIOD) [2]. The atom pair is another simple method that
connects the structural features of ligands, usually a pair of atoms or functional groups, to their chemical and
biological properties [3]. Derived from this, a series of concepts such as druglikeness, nondruglikeness, leadlikeness and
metabolite-likeness [4] drive the exploration of chemical space by Lipinski’s rule-of-five or other approaches [5].
In addition, a fingerprint is either a hologram composed of various indicators extracted from many aspects of the structures, or
simply a bit string representing the status of predefined fragments [6,7]. All these methods have been widely used to screen
databases. However, the queries for screening are based on information from ligands only; no information from the
receptor side is involved. Here, we put forward our molecular path method, which describes the structural features
of a ligand by a path composed of atoms and bonds together with its interacting residues from the receptor. Once a molecular path
is defined, it is quick and effective to pick out from a database the deposited ligands that interact with the residues in the
same way. We describe our concept and algorithm for molecular path, as well as the format of
our database, followed by illustrative examples.
Chinese Chemical Letters 22 (2011) 1130–1134
* Corresponding author.
E-mail address: [email protected] (Y.Y. Qiao).
1001-8417/$ – see front matter © 2011 Yuan Yuan Qiao. Published by Elsevier B.V. on behalf of Chinese Chemical Society. All rights reserved.
doi:10.1016/j.cclet.2011.05.001
1. Methods
1.1. Concept
Take the complex of INNO-406 (NS-187, bafetinib) [8,9] bound to the c-Abl kinase domain (PDB ID = 2E2B,
Ligand ID = 406) as an example for understanding the concept of molecular path. INNO-406 is an analogue of STI-571
(imatinib, Gleevec) but exhibits a 25–55-fold increase over it in in vitro activity against BCR-ABL [10].
Based on the four hydrogen bonds between INNO-406 and BCR-ABL, it is easy to define hydrogen-bond-oriented
molecular paths as shown in Fig. 1. Types of atoms and bonds in a molecular path can follow the MDL MOL format
[11], which is supported by various commercial and free or open source chemical structure editors, such as ChemDraw [13] and
JME [12]. For example, ‘A’ means any atom and ‘Any’ means any type of bond. The molecular paths in Fig. 1 can be
drawn in wild type as shown in Fig. 2.
1.2. Algorithm
A molecular path is a set of atoms connected by bonds in a molecule. As illustrated in Table 1(A), a path starts at nitrogen atom 4 with
a connectivity of 2, denoted as ‘N 4 2’, which has two unvisited neighbors, ‘C 3 3’ and ‘C 5 2’. Suppose it takes ‘C 3 3’ as
the next atom; the following atoms will then be ‘C 2 2’, ‘C 1 2’, ‘C 6 2’ and ‘C 5 2’. As there are no unvisited neighbors for ‘C 5 2’,
this section ends here and the path has to backtrack through the visited atoms to find an unvisited neighbor, until all the atoms
are visited. In this way, ‘C 3 3’ is selected as the start of a new section, which goes to ‘C 7 1’. At this point, all the atoms are
visited, but the bond connecting ‘N 4 2’ and ‘C 5 2’ is left unvisited, and it must be the last section for this molecule. In
the format of molecular paths, the count of atoms in the molecule is given at the beginning, followed by a set of sections
enclosed in a pair of ‘#’. For each section, its constituent bonds are listed and delimited by ‘@’; for each bond,
the two connected atoms are denoted as above, followed by the bond value, such as ‘N 4 2 C 3 3 1’. In addition, an extra
Boolean value for each bond marks the status of a section: ‘1’ when a section ends here and ‘0’ when it continues.
By convention, all hydrogen atoms are omitted. For the structure in Table 1(B), there are two sections for this 4-atom,
3-bond molecule.
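The traversal described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `encode_molecular_path`, the dictionary-based input, and the rule of always trying the lowest-numbered unvisited neighbor first are hypothetical choices, and the molecule is assumed to be connected and hydrogen-free. The atom numbering and bond values for structures (A) and (B) are read from Table 1.

```python
def encode_molecular_path(atoms, bonds, start):
    """Encode a connected, hydrogen-free molecule as a molecular path string.

    atoms: {atom_index: element_symbol}
    bonds: {(i, j): bond_value}
    start: index of the atom where the first section begins
    """
    neighbors = {i: [] for i in atoms}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for i in neighbors:
        neighbors[i].sort()  # assumption: try the lowest-numbered neighbor first

    visited = [start]  # atoms in visiting order
    seen = {start}
    used = set()       # bonds already written into a section
    sections, section = [], []
    current = start
    while current is not None:
        nxt = next((n for n in neighbors[current] if n not in seen), None)
        if nxt is not None:
            # Extend the current section by one bond.
            section.append((current, nxt))
            used.add(frozenset((current, nxt)))
            visited.append(nxt)
            seen.add(nxt)
            current = nxt
            continue
        # Dead end: close the section, then backtrack to the most recently
        # visited atom that still has an unvisited neighbor.
        if section:
            sections.append(section)
            section = []
        current = next(
            (a for a in reversed(visited)
             if any(n not in seen for n in neighbors[a])),
            None,
        )

    # Bonds never walked (ring closures) each form a final one-bond section,
    # written with the earlier-visited atom first.
    rank = {a: k for k, a in enumerate(visited)}
    for i, j in bonds:
        if frozenset((i, j)) not in used:
            first, second = sorted((i, j), key=rank.__getitem__)
            sections.append([(first, second)])

    def label(i):
        # element, atom index, connectivity (number of heavy-atom neighbors)
        return f"{atoms[i]} {i} {len(neighbors[i])}"

    def value(i, j):
        return bonds[(i, j)] if (i, j) in bonds else bonds[(j, i)]

    parts = []
    for sec in sections:
        for k, (i, j) in enumerate(sec):
            end_flag = 1 if k == len(sec) - 1 else 0
            parts.append(f"{label(i)} {label(j)} {value(i, j)} {end_flag}")
    return f"{len(atoms)}#" + "@".join(parts) + "#"


# Structure (A) of Table 1: six-membered ring (atoms 1-6) with substituent C7 on C3.
ATOMS_A = {1: "C", 2: "C", 3: "C", 4: "N", 5: "C", 6: "C", 7: "C"}
BONDS_A = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1,
           (5, 6): 1, (1, 6): 1, (3, 7): 1}
# Structure (B) of Table 1: C1 bonded to N2, C3 and C4.
ATOMS_B = {1: "C", 2: "N", 3: "C", 4: "C"}
BONDS_B = {(1, 2): 1, (1, 3): 1, (1, 4): 1}

PATH_A = encode_molecular_path(ATOMS_A, BONDS_A, start=4)
PATH_B = encode_molecular_path(ATOMS_B, BONDS_B, start=2)
print(PATH_A)
print(PATH_B)
```

Starting from N4 and N2 respectively, this sketch yields three sections for (A) and two for (B), in the same bond order as the Sections column of Table 1. A different start atom or neighbor order would give a different but equally valid expression.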
Molecular path provides ‘local’ connectivity information, which helps to locate matching blocks from two
molecules effectively. For instance, the first bond ‘N 2 1 C 1 3 1’ from Table 1(B) has two possible matches in
Table 1(A): ‘N 4 2 C 3 3 1’ and ‘N 4 2 C 5 2 1’. As the connectivity of ‘C 5 2’ in Table 1(A) is lower than that of ‘C 1 3’ in
Table 1(B), the only matching atom left is ‘C 3 3’ in Table 1(A). Step by step, the matching pairs will eventually be
‘N 4 2’–‘C 3 3’–‘C 2 2’–‘C 7 1’ of Table 1(A) and ‘N 2 1’–‘C 1 3’–‘C 3 1’–‘C 4 1’ of Table 1(B).
Fig. 1. Molecular paths based on hydrogen bonds between INNO-406 and its receptor (PDB ID = 2E2B). Dashed lines stand for hydrogen bonds and
thick lines for selected bonds of the molecular paths.
Fig. 2. Wild and definite type of molecular path.
To match the two paths from Table 1(A) and Table 1(B), we adopted the Generic Matching
Algorithm (GMA) [14] to compare their topological connections using a backtracking approach. ‘N 4 2’ and ‘N 2 1’
are identical in atom type, and the connectivity of the former (2) is greater than that of the latter (1), so they are
treated as identical. The next are ‘C 3 3’ and ‘C 1 3’. Similarly, they have the same connectivity and are treated
as identical as well. Then, ‘C 2 2’ has greater connectivity than ‘C 3 1’, which indicates that the latter should be a
substructure of the former. The comparison between ‘C 7 1’ and ‘C 4 1’ does not change the result of the matching
procedure. Therefore, Table 1(B) is a substructure of Table 1(A). In these steps, the atoms are selected in
arbitrary order, and the molecular path does not need to be canonicalized into a unique expression.
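The comparison rules above (same atom type or wild type, query connectivity not greater than target connectivity, bonds preserved) can be illustrated with a small backtracking matcher in Python. This is a generic sketch written for this example and is not the GMA of ref. [14]; the function name `find_substructure` and the dictionary-based input are hypothetical, while the wild types ‘A’ and ‘Any’ follow the convention given in Section 1.1.

```python
def find_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return one mapping {query_atom: target_atom}, or None if there is no match.

    atoms: {atom_index: element_symbol}, where 'A' in the query means any atom
    bonds: {(i, j): bond_value}, where 'Any' in the query means any bond type
    """
    def adjacency(atoms, bonds):
        adj = {i: {} for i in atoms}
        for (i, j), value in bonds.items():
            adj[i][j] = value
            adj[j][i] = value
        return adj

    q_adj = adjacency(q_atoms, q_bonds)
    t_adj = adjacency(t_atoms, t_bonds)
    q_order = list(q_atoms)  # query atoms are assigned in this fixed order

    def atom_ok(q, t):
        # Same atom type (or wild type) and query connectivity <= target connectivity.
        same_type = q_atoms[q] == "A" or q_atoms[q] == t_atoms[t]
        return same_type and len(q_adj[q]) <= len(t_adj[t])

    def bonds_ok(q, t, mapping):
        # Every bond from q to an already mapped query atom must exist in the target.
        for q_nb, q_value in q_adj[q].items():
            if q_nb not in mapping:
                continue
            t_value = t_adj[t].get(mapping[q_nb])
            if t_value is None or (q_value != "Any" and q_value != t_value):
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(q_order):
            return dict(mapping)
        q = q_order[len(mapping)]
        for t in t_atoms:
            if t in mapping.values():
                continue
            if atom_ok(q, t) and bonds_ok(q, t, mapping):
                mapping[q] = t
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]  # backtrack and try the next candidate
        return None

    return extend({})


# Structures (A) and (B) as numbered in Table 1.
ATOMS_A = {1: "C", 2: "C", 3: "C", 4: "N", 5: "C", 6: "C", 7: "C"}
BONDS_A = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1,
           (5, 6): 1, (1, 6): 1, (3, 7): 1}
ATOMS_B = {1: "C", 2: "N", 3: "C", 4: "C"}
BONDS_B = {(1, 2): 1, (1, 3): 1, (1, 4): 1}

MATCH_B_IN_A = find_substructure(ATOMS_B, BONDS_B, ATOMS_A, BONDS_A)
print(MATCH_B_IN_A)
```

For this input the sketch pairs N2, C1, C3 and C4 of (B) with N4, C3, C2 and C7 of (A), the same pairs listed in the Match pairs column of Table 1, and it finds no match when (A) is used as the query against (B).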
1.3. Database
To involve the contacting residues, we use a database to combine the sections of paths and the residues in
tables. We built a database of two tables using SQLite [15]. The first is the “Ligand Table”, where molecular
paths are encoded from the original MDL MOL format of ligands in PDBe [16]. The second is the “Interaction Table”
(Table 2), also derived from PDBe; for each interaction between a ligand and its receptor, the binding type,
the contacting atom of the ligand and its neighbors, and the contacting atom of the corresponding residue are listed in the
columns.
When a molecular path is submitted as a query to our database, we check whether the types of atoms and residues, as well as the
interactions of the query, are identical to those of the paths in the database when the query is explicitly defined, or whether they are
a subset when the query contains wild types. The GMA procedure can handle these comparisons standalone or in an
iterative mode when screening a database.
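A two-table layout of this kind can be sketched with Python's standard `sqlite3` module. The schema below is an assumption made for illustration: the table names, column names and the helper `ligands_with_contacts` are hypothetical, and the ID column is assumed to identify the ligand entry. The four rows are those of Table 2. The sketch covers only the residue-side filter and leaves the structural comparison of the paths aside.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ligand (
            id   INTEGER PRIMARY KEY,
            path TEXT               -- molecular path string of the ligand
        );
        CREATE TABLE interaction (
            id            INTEGER,  -- assumed to refer to ligand.id
            type          TEXT,
            lig_atom      TEXT,
            lig_atom_id   TEXT,
            lig_neighbors TEXT,
            rec_atom      TEXT,
            rec_atom_id   TEXT,
            residue       TEXT,
            residue_id    INTEGER
        );
        """
    )
    rows = [  # rows copied from Table 2
        (1029, "H-Bond", "N", "22", "C, C", "N", "N", "MET", 318),
        (1029, "H-Bond", "N", "32", "C, C", "O", "OG1", "THR", 315),
        (1029, "H-Bond", "N", "40", "C, C", "O", "OE2", "GLU", 286),
        (1029, "H-Bond", "O", "42", "C", "N", "N", "ASP", 381),
    ]
    conn.executemany(
        "INSERT INTO interaction VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def ligands_with_contacts(conn, contacts):
    """Return the IDs whose hydrogen bonds cover every (ligand atom, residue) pair.

    A ligand atom given as 'A' is treated as a wild type and matches any atom.
    """
    hits = None
    for lig_atom, residue in contacts:
        sql = ("SELECT DISTINCT id FROM interaction "
               "WHERE type = 'H-Bond' AND residue = ?")
        params = [residue]
        if lig_atom != "A":
            sql += " AND lig_atom = ?"
            params.append(lig_atom)
        ids = {row[0] for row in conn.execute(sql, params)}
        hits = ids if hits is None else hits & ids
    return hits or set()


conn = build_demo_db()
print(ligands_with_contacts(conn, [("N", "MET"), ("O", "ASP")]))
```

Intersecting the per-contact result sets mirrors the observation that each additional explicit constraint can only keep or reduce the number of hits.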
Table 1
Expressions for molecular path.
Structure  Atoms  Sections          Match pairs
(A)        N 4 2  7#                N 4 2
           C 3 3  N 4 2 C 3 3 1 0@  C 3 3
           C 2 2  C 3 3 C 2 2 1 0@  C 2 2
           C 1 2  C 2 2 C 1 2 1 0@  C 7 1
           C 6 2  C 1 2 C 6 2 1 0@
           C 5 2  C 6 2 C 5 2 1 1@
           C 3 3  C 3 3 C 7 1 1 1@
           C 7 1  N 4 2 C 5 2 1 1#
           N 4 2
           C 5 2
(B)        N 2 1  4#                N 2 1
           C 1 3  N 2 1 C 1 3 1 0@  C 1 3
           C 3 1  C 1 3 C 3 1 1 1@  C 3 1
           C 1 3  C 1 3 C 4 1 1 1#  C 4 1
           C 4 1
Table 2
Interaction table.
              Ligand                    Receptor
ID    Type    Atom  Atom ID  Neighbors  Atom  Atom ID  Residue  Residue ID
1029  H-Bond  N     22       C, C       N     N        MET      318
1029  H-Bond  N     32       C, C       O     OG1      THR      315
1029  H-Bond  N     40       C, C       O     OE2      GLU      286
1029  H-Bond  O     42       C          N     N        ASP      381
2. Results and discussion
To demonstrate the functions of molecular path, we choose INNO-406 and its hydrogen bond interactions as
a detailed example to illustrate how its molecular paths are developed. We list a group of molecular paths
used as queries in Table 3. As more atoms or bonds are explicitly defined, fewer hits are returned, which shows that molecular path
is a sensitive means of ligand searching. The hits obtained by molecular path, some of which are shown in Fig. 3, help us to
investigate the binding pairs quickly by listing all the available ligands in the database that interact with the residues.
Besides INNO-406, we also tried other ligands using various molecular paths. For example, in the complex
1A9M, there are five hydrogen bond interactions between the ligand U0E and the residues. We searched our
database using several molecular paths, but found only two hits (Q50 and DKT) besides U0E itself [17].
3. Conclusion
Our approach based on molecular path has an advantage over structural searching and matching methods that rely on ligand information alone:
it allows us to connect various structural fragments through atoms reserved for contacting residues. In this way, it gives us a
direct view of ligands and their binding modes. We shall improve it by offering useful suggestions on fragment
selection and residue neighbor selection. We expect that molecular path will become an effective tool for designing new
ligands.
Table 3
Molecular paths for INNO-406 in hydrogen bond interaction and hits report.
No.  Molecular path (residues are not shown)  Ligand ID of hits
1    [structure image not reproduced]         AQZ, CA5, COT, FR8, FR9, PRC, STI, WBT
2    [structure image not reproduced]         MPZ, PRC, STI
3    [structure image not reproduced]         PRC
Fig. 3. Some of the ligands hit by molecular paths for INNO-406 (structures shown: STI, WBT, FR8, AQZ, PRC, FR9).
References
[1] Y.D. Xiao, Y.Y. Qiao, J.P. Zhang, et al., J. Chem. Inf. Comput. Sci. 37 (1997) 701.
[2] M.G. Siegel, M. Vieth, Drug Discov. Today 12 (2007) 71.
[3] R.E. Carhart, D.H. Smith, R. Venkataraghavan, J. Chem. Inf. Comput. Sci. 25 (1985) 64.
[4] P.D. Dobson, Y. Patel, D.B. Kell, Drug Discov. Today 14 (2009) 31.
[5] C. Lipinski, A. Hopkins, Nature 432 (2004) 855.
[6] J. Hert, P. Willett, D.J. Wilton, et al., J. Chem. Inf. Comput. Sci. 44 (2004) 1177.
[7] P. Willett, Drug Discov. Today 11 (2006) 1046.
[8] T. Asaki, Y. Sugiyama, T. Hamamoto, et al., Bioorg. Med. Chem. Lett. 16 (2006) 1421.
[9] T. Horio, T. Hamasaki, T. Inoue, et al., Bioorg. Med. Chem. Lett. 17 (2007) 2712.
[10] U. Rix, L.L. Remsing, A.S. Terker, et al., Leukemia 24 (2010) 44.
[11] Symyx Solutions, Inc., CTFile formats, http://www.symyx.com/downloads/index.jsp (accessed February 2011).
[12] P. Ertl, JME Molecular Editor Applet, http://www.molinspiration.com/jme/ (accessed February 2011).
[13] CambridgeSoft Corporation, ChemDraw, http://www.cambridgesoft.com/software/ChemDraw/ (accessed February 2011).
[14] J. Xu, J. Chem. Inf. Comput. Sci. 36 (1996) 25.
[15] D. Richard Hipp, SQLite Professional Support, http://www.sqlite.org/ (accessed February 2011).
[16] European Bioinformatics Institute, Protein Data Bank in Europe (PDBe), http://www.ebi.ac.uk/pdbe/ (accessed February 2011).
[17] L. Hong, X.J. Zhang, S. Foundling, et al., FEBS Lett. 420 (1997) 11.