Molecular path for ligand search

Tao Lu, Yuan Yuan Qiao *, Pan Wen Shen

College of Chemistry, Nankai University, Tianjin 300071, China

Received 28 February 2011

Abstract

A ligand is a small molecule that binds to several residues of a receptor. We adapt the concept of the molecular path to search effectively for ligands together with their contacting residues. In addition, we allow wild type (wildcard) definitions on the atoms and bonds of molecular paths so that structures can be matched fuzzily. We choose hydrogen bond interactions to characterize the binding mode of a ligand by several suitable molecular paths, and use these paths to query PDBe for deposited ligands that interact with their residues in the same way. The expression of molecular paths and the format of the database entries are described with examples. The molecular path offers a new approach to exploring ligand–receptor interactions and a structural framework reference for new ligand design.

© 2011 Yuan Yuan Qiao. Published by Elsevier B.V. on behalf of Chinese Chemical Society. All rights reserved.

Keywords: Ligand; Residue; Path; Database

Substructure handling is fundamental to chemoinformatics research, including structural homomorphism analysis, maximal common substructure matching, similarity and diversity comparison, fragmentation and fingerprints for feature screening, and structure-activity relationship modeling. In 1997, Xiao et al. reported the use of a multilayer code for global and substructure search [1]. In 2007, Siegel and Vieth examined fragments as drugs containing other drugs (DCOD) and drugs in other drugs (DIOD) [2]. The atom pair is another simple method that relates structural features of ligands, usually a pair of atoms or functional groups, to their chemical and biological properties [3]. Derived from this, the concepts of druglikeness, nondruglikeness, leadlikeness and metabolite-likeness [4] drive the exploration of chemical space by Lipinski’s rule of five or other approaches [5]. A fingerprint, in turn, is either a hologram composed of various indicators extracted from many aspects of a structure, or simply a bit string representing the status of predefined fragments [6,7]. All these methods have been widely used to screen databases. However, the screening queries are based on information from ligands only; no information from the receptor side is involved. Here, we put forward a method based on the molecular path, which describes the structural features of a ligand by a path composed of atoms and bonds together with the interacting residues of the receptor. Once a molecular path is defined, it is quick and effective to pick out from a database the deposited ligands that interact with the residues in the same way. We describe the concept and algorithm of the molecular path and the format of our database, and then illustrate them with examples.

Chinese Chemical Letters 22 (2011) 1130–1134

* Corresponding author.

E-mail address: [email protected] (Y.Y. Qiao).

1001-8417/$ – see front matter © 2011 Yuan Yuan Qiao. Published by Elsevier B.V. on behalf of Chinese Chemical Society. All rights reserved.

doi:10.1016/j.cclet.2011.05.001

1. Methods

1.1. Concept

Take the complex of INNO-406 (NS-187, bafetinib) [8,9] bound to the c-Abl kinase domain (PDB ID = 2E2B, Ligand ID = 406) as an example of the concept of the molecular path. INNO-406 is an analogue of STI-571 (imatinib, Gleevec) but exhibits a 25–55-fold increase in in vitro activity against BCR-ABL [10]. Based on the four hydrogen bonds between INNO-406 and BCR-ABL, it is straightforward to define hydrogen bond oriented molecular paths as in Fig. 1. The types of atoms and bonds in a molecular path can follow the MDL MOL format [11], which is supported by various commercial, free and open source chemical structure editors such as ChemDraw [13] and JME [12]. For example, ‘A’ means any atom and ‘Any’ means any type of bond. The molecular paths in Fig. 1 can be drawn in wild type as in Fig. 2.
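The wild type symbols can be read as simple matching rules. The following is a minimal sketch in Python, assuming that atoms are given as element symbols and bonds as type labels; the function names `atom_matches` and `bond_matches` are illustrative and do not come from the original work.

```python
def atom_matches(query_atom: str, target_atom: str) -> bool:
    """Return True if the query atom symbol accepts the target atom symbol."""
    # 'A' is the wild type for atoms: it accepts any atom.
    return query_atom == "A" or query_atom == target_atom


def bond_matches(query_bond: str, target_bond: str) -> bool:
    """Return True if the query bond type accepts the target bond type."""
    # 'Any' is the wild type for bonds: it accepts any bond type.
    return query_bond == "Any" or query_bond == target_bond


if __name__ == "__main__":
    print(atom_matches("A", "N"))            # wild type atom against nitrogen
    print(atom_matches("C", "N"))            # definite carbon against nitrogen
    print(bond_matches("Any", "double"))     # wild type bond against a double bond
    print(bond_matches("single", "double"))  # definite single against double
```

A definite query symbol only accepts itself, so replacing a symbol by its wild type can only enlarge the set of matching structures.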

1.2. Algorithm

A molecular path is a set of atoms connected by bonds in a molecule. Each atom is denoted by its element symbol, its index and its connectivity. As illustrated in Table 1(A), a path starts at atom 4, a nitrogen with a connectivity of 2 denoted as ‘N 4 2’, which has two unvisited neighbors, ‘C 3 3’ and ‘C 5 2’. Suppose it takes ‘C 3 3’ as the next atom; the following atoms will then be ‘C 2 2’, ‘C 1 2’, ‘C 6 2’ and ‘C 5 2’. As there are no unvisited neighbors for ‘C 5 2’, this section ends here and the path has to backtrack through the visited atoms to find one with an unvisited neighbor, until all the atoms are visited. In this way, ‘C 3 3’ is selected as the start of a new section, which goes to ‘C 7 1’. At this point all the atoms are visited, but the bond connecting ‘N 4 2’ and ‘C 5 2’ is still unvisited, and it must be the last section for this molecule. In the format of molecular paths, the count of atoms in the molecule is given at the beginning, followed by a set of sections enclosed in a pair of ‘#’. For each section, its bonds are listed and delimited by ‘@’; for each bond, the two connected atoms are denoted as above, followed by the bond value, such as ‘N 4 2 C 3 3 1’. In addition, an extra Boolean value for each bond marks the status of the section: ‘1’ when the section ends here and ‘0’ when it continues. By convention, all hydrogen atoms are omitted. For the structure of Table 1(B), there are two sections for this 4-atom, 3-bond molecule.
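The walk described above can be sketched in Python as a depth-first traversal with backtracking. This is an illustration under stated assumptions, not the authors' implementation: the function name `molecular_path`, the choice of the lowest-numbered unvisited neighbor, the exact whitespace of the output string, and the rule that every leftover ring-closing bond forms its own section are all assumptions made here.

```python
def molecular_path(atoms, bonds, start):
    """Encode a hydrogen-suppressed molecule as a molecular path string.

    atoms: dict mapping atom index -> element symbol
    bonds: list of (index_a, index_b, bond_value)
    start: index of the atom where the path starts
    """
    # Neighbor lists; the connectivity of an atom is its number of neighbors.
    neighbors = {i: [] for i in atoms}
    value = {}
    for a, b, v in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
        value[frozenset((a, b))] = v

    def label(i):
        return f"{atoms[i]} {i} {len(neighbors[i])}"

    visited = {start}
    used = set()      # bonds already written to the path
    stack = [start]   # current walk, kept for backtracking
    records = []      # [from, to, bond value, section-end flag]

    while stack:
        current = stack[-1]
        unvisited = sorted(n for n in neighbors[current] if n not in visited)
        if unvisited:
            nxt = unvisited[0]  # assumed rule: lowest index first
            bond = frozenset((current, nxt))
            records.append([current, nxt, value[bond], 0])
            used.add(bond)
            visited.add(nxt)
            stack.append(nxt)
        else:
            # Dead end: mark the last written bond as a section end, backtrack.
            if records:
                records[-1][3] = 1
            stack.pop()

    # Bonds never walked (ring closures) become one-bond sections at the end.
    for a, b, v in bonds:
        if frozenset((a, b)) not in used:
            records.append([a, b, v, 1])

    body = "@".join(f"{label(a)} {label(b)} {v} {end}" for a, b, v, end in records)
    return f"{len(atoms)}#{body}#"


# Structures (A) and (B) of Table 1, hydrogens omitted, all bond values 1.
ATOMS_A = {1: "C", 2: "C", 3: "C", 4: "N", 5: "C", 6: "C", 7: "C"}
BONDS_A = [(4, 3, 1), (3, 2, 1), (2, 1, 1), (1, 6, 1), (6, 5, 1), (3, 7, 1), (4, 5, 1)]
ATOMS_B = {1: "C", 2: "N", 3: "C", 4: "C"}
BONDS_B = [(2, 1, 1), (1, 3, 1), (1, 4, 1)]

if __name__ == "__main__":
    print(molecular_path(ATOMS_A, BONDS_A, start=4))
    print(molecular_path(ATOMS_B, BONDS_B, start=2))
```

With the neighbor rule assumed here, the two calls produce the bond records listed in the Sections column of Table 1 for (A) and (B), with three and two sections respectively.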

Fig. 1. Molecular paths based on hydrogen bonds between INNO-406 and its receptor (PDB ID = 2E2B). Dashed lines stand for hydrogen bonds and thick lines for selected bonds of the molecular paths.

Fig. 2. Wild and definite type of molecular path.

The molecular path provides ‘local’ connectivity information, which helps to locate matching blocks in two molecules effectively. For instance, the first bond of Table 1(B), ‘N 2 1 C 1 3 1’, has two possible matches in Table 1(A): ‘N 4 2 C 3 3 1’ and ‘N 4 2 C 5 2 1’. As the connectivity of ‘C 5 2’ in Table 1(A) is lower than that of ‘C 1 3’ in Table 1(B), the only matching atom left is ‘C 3 3’ in Table 1(A). Step by step, the matching pairs will eventually be ‘N 4 2’–‘C 3 3’–‘C 2 2’–‘C 7 1’ of Table 1(A) and ‘N 2 1’–‘C 1 3’–‘C 3 1’–‘C 4 1’ of Table 1(B).
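The connectivity filter for the first bond can be reproduced with a short Python sketch. It assumes the path strings are written as in the Sections column of Table 1; the names `Atom`, `Bond`, `parse_path` and `candidates` are hypothetical helpers introduced only for this illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    element: str
    index: int
    connectivity: int


class Bond(NamedTuple):
    first: Atom
    second: Atom
    value: int
    section_end: bool


def parse_path(path: str):
    """Split a path string into its atom count and its list of bonds."""
    count, body = path.rstrip("#").split("#", 1)
    bonds = []
    for record in body.split("@"):
        e1, i1, c1, e2, i2, c2, value, end = record.split()
        bonds.append(Bond(Atom(e1, int(i1), int(c1)),
                          Atom(e2, int(i2), int(c2)),
                          int(value), end == "1"))
    return int(count), bonds


def candidates(query: Bond, target_bonds, use_connectivity=True):
    """Target bonds (as ordered atom pairs) that could match the query bond."""
    hits = []
    for bond in target_bonds:
        # A target bond may match in either direction.
        for a, b in ((bond.first, bond.second), (bond.second, bond.first)):
            same_types = (a.element == query.first.element
                          and b.element == query.second.element
                          and bond.value == query.value)
            # A target atom needs at least as many neighbors as the query atom.
            enough = (a.connectivity >= query.first.connectivity
                      and b.connectivity >= query.second.connectivity)
            if same_types and (enough or not use_connectivity):
                hits.append((a, b))
    return hits


PATH_A = ("7#N 4 2 C 3 3 1 0@C 3 3 C 2 2 1 0@C 2 2 C 1 2 1 0@C 1 2 C 6 2 1 0@"
          "C 6 2 C 5 2 1 1@C 3 3 C 7 1 1 1@N 4 2 C 5 2 1 1#")
PATH_B = "4#N 2 1 C 1 3 1 0@C 1 3 C 3 1 1 1@C 1 3 C 4 1 1 1#"

if __name__ == "__main__":
    _, bonds_a = parse_path(PATH_A)
    _, bonds_b = parse_path(PATH_B)
    first = bonds_b[0]  # 'N 2 1 C 1 3 1'
    print(candidates(first, bonds_a, use_connectivity=False))
    print(candidates(first, bonds_a))
```

By atom and bond type alone the first bond of (B) has two candidates in (A), N 4–C 3 and N 4–C 5; adding the connectivity test leaves only N 4–C 3, as stated in the text.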

To match the two paths of Table 1(A) and Table 1(B), we adopted the Generic Matching Algorithm (GMA) [14], which compares their topological connections using a backtracking approach. ‘N 4 2’ and ‘N 2 1’ are identical in atom type, and the connectivity of the former is 2, which is greater than that of the latter, so they are treated as identical. The next pair is ‘C 3 3’ and ‘C 1 3’; they have the same connectivity and are treated as identical as well. Then, ‘C 2 2’ has a greater connectivity than ‘C 3 1’, which indicates that the latter should be a substructure of the former. The comparison between ‘C 7 1’ and ‘C 4 1’ does not change the result of the matching procedure. Therefore, Table 1(B) is a substructure of Table 1(A). In these steps, the order of the atoms is chosen at random, and the molecular path does not need to be normalized into a unique expression.
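The comparison above can be illustrated with a plain backtracking subgraph search in Python. This is a generic sketch written for this example and should not be read as the GMA implementation of [14]; the function names and the graph representation are assumptions, and the wild types ‘A’ and ‘Any’ are accepted in the query.

```python
def build_adjacency(atoms, bonds):
    """Map each atom index to a dict {neighbor index: bond value}."""
    adjacency = {i: {} for i in atoms}
    for a, b, v in bonds:
        adjacency[a][b] = v
        adjacency[b][a] = v
    return adjacency


def find_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return a mapping query index -> target index, or None if there is none."""
    q_adj = build_adjacency(q_atoms, q_bonds)
    t_adj = build_adjacency(t_atoms, t_bonds)
    order = list(q_atoms)  # any order works; no unique expression is required

    def compatible(q, t, mapping):
        # Atom types must agree unless the query atom is the wild type 'A'.
        if q_atoms[q] not in ("A", t_atoms[t]):
            return False
        # The target atom needs at least the connectivity of the query atom.
        if len(t_adj[t]) < len(q_adj[q]):
            return False
        # Every bond to an already mapped query neighbor must exist in the target.
        for q_nbr, q_val in q_adj[q].items():
            if q_nbr in mapping:
                t_val = t_adj[t].get(mapping[q_nbr])
                if t_val is None or q_val not in ("Any", t_val):
                    return False
        return True

    def extend(position, mapping):
        if position == len(order):
            return dict(mapping)
        q = order[position]
        for t in t_atoms:
            if t not in mapping.values() and compatible(q, t, mapping):
                mapping[q] = t
                found = extend(position + 1, mapping)
                if found is not None:
                    return found
                del mapping[q]  # backtrack and try the next target atom
        return None

    return extend(0, {})


# Structures (A) and (B) of Table 1; (B) is listed starting from its nitrogen.
ATOMS_A = {1: "C", 2: "C", 3: "C", 4: "N", 5: "C", 6: "C", 7: "C"}
BONDS_A = [(4, 3, 1), (3, 2, 1), (2, 1, 1), (1, 6, 1), (6, 5, 1), (3, 7, 1), (4, 5, 1)]
ATOMS_B = {2: "N", 1: "C", 3: "C", 4: "C"}
BONDS_B = [(2, 1, 1), (1, 3, 1), (1, 4, 1)]

if __name__ == "__main__":
    print(find_substructure(ATOMS_B, BONDS_B, ATOMS_A, BONDS_A))
    print(find_substructure(ATOMS_A, BONDS_A, ATOMS_B, BONDS_B))
```

For the two structures of Table 1, the first call pairs N 2, C 1, C 3 and C 4 of (B) with N 4, C 3, C 2 and C 7 of (A), which agrees with the match pairs given in the text, and the second call finds no embedding of (A) in (B).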

1.3. Database

To include the contacting residues, we use a database that combines the sections of paths and the residues in tables. We built a database of two tables using SQLite [15]. The first is the “Ligand Table”, in which molecular paths are encoded from the original MDL MOL format of ligands in PDBe [16]. The second is the “Interaction Table” (Table 2), also derived from PDBe; for each interaction between a ligand and its receptor, the columns list the binding type, the contacting atom of the ligand and its neighbors, and the contacting atom of the corresponding residue.

When a molecular path is submitted as a query to our database, we check whether the types of atoms and residues, as well as the interaction, of the query are identical to those of the paths in the database when the query is explicitly defined, or whether they are just a subset when the query contains wild types. The GMA procedure can handle these comparisons standalone or in an iterative mode when screening the database.
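A residue-aware lookup of this kind can be sketched with Python's built-in `sqlite3` module. The table and column names below are hypothetical, since the actual schema is not given in the text; the rows are the four hydrogen bonds of Table 2, and only the Interaction Table is modeled. Passing `None` for an argument plays the role of a wild type.

```python
import sqlite3

# Hypothetical schema; the column names are assumptions, not the authors' schema.
SCHEMA = """
CREATE TABLE interaction (
    id INTEGER,              -- entry ID as shown in Table 2
    type TEXT,               -- binding type, e.g. 'H-Bond'
    ligand_atom TEXT,
    ligand_atom_id INTEGER,
    ligand_neighbors TEXT,
    receptor_atom TEXT,
    receptor_atom_id TEXT,
    residue TEXT,
    residue_id INTEGER
)
"""

# The four rows of Table 2.
ROWS = [
    (1029, "H-Bond", "N", 22, "C, C", "N", "N", "MET", 318),
    (1029, "H-Bond", "N", 32, "C, C", "O", "OG1", "THR", 315),
    (1029, "H-Bond", "N", 40, "C, C", "O", "OE2", "GLU", 286),
    (1029, "H-Bond", "O", 42, "C", "N", "N", "ASP", 381),
]


def open_demo_db():
    """Create an in-memory database holding the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO interaction VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def find_entries(conn, ligand_atom=None, residue=None, interaction="H-Bond"):
    """Return the IDs of entries with a matching interaction.

    ligand_atom and residue may be None, which matches any value.
    """
    sql = """
        SELECT DISTINCT id FROM interaction
        WHERE type = :type
          AND (:atom IS NULL OR ligand_atom = :atom)
          AND (:residue IS NULL OR residue = :residue)
        ORDER BY id
    """
    params = {"type": interaction, "atom": ligand_atom, "residue": residue}
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = open_demo_db()
    print(find_entries(conn, "N", "MET"))   # nitrogen of the ligand bound to MET
    print(find_entries(conn, None, "ASP"))  # any ligand atom bound to ASP
```

In a full screen, the entries returned by such a query would then be passed to the structural comparison of their molecular paths.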

Table 1
Expressions for molecular path.

Structure        Atoms  Sections              Match pairs
(A) [structure]  N 4 2  7#                    N 4 2
                 C 3 3  N 4 2 C 3 3 1 0@      C 3 3
                 C 2 2  C 3 3 C 2 2 1 0@      C 2 2
                 C 1 2  C 2 2 C 1 2 1 0@      C 7 1
                 C 6 2  C 1 2 C 6 2 1 0@
                 C 5 2  C 6 2 C 5 2 1 1@
                 C 3 3  C 3 3 C 7 1 1 1@
                 C 7 1  N 4 2 C 5 2 1 1#
                 N 4 2
                 C 5 2
(B) [structure]  N 2 1  4#                    N 2 1
                 C 1 3  N 2 1 C 1 3 1 0@      C 1 3
                 C 3 1  C 1 3 C 3 1 1 1@      C 3 1
                 C 1 3  C 1 3 C 4 1 1 1#      C 4 1
                 C 4 1

Table 2
Interaction table.

              Ligand                     Receptor
ID    Type    Atom  Atom ID  Neighbors   Atom  Atom ID  Residue  Residue ID
1029  H-Bond  N     22       C, C        N     N        MET      318
1029  H-Bond  N     32       C, C        O     OG1      THR      315
1029  H-Bond  N     40       C, C        O     OE2      GLU      286
1029  H-Bond  O     42       C           N     N        ASP      381

2. Results and discussion

To demonstrate the functions of the molecular path, we chose INNO-406 and its hydrogen bond interactions as a detailed example of how its molecular paths are developed. A group of molecular paths used as queries is listed in Table 3. As more atoms or bonds are explicitly defined, fewer hits are returned, which shows that the molecular path is a sensitive means of ligand searching. The hits obtained by molecular path, shown in Fig. 3, help us to investigate the binding pairs quickly by listing all the available ligands in the database that interact with the residues.

Besides INNO-406, we also tried other ligands using various molecular paths. For example, in the complex 1A9M there are five hydrogen bond interactions between the ligand U0E and the residues. We searched our database using several molecular paths, but found only two hits (Q50 and DKT) apart from U0E itself [17].

3. Conclusion

Our approach based on the molecular path has advantages over many other structural searching and matching methods. It allows us to connect various structural fragments through atoms reserved for contacting residues, and thus gives a direct view of ligands and their binding modes. We shall improve it by offering useful suggestions on fragment selection and residue neighbor selection. We expect the molecular path to become an effective tool for designing new ligands.

Table 3
Molecular paths for INNO-406 in hydrogen bond interaction and hits report.

No.  Molecular path (residues are not shown)  Ligand ID of hits
1    [structure]                              AQZ, CA5, COT, FR8, FR9, PRC, STI, WBT
2    [structure]                              MPZ, PRC, STI
3    [structure]                              PRC

[Figure: structure drawings of the ligands STI, WBT, FR8, AQZ, PRC and FR9]

Fig. 3. Some of the ligands hit by molecular paths for INNO-406.

References

[1] Y.D. Xiao, Y.Y. Qiao, J.P. Zhang, et al. J. Chem. Inf. Comput. Sci. 37 (1997) 701.

[2] M.G. Siegel, M. Vieth, Drug Discov. Today 12 (2007) 71.

[3] R.E. Carhart, D.H. Smith, R. Venkataraghavan, J. Chem. Inf. Comput. Sci. 25 (1985) 64.

[4] P.D. Dobson, Y. Patel, D.B. Kell, Drug Discov. Today 14 (2009) 31.

[5] C. Lipinski, A. Hopkins, Nature 432 (2004) 855.

[6] J. Hert, P. Willett, D.J. Wilton, et al. J. Chem. Inf. Comput. Sci. 44 (2004) 1177.

[7] P. Willett, Drug Discov. Today 11 (2006) 1046.

[8] T. Asaki, Y. Sugiyama, T. Hamamoto, et al. Bioorg. Med. Chem. Lett. 16 (2006) 1421.

[9] T. Horio, T. Hamasaki, T. Inoue, et al. Bioorg. Med. Chem. Lett. 17 (2007) 2712.

[10] U. Rix, L.L. Remsing, A.S. Terker, et al. Leukemia 24 (2010) 44.

[11] Symyx Solutions, Inc. CTFile formats, http://www.symyx.com/downloads/index.jsp. Accessed February 2011.

[12] P. Ertl, JME Molecular Editor Applet, http://www.molinspiration.com/jme/. Accessed February 2011.

[13] CambridgeSoft Corporation, ChemDraw, http://www.cambridgesoft.com/software/ChemDraw/. Accessed February 2011.

[14] J. Xu, J. Chem. Inf. Comput. Sci. 36 (1996) 25.

[15] D. Richard Hipp, SQLite Professional Support, http://www.sqlite.org/. Accessed February 2011.

[16] European Bioinformatics Institute, Protein Data Bank in Europe (PDBe), http://www.ebi.ac.uk/pdbe/. Accessed February 2011.

[17] L. Hong, X.J. Zhang, S. Foundling, et al. FEBS Lett. 420 (1997) 11.
