Chemical Landscape Analysis – the case of tautomers
DESCRIPTION
Poster at 6th Joint Sheffield Conference on Chemoinformatics by Nina Jeliazkova, Nikolay T. Kochev, Vedrin Jeliazkov

The Structure-Activity Relationships (SAR) landscape and activity cliffs concepts are a popular analysis and visualisation technique, with origins in medicinal chemistry and receptor-ligand interaction modelling. While intuitive, the definition of an activity cliff as a “pair of structurally similar compounds with large differences in potency” is commonly recognized as ambiguous. We have recently proposed a new and efficient method for identifying activity cliffs and visualizing activity landscapes [1]. The method introduces a probabilistic measure - the likelihood of a compound having a large activity difference compared to other compounds, while being highly similar to them. The likelihood is effectively a quantification of a SAS Map with defined thresholds and does not require storing the pairwise similarity matrix.

The method generates a list of individual compounds, ranked according to their likelihood of being involved in the formation of activity cliffs, and goes beyond characterizing cliffs by structure pairs only. Every compound is associated with zero, one, or more compounds with similarity and activity difference above the defined thresholds. The paired structures can be easily retrieved by a standard similarity query. The arrangement as a graph naturally emerges from the set of top ranked compounds, as they are usually interconnected as activity cliff pairs. The popular matched molecular pairs approach can be considered a special case, which the method improves on by identifying multiple matching pairs at once.

We now extend the landscape analysis and visualisation to datasets in which the chemical structures are represented by more than one tautomer, and study the influence of tautomerization on the SAR landscape. The tautomer generation relies on the Ambit-Tautomer open source package, developed by the authors [2].

Finally, the method is implemented as part of the existing open source Ambit package [3] and can be accessed via an OpenTox API compliant web service. The OpenTox API provides a uniform REST web service application programming interface (API) to chemical structures, experimental data and calculated properties, descriptor calculation, model building, validation and reporting [4]. The AMBIT web services package [3] is being developed by Ideaconsult Ltd. and is one of several existing independent implementations of the OpenTox API, providing data sharing and remote calculation capabilities. Bubble charts visualising the ranked activity cliffs are presented, and interactive visualisations are available at http://toxmatch.sf.net.

[1] http://www.ncbi.nlm.nih.gov/pubmed/23110534
[2] http://dx.doi.org/10.1002/minf.201200133
[3] http://www.jcheminf.com/content/3/1/18
[4] http://www.jcheminf.com/content/2/1/7

TRANSCRIPT
The method generates a ranked list of individual compounds, ordered
according to their likelihood of being involved in the formation of activity cliffs.
The following examples use the PubChem Thrombin inhibitors assay AID
1215; the Tanimoto similarity is calculated on fingerprints from The CDK library.
References
[1] Jeliazkova N., Jeliazkov V., Chemical Landscape Analysis with the OpenTox Framework, Current Topics in Medicinal Chemistry, 2012, 12(18):1987-2001.
[2] Kochev N., Paskaleva V., Jeliazkova N., Ambit-Tautomer: An Open Source Tool for Tautomer Generation, Molecular Informatics, 2013, 32(5-6):481-504.
[3] Jeliazkova N., Jeliazkov V., AMBIT RESTful web services: an implementation of the OpenTox application programming interface, Journal of Cheminformatics, 2011, 3:18.
[4] AMBIT project, http://ambit.sourceforge.net
Nina Jeliazkova* (1), Nikolay T. Kochev (2), Vedrin Jeliazkov (1)
* e-mail: [email protected]; twitter: @10705013
(1) Ideaconsult Ltd, 4 Angel Kanchev Str., Sofia 1000, Bulgaria
(2) University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry, Bulgaria
What: The Structure-Activity Relationships (SAR) landscape and activity cliffs
Why: analysis and visualisation technique
Origin: medicinal chemistry and receptor-ligand interactions modelling
Activity cliff definition: a “pair of structurally similar compounds with large
differences in potency”
State of the art: SAS Maps, network graphs, quantification by SALI and SARI,
a number of methods analyzing the pairwise similarity matrix; various
extensions
Pros: intuitive
Cons: ambiguous, not scalable to large datasets
Method:
We have recently proposed a new and efficient method for identifying
activity cliffs and visualization of activity landscapes [1]. The method ranks
the activity cliffs by a probabilistic measure - the likelihood of a compound
having large activity difference compared to other compounds, while being
highly similar to them.
Poster panels: Background; Activity cliffs ranking; The tautomers in the chemical landscape

Table 1. Conditional probability of events co-occurrence

                                  s (high similarity)    !s (low similarity)
t (large activity difference)     a ~ P(s | t)           b ~ P(!s | t)
!t (small activity difference)    c ~ P(s | !t)          d ~ P(!s | !t)

$$G^2 = a \ln\frac{a\,(c+d)}{c\,(a+b)} + b \ln\frac{b\,(c+d)}{d\,(a+b)}$$
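As a minimal sketch, the G2 statistic can be computed directly from the four counts a, b, c, d of Table 1. The function name `g2` and the smoothing parameter `alpha` are illustrative assumptions: the poster states that additive (Laplace) smoothing is used for zero counts, but not the constant, so the smoothed value for the top ranked compound is not reproduced here. The two calls use the counts reported for compounds 12413 and 12439 in the G2 ranking table.

```python
import math


def g2(a, b, c, d, alpha=0.0):
    """G2 for one compound from the four counts of Table 1.

    alpha is a hypothetical additive (Laplace) smoothing constant added to
    every count; the poster does not state the value it uses.
    """
    a, b, c, d = (x + alpha for x in (a, b, c, d))
    if min(a, b, c, d) <= 0:
        raise ValueError("zero count: use alpha > 0")
    # Natural logarithms, following the reconstructed formula.
    return (a * math.log(a * (c + d) / (c * (a + b)))
            + b * math.log(b * (c + d) / (d * (a + b))))


print(round(g2(1, 310, 1, 216), 2))  # counts of compound 12413
print(round(g2(1, 308, 1, 218), 2))  # counts of compound 12439
```

Both calls print 0.07, in agreement with the G2 values listed for these two compounds.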
SAS Map quadrants (Fig 1): I: a, activity cliffs; II: b, nondescript; III: d, scaffold hops; IV: c, smooth
G2 Rank   ID      a   b     c   d     IC50, µM        G2
1         12731   2   216   0   310   50 (inactive)   32.34
2         12413   1   310   1   216   5.84            0.07
3         12439   1   308   1   218   10.90           0.07
Visualisation: Bubble Chart
The area of each circle is proportional to G2.
The activity cliffs are those ranked in Table 2.
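Because the area of a circle, not its radius, encodes G2, the radius has to grow with the square root of G2. A minimal sketch of that scaling follows; the function name `bubble_radius` and the scale factor `area_per_unit` are hypothetical and are not taken from the poster's JavaScript charts.

```python
import math


def bubble_radius(g2_value, area_per_unit=100.0):
    """Radius of a circle whose area equals area_per_unit * g2_value."""
    return math.sqrt(area_per_unit * g2_value / math.pi)


# Radii for the three G2 values listed in the ranking table.
for value in (32.34, 0.07, 0.07):
    print(round(bubble_radius(value), 2))
```

Scaling the radius linearly with G2 instead would visually exaggerate the highest ranked compounds.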
Fig 2. The result of a similarity query for the top ranked compound, ID = 12731
We extend the landscape analysis and visualisation to datasets in which the
chemical structures are represented by more than one tautomer, and study
the influence of tautomerization on the SAR landscape. The Thrombin
inhibitors dataset (AID 1215) contains 529 structures. The tautomer-enriched
dataset (generated by the Ambit-Tautomer package [2]) consists of
6145 structures.
Table 2. Activity cliffs ranking by G2 (Tanimoto threshold > 0.8 and activity difference > 21.6)
Fig 1. SAS Map of PubChem Thrombin inhibitor assay AID 1215 (IC50, μM); Tanimoto similarity on hashed 1024-bit fingerprints (The CDK library). Counts a, b, c, d as in Table 1.
When only the structure pairs between a given compound and all other
compounds in the analysed dataset are taken into account, G2
characterizes the likelihood of this particular
compound forming activity cliffs with the
compounds in the dataset. By estimating G2 for
all structures in the dataset, a ranking can be
established, thus identifying the most prominent
activity cliffs.
Note that this is a ranking of individual structures, not pairs of
structures. This is a significant advantage, especially when processing large
datasets, as only the likelihood (or the four counts) needs to be stored
per compound, instead of the entire pairwise matrix. Column a
gives the number of compounds that form activity cliff pairs with the given compound. The
paired structures can be easily retrieved by a standard similarity query. The
arrangement as a graph naturally emerges from the set of top ranked
compounds, as they are usually interconnected as activity cliff pairs.
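The storage argument can be illustrated with a short sketch: the four counts for one compound are accumulated in a single pass over the dataset, and no pairwise similarity is kept afterwards. This is an illustration under stated assumptions, not the Ambit implementation: fingerprints are represented as Python sets of on-bit indices rather than CDK fingerprints, the function names are hypothetical, and the toy data are invented. The thresholds 0.8 and 21.6 are those of Table 2.

```python
def tanimoto(fp1, fp2):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp1 | fp2)
    return len(fp1 & fp2) / union if union else 0.0


def cliff_counts(index, fingerprints, activities, sim_threshold, act_threshold):
    """Return the counts (a, b, c, d) of Table 1 for the compound at `index`."""
    a = b = c = d = 0
    for j, (fp, act) in enumerate(zip(fingerprints, activities)):
        if j == index:
            continue  # a compound is not paired with itself
        similar = tanimoto(fingerprints[index], fp) > sim_threshold
        large_diff = abs(activities[index] - act) > act_threshold
        if large_diff and similar:
            a += 1  # activity cliff
        elif large_diff:
            b += 1  # nondescript
        elif similar:
            c += 1  # smooth
        else:
            d += 1  # scaffold hop
    return a, b, c, d


# Invented toy data: four compounds with set-based fingerprints and activities.
fps = [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6}, {10, 11}, {1, 2, 3, 4, 5}]
acts = [50.0, 5.0, 48.0, 49.0]
print(cliff_counts(0, fps, acts, sim_threshold=0.8, act_threshold=21.6))
```

For the first toy compound this prints (1, 0, 1, 1): one activity cliff partner, one smooth neighbour and one scaffold hop. Only these four integers per compound need to be kept for the ranking.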
The network graph
The bubble chart is space efficient and can represent a
large number of values in a small space.
More (interactive) examples at:
http://toxmatch.sf.net
Combined bubble chart of G2 ranked compounds. Similarity threshold 0.8; each color corresponds to a
different activity difference threshold. The gray color at the right indicates the structures with count a = 0, but
G2>0, due to the additive smoothing. These are potential activity cliffs at different similarity thresholds.
PubChem AID 1215, tautomer-enriched dataset
Panels: the network graph; the bubble chart
There are 8 activity cliff pairs, instead of only one in the original dataset (Fig
2). The bubble chart shows that the G2 ranking is not the same for all the
tautomers of the same compound (the size of the circles of the same color
differs).
The enriched dataset contains 8 tautomers for each of the three structures in Fig. 2.
Blue: tautomers of ID = 12731
Red: tautomers of ID = 12413
Green: tautomers of ID = 12439
Fig 3. Activity cliffs ranking of the tautomer enriched AID 1215 dataset
The network graph in Fig 3 shows that the activity cliffs never involve more
than one tautomer. Therefore, if the correct combination of tautomers is
missing in a particular dataset, the activity cliffs might not be identified.
DSSTox CPDBAS dataset (carcinogenicity) Multiple activities and thresholds (1519 structures).
Fig 4. The tautomers of ID = 12731
(PubChem SID 861943) 1-[5-(4-bromophenyl)-7-(4-methoxyphenyl)-1,7-dihydro-[1,2,4]triazolo[1,5-a]pyrimidin-2-yl]pyrrolidine-2,5-dione
The activity cliffs ranking method is
implemented as part of the open source
Ambit package [3, 4] and can be accessed
via a REST web service (OpenTox API
compliant). The entire user interface and all charts
are JavaScript based and accessible through
modern web browsers.
Finally, each of the original structures is
assigned the maximum G2, taken over the
set of its tautomers. The structures are then
ordered by this value and ranks are assigned.
Fig 5 illustrates how the activity cliff
ranking changes, compared to the
ranking derived from the original
structures only.
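The aggregation step described above (maximum G2 over the tautomers of each original structure, followed by ranking) can be sketched as follows. The function name, the input layout and the example scores are hypothetical and serve only to illustrate the procedure.

```python
def rank_by_max_g2(tautomer_scores):
    """Rank original structures by the maximum G2 over their tautomers.

    tautomer_scores: iterable of (parent_id, g2) pairs, one pair per tautomer.
    Returns a list of (rank, parent_id, max_g2), best rank first.
    """
    best = {}
    for parent_id, score in tautomer_scores:
        if parent_id not in best or score > best[parent_id]:
            best[parent_id] = score
    ordered = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(rank, pid, score) for rank, (pid, score) in enumerate(ordered, start=1)]


# Invented scores: structure "A" has two tautomers, "B" and "C" one each.
scores = [("A", 0.1), ("B", 2.0), ("A", 3.5), ("C", 0.5)]
for row in rank_by_max_g2(scores):
    print(row)
```

In this toy input, structure "A" moves to rank 1 only because of its second tautomer, which is the kind of rank change that Fig 5 visualises for the real dataset.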
Fig 5. Activity cliff ranking in the original and tautomer-enriched datasets (x axis: rank in the original dataset; y axis: rank in the tautomer-enriched dataset)
The method goes beyond
finding structure pairs only.
* Additive (Laplace) smoothing is used to deal with zero counts
The likelihood G2 is effectively a quantification of a SAS Map with
defined thresholds. It can be calculated for the entire dataset (Fig. 1), for a
selected set of compounds, or for an individual compound.