Categories                                          5aRI        ARM         ERM         GR-me
Data
  Number of substances per MEAD                     1261        4849        6952        4030
  Active substances (equivocals removed)            66% (83%)   57% (62%)   53% (55%)   61% (65%)
  Response type known for active substances         NA          68%         51.2%       1843
  Data points verified (from primary publication)   66%         64%         46%         35%
Expert system
  Total number of SAR alerts in custom KB           24          24          48          24
  Number of existing teratogenicity alerts in DX    1           2           3           1
  Number of new MIE-based alerts                    23          22          45          23
  False negatives now correctly predicted           78%         59%         62%         67%
Utilisation of Non-adverse Data to Predict Molecular Initiating Events and
Teratogenicity for a Broader Chemical Space
Alun Myden*, Adrian Fowkes, Emma Hill, Jeffrey Plante, Alex Cayley and Bashir Surfraz
Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS
Benefits of utilising non-adverse data
The scarcity of teratogenicity data and the cost of in vivo reproductive toxicity studies are driving the use of a wider range of assays for which a relationship between the data and teratogenicity can be established. This lack of
data also limits the reliability domain of prediction systems for teratogenicity. Using an adverse outcome pathway (AOP) framework, the key events (KEs) leading to teratogenicity can be mapped, and suitable in vitro and in vivo
assays that model those KEs can be identified. Such data are available for a significantly larger number of chemicals than teratogenicity data and can be mined to extract knowledge that supports teratogenicity predictions
for a broader chemical space. This is demonstrated with four molecular initiating events (MIEs) from the steroidogenesis pathway (Figure 1). Relevant data for oestrogen receptor
modulation (ERM), androgen receptor modulation (ARM), 5-alpha reductase inhibition (5aRI) and glucocorticoid receptor-mediated effects (GR-me) were gathered from ChEMBL [1] and curated into structured
mechanistic expert activity datasets (MEADs, Figure 2). These MEADs also take potency and mode of action into consideration (Figure 3). Each of these purposeful datasets was then mined to populate structural alerts in
separate Derek Nexus (DX) custom knowledge bases (KBs), creating transparent in silico expert systems for the prediction of MIEs and teratogenicity (Table 1).
MIE-specific expert systems
Publicly available data relevant to the four MIEs (ARM, ERM, GR-me and 5aRI) were downloaded from
ChEMBL [1], then analysed and curated automatically using KNIME [2]. A sample of the data was verified
manually to assess quality. An activity call (active, weakly active, equivocal or inactive) was attributed to
each data entry based on expert-derived rules, and an overall activity was then assigned to each compound
using a conservative approach, creating a mechanistic dataset (MEAD) for each MIE. Clustering analysis of
each mechanistic dataset using in-house software [3] provided a list of clusters, which were prioritised
by their coverage of false negatives (active compounds not correctly predicted by Derek Nexus
[4]). Structural features identified in this way were then converted into structural alerts (Table 1).
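The rule-based activity calls and the conservative roll-up to one call per compound can be sketched as follows. This is only an illustration, not the curation logic used for the MEADs: the potency thresholds, the `ActivityCall` ordering and the function names are hypothetical, and "conservative" is assumed here to mean that the most active call among a compound's data entries wins.

```python
from collections import defaultdict
from enum import IntEnum


class ActivityCall(IntEnum):
    """Hypothetical ordering: a higher value is the more conservative call."""
    INACTIVE = 0
    EQUIVOCAL = 1
    WEAKLY_ACTIVE = 2
    ACTIVE = 3


def call_for_entry(potency_nm, active_below=1_000.0, weak_below=10_000.0,
                   equivocal_below=30_000.0):
    """Assign a call to one data entry from its potency (e.g. an IC50 in nM).

    The three thresholds are illustrative assumptions, not the expert rules
    applied on the poster.
    """
    if potency_nm < active_below:
        return ActivityCall.ACTIVE
    if potency_nm < weak_below:
        return ActivityCall.WEAKLY_ACTIVE
    if potency_nm < equivocal_below:
        return ActivityCall.EQUIVOCAL
    return ActivityCall.INACTIVE


def overall_calls(entries):
    """Roll entry-level calls up to one call per compound (highest call wins)."""
    per_compound = defaultdict(list)
    for compound_id, potency_nm in entries:
        per_compound[compound_id].append(call_for_entry(potency_nm))
    return {cid: max(calls) for cid, calls in per_compound.items()}


# Made-up entries: (compound ID, potency in nM)
entries = [
    ("CMPD-A", 250.0),     # potent in one assay ...
    ("CMPD-A", 50_000.0),  # ... inactive in another
    ("CMPD-B", 5_000.0),
    ("CMPD-C", 80_000.0),
]
mead = overall_calls(entries)
```

Under these assumptions `CMPD-A` is called active overall even though one of its entries is inactive, which is the sense in which the roll-up is conservative.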
MEADs: chemical space visualisation of active substances
The chemical space occupied by predicted and unpredicted active chemicals was analysed for each of the mechanistic datasets (MEADs) using SLogP and molecular weight (MW) as descriptors (Figure 7). Although compounds with low
MW and low SLogP are not well predicted, coverage of active substances improves significantly elsewhere in the chemical space.
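One way to quantify this kind of observation is to split the active compounds into regions of (MW, SLogP) space and report the percentage predicted in each region. The sketch below assumes the descriptor values have already been computed elsewhere; the cut-offs, the function names and the example compounds are all hypothetical and are not taken from the poster.

```python
from collections import Counter


def region(mw, slogp, mw_cut=200.0, slogp_cut=1.0):
    """Label a compound by its quadrant in (MW, SLogP) space.

    Both cut-offs are arbitrary illustrative values.
    """
    size = "low MW" if mw < mw_cut else "high MW"
    lipo = "low SLogP" if slogp < slogp_cut else "high SLogP"
    return f"{size} / {lipo}"


def coverage_by_region(actives):
    """Return {region: % of active compounds that fire at least one alert}."""
    totals, hits = Counter(), Counter()
    for mw, slogp, predicted in actives:
        key = region(mw, slogp)
        totals[key] += 1
        hits[key] += int(predicted)
    return {key: 100.0 * hits[key] / totals[key] for key in totals}


# (MW, SLogP, predicted by an alert?) for made-up active compounds
actives = [
    (150.2, 0.4, False),
    (180.1, 0.9, False),
    (165.0, 0.2, True),
    (320.5, 3.1, True),
    (410.7, 4.5, True),
    (385.3, 2.8, False),
    (298.4, 3.9, True),
]
coverage = coverage_by_region(actives)
```

With these invented values, one of three actives in the low MW / low SLogP region is predicted, against three of four in the high MW / high SLogP region.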
Conclusions
MIE-based mechanistic expert call datasets are created successfully by applying expert rules to relevant pharmacological data.
Expert systems created using alternative data offer predictions for teratogenicity and 4 specific MIEs for a significantly broader chemical space.
Each of the 4 MIE-specific expert systems now correctly predicts a majority of its former false negatives (59% to 78%, Table 1).
Active chemicals not predicted by the expert systems, such as low MW and SLogP substances, can be further investigated using machine-learnt methods.
[Plots: false negatives now correctly predicted (%) against number of alerts. Panels: ARM MIE alerts, 5aRI MIE alerts, GR-me MIE alerts, ERM MIE alerts.]
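A curve of false negatives correctly predicted (%) against number of alerts is a cumulative coverage curve. The sketch below shows one way such a curve could be produced, assuming a greedy ordering in which each step picks the cluster that covers the most false negatives not yet covered. The poster states only that clusters were prioritised by their coverage of false negatives, so the greedy rule, the function name and the example data are assumptions.

```python
def prioritise_clusters(clusters, false_negatives):
    """Greedily order clusters by the uncovered false negatives each one adds.

    clusters: dict mapping a cluster name to the set of compound IDs it matches.
    false_negatives: set of active compound IDs missed by the baseline model.
    Returns a list of (cluster name, cumulative % of false negatives covered).
    """
    remaining = set(false_negatives)
    candidates = dict(clusters)
    curve = []
    while remaining and candidates:
        # Pick the cluster covering the most still-uncovered false negatives;
        # sorting first makes ties resolve alphabetically (deterministic output).
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & remaining))
        gained = candidates.pop(best) & remaining
        if not gained:
            break  # no remaining cluster adds any coverage
        remaining -= gained
        covered = len(false_negatives) - len(remaining)
        curve.append((best, 100.0 * covered / len(false_negatives)))
    return curve


# Made-up example: ten false negatives and four candidate clusters
false_negatives = {f"c{i}" for i in range(1, 11)}
clusters = {
    "cluster_A": {"c1", "c2", "c3", "c4"},
    "cluster_B": {"c3", "c4", "c5"},
    "cluster_C": {"c6", "c7"},
    "cluster_D": {"c1"},
}
curve = prioritise_clusters(clusters, false_negatives)
```

In this invented example `cluster_D` never enters the curve because the only compound it matches is already covered by `cluster_A`.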
Figure 1: Enzymes and receptors in steroidogenesis pathway
investigated as targets towards establishing a link to teratogenicity.
Figure 2: Workflow describing data handling (in green) followed by
knowledge injection (in blue) to create purposeful mechanistic datasets.
Figure 3: Illustration of the invaluable input from experts to create a Lhasa ERM
dataset from the heterogeneous data downloaded from ChEMBL.
Figure 4: Improvements in the prediction of active compounds not predicted in Derek Nexus.
Figure 5: Improvement of expert models against corresponding datasets (MEADs) following the addition of MIE alerts.
Table 1: Detailed analysis of the 4 MEADs (data) and custom knowledge bases (expert systems) to provide MIE and teratogenicity predictions for a considerably wider chemical space.
Figure 6: Data sources used to create evaluation sets for each MIE (left). Comparison of performance of each MIE
expert model using corresponding training sets (MEADs) and evaluation sets (right).
[Bar charts (%): Balanced Accuracy, Sensitivity, Specificity, Positive Predictivity and Negative Predictivity for 5aRI, ARM, ERM and GR-me, comparing teratogenicity alerts alone with teratogenicity and MIE alerts.]
Expert systems: performance and validation
Adding the new MIE alerts considerably increases each expert model's coverage of its mechanistic dataset
(Figure 5). To validate the models, an evaluation set was created for each MIE from various data
sources [1,5-8], excluding chemicals present in the training sets (Figure 6, left). The validation
results for each model are promising (Figure 6, right). As each evaluation set is skewed towards
inactive compounds, the relatively low sensitivities are unsurprising. In contrast, the models give
good negative predictivity (> 85%), even though the training sets created from ChEMBL
data are biased towards active compounds.
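For reference, the five statistics reported for the expert models can be computed from a confusion matrix as sketched below. The function name and the counts are made up for illustration; the example simply shows that, in a set with few actives, negative predictivity can remain high while sensitivity is modest.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the five performance statistics, in %, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # actives correctly predicted active
    specificity = tn / (tn + fp)   # inactives correctly predicted inactive
    return {
        "sensitivity": 100 * sensitivity,
        "specificity": 100 * specificity,
        "balanced_accuracy": 100 * (sensitivity + specificity) / 2,
        "positive_predictivity": 100 * tp / (tp + fp),
        "negative_predictivity": 100 * tn / (tn + fn),
    }


# Hypothetical evaluation set of 1000 chemicals, 10% of them active
skewed = classification_metrics(tp=40, fp=90, tn=810, fn=60)
```

With these counts the sensitivity is 40% and the negative predictivity is about 93%, because the 60 missed actives are few compared with the 810 correctly predicted inactives.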
Figure 7: Chemical space occupied by active compounds present in specific MEADs (known teratogens, chemicals activating existing teratogenicity alerts in DX, compounds activating new MIE-based alerts, and false negatives not predicted).
References
[1] Bento et al., Nucleic Acids Res., 42, 2014, 1083-1090;
[2] Berthold et al., KNIME: The Konstanz Information Miner, Springer, 2007;
[3] Sherhod et al., J. Chem. Inf. Model., 54, 2014, 1864-1879;
[4] Derek Nexus v2.1.0 - Lhasa Limited; https://www.lhasalimited.org/products/derek-nexus.htm;
[5] Ding et al., BMC Bioinformatics, 11, 2010, S5;
[6] Shen et al., Toxicological Sciences, 135, 2013, 277-291;
[7] https://www.epa.gov/chemical-research/toxicity-forecasting;
[8] https://ncats.nih.gov/tox21
[Bar chart (%): %Actives, Balanced Accuracy, Sensitivity, Specificity, PPV and NPV for the ARM, ERM and GR-me MEADs and their evaluation sets.]
Evaluation set data sources:
GR-me evaluation set (7626 chemicals): Lhasa GR, 4%; Tox21, 58%; ToxCast, 38%
ERM evaluation set (11104 chemicals): FDA EADB, 23%; FDA EDKB, 14%; Tox21, 51%; ToxCast, 12%
ARM evaluation set (7327 chemicals): FDA EDKB, 2%; Tox21, 79%; ToxCast, 19%