

Getting under the skin of an in silico approach to predicting dermal sensitisation

Donna S. Macmillan & Martyn L. Chilton

Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS

Introduction

Derek Nexus is an expert toxicity prediction tool established by Lhasa Limited in 1983. It uses structure-activity relationships and reasoning rules developed by Lhasa experts to predict over 70 different toxicity endpoints including genotoxicity, carcinogenicity, skin sensitisation, hERG channel inhibition, hepatotoxicity and irritation. Skin sensitisation has been a key endpoint over the past few years, in part due to the implementation of EU regulation 1223/2009 (1), which prohibits the sale and marketing of any cosmetics and cosmetic ingredients which have been tested on animals, alongside REACH (2) and CLP (3), which state that non-animal methods must be exhausted prior to considering the use of animal tests. In silico tools such as Derek Nexus are increasingly popular because they offer rapid toxicity assessment of chemicals, transparent predictions and access to a wealth of toxicity data and mechanistic information.

Method

Skin sensitisation alerts - The performance of the Derek Nexus skin sensitisation alerts from 2014-2018 was assessed using an in-house dataset of 1243 sensitisers and 1300 non-sensitisers (n = 2543) with murine local lymph node assay (LLNA) and/or guinea pig data (a conservative call was applied when results for both were present). Chemicals activating an alert with a likelihood of equivocal or above were classified as sensitisers. Chemicals activating an alert with a likelihood of improbable, and non-alerting chemicals, were classified as non-sensitisers. The following metrics were calculated: sensitivity (Se), (TP/[TP+FN]*100); specificity (Sp), (TN/[TN+FP]*100); positive predictivity (PP), (TP/[TP+FP]*100); negative predictivity (NP), (TN/[TN+FN]*100); accuracy (Acc), ([TP+TN]/[TP+TN+FP+FN]*100).
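The metric definitions above can be sketched in a few lines of Python. The confusion-matrix counts used here are illustrative only: they are back-calculated from the class sizes given above (1243 sensitisers, 1300 non-sensitisers) and the rounded percentages reported in the Conclusion, and are not the authors' actual counts. The function name `performance_metrics` is also an assumption made for this sketch.

```python
def performance_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Return Se, Sp, PP, NP and Acc as percentages from confusion-matrix counts."""
    return {
        "Se": tp / (tp + fn) * 100,    # sensitivity
        "Sp": tn / (tn + fp) * 100,    # specificity
        "PP": tp / (tp + fp) * 100,    # positive predictivity
        "NP": tn / (tn + fn) * 100,    # negative predictivity
        "Acc": (tp + tn) / (tp + tn + fp + fn) * 100,  # accuracy
    }


# Illustrative counts (assumption, not the authors' data):
# TP + FN = 1243 sensitisers, TN + FP = 1300 non-sensitisers.
metrics = performance_metrics(tp=994, fn=249, tn=936, fp=364)
for name, value in metrics.items():
    print(f"{name}: {value:.0f}%")
```

With these assumed counts, Se, Sp and Acc round to the 80%, 72% and 76% quoted in the Conclusion, which is a quick consistency check on the formulas rather than a reconstruction of the underlying data.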

EC3 prediction model - A k-Nearest Neighbours model was developed based on a curated in-house dataset of over 1000 publicly available LLNA studies. The model was validated using 103 previously unseen chemicals with LLNA data (4).

Integrated testing strategy (ITS) - A previously published dataset of 213 compounds with LLNA data and in chemico/in vitro data (DPRA, n = 194; KeratinoSens, n = 187; LuSens, n = 78; h-CLAT, n = 166; U-SENS, n = 149) was used, together with Derek predictions and physicochemical parameters, to develop a decision tree for ITS-1 (5) and ITS-2.

Conclusion

Skin sensitisation alerts in Derek currently perform with an accuracy of 76%, a sensitivity of 80% and a specificity of 72%. The alerts are continually being improved through the refinement of existing alerts and the development of new alerts. Furthermore, new functionality such as the EC3 prediction model can aid users in evaluating potency, and incorporating Derek in an ITS can increase the accuracy compared to other ITS. Overall, Derek can be used to contribute to the replacement, reduction and refinement of the use of animals in research.

Skin sensitisation alerts

Derek alerts use structure-activity relationships (SAR) created by Lhasa experts to predict the toxicity of a given chemical. The predictions are supported by a graphical explanation of the SAR, mechanistic rationale, toxicity data of known compounds within the SAR and key references. Public, proprietary and regulatory data are used to build the alerts, thereby providing extensive coverage of chemical space. The number and performance of the skin sensitisation alerts have increased steadily from 2014 to 2018 (Figure 1). The alerts are reviewed regularly, particularly when new public data are sourced, leading to alert refinement and/or the development of new alerts, e.g. alerts 867, 878-879 and 882 (Figure 2). When proprietary data are donated, they are anonymised and used to expand alert coverage, allowing all Derek members to benefit from improved alert predictivity and/or new alerts.

EC3 prediction model validation

Moving from qualitative predictions of sensitiser/non-sensitiser to quantitative predictions is a challenging but important area of research. Given the desire and/or regulatory requirement(s) to reduce, refine and replace animal testing, a methodology to quantitatively predict LLNA EC3 values was explored. A k-Nearest Neighbours model, using the weighted mean of EC3s from up to 10 similar compounds in the same mechanistic domain, was developed (4) (Figure 3).
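A minimal Python sketch of the weighted-mean step described above. It assumes the neighbours have already been restricted to the query's mechanistic domain and that each neighbour's weight is its similarity score to the query. The text does not state the weighting scheme or the similarity measure, so both are assumptions, as are the names `Neighbour` and `predict_ec3` and the example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbour:
    similarity: float  # assumed similarity to the query chemical, in (0, 1]
    ec3: float         # experimental LLNA EC3 (%) of the neighbour


def predict_ec3(neighbours: list[Neighbour], k: int = 10) -> float:
    """Similarity-weighted mean EC3 of the k most similar neighbours."""
    if not neighbours:
        raise ValueError("no neighbours in the same mechanistic domain")
    # Keep at most k neighbours, most similar first.
    nearest = sorted(neighbours, key=lambda n: n.similarity, reverse=True)[:k]
    total_weight = sum(n.similarity for n in nearest)
    return sum(n.similarity * n.ec3 for n in nearest) / total_weight


# Hypothetical neighbours, for illustration only.
example = [Neighbour(similarity=0.9, ec3=1.0), Neighbour(similarity=0.6, ec3=4.0)]
print(f"Predicted EC3: {predict_ec3(example):.2f}%")
```

Weighting by similarity lets the closest analogues dominate the estimate while still drawing on up to 10 compounds; the published model (4) should be consulted for the actual scheme.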

Query chemical with unknown skin sensitisation potential

Outcome - Detailed overview of skin sensitisation potential of the query chemical including alerting features, toxicity data, mechanistic rationale, EC3 prediction, and how Derek can be used effectively in an ITS

Comparison of integrated testing strategies (ITS)/defined approaches (DA)

It is generally accepted that no in chemico or in vitro assay can be used as a standalone method to replace animal models for the prediction of skin sensitisation potential. The focus has instead turned to combining multiple assays and/or molecular descriptors to derive a more accurate assessment of hazard or risk. These ITS are also known as DA and are key elements within integrated approaches to testing and assessment (IATA) for skin sensitisation, used for regulatory decision-making.

Lhasa’s first approach (ITS-1) combined a Derek prediction, an assignment of the test chemical as in or out of the applicability domain of the in chemico/in vitro assays (based on physicochemical parameters), and up to two in chemico/in vitro assays to predict sensitiser/non-sensitiser (S/NS) (5). Lhasa’s subsequent, as yet unpublished, ITS-2 is similar to ITS-1 (Figure 6) but (i) utilises even more information from Derek, including alert likelihood and the new negative prediction functionality, (ii) considers additional reactivity properties when defining the applicability domain, and (iii) predicts the hazard (S/NS) alongside an estimate of GHS category, based on data in the EC3 prediction model. ITS-2 improves significantly upon the specificity of ITS-1 (80% vs 57%), with only a minor reduction in sensitivity (91% vs 96%) (Table 2).

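| KB | Alert 439: n | Alert 439: PP (%) | Alert 867: n | Alert 867: PP (%) |
| --- | --- | --- | --- | --- |
| Derek 2018 (unreleased) | 101 | 58 | 9 | 100 |
| Derek 2015 | 125 | 56 | N/A | N/A |
| Derek 2014 | 79 | 57 | N/A | N/A |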

The GHS scheme has only 2 possible categories compared to ECETOC’s 4. In the EC3 model validation, the majority, 66/103 (64%), were correctly assigned to their GHS category; 26/103 (25%) were predicted as more potent and 11/103 (11%) as less potent (Figure 5). As many industries only require a distinction between strong and weak sensitisers, a model which correctly assigns the GHS category may be sufficient for most use cases.
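The category comparison used in the validation can be sketched in Python as follows. Figure 4 is not reproduced in this text, so the thresholds below are an assumption based on the commonly cited boundaries (GHS: 1A for EC3 ≤ 2%, 1B for EC3 > 2%; ECETOC: extreme < 0.1%, strong 0.1 to < 1%, moderate 1 to < 10%, weak ≥ 10%). The function names and the EC3 pairs are hypothetical, and non-sensitisers are not handled.

```python
from collections import Counter
from typing import Callable


def ghs_rank(ec3: float) -> int:
    """Assumed GHS boundaries: 0 = category 1A (EC3 <= 2%), 1 = category 1B (EC3 > 2%)."""
    return 0 if ec3 <= 2.0 else 1


def ecetoc_rank(ec3: float) -> int:
    """Assumed ECETOC boundaries: 0 = extreme, 1 = strong, 2 = moderate, 3 = weak."""
    if ec3 < 0.1:
        return 0
    if ec3 < 1.0:
        return 1
    if ec3 < 10.0:
        return 2
    return 3


def compare(exp_ec3: float, pred_ec3: float, rank: Callable[[float], int]) -> str:
    """Judge a prediction by whether Exp and Pred fall in the same category."""
    exp_cat, pred_cat = rank(exp_ec3), rank(pred_ec3)
    if pred_cat == exp_cat:
        return "correct"
    # A lower rank means a more potent category.
    return "more potent" if pred_cat < exp_cat else "less potent"


# Hypothetical (experimental, predicted) EC3 pairs in %, for illustration only.
pairs = [(0.5, 0.8), (5.0, 1.5), (1.2, 12.0)]
ghs_tally = Counter(compare(e, p, ghs_rank) for e, p in pairs)
ecetoc_tally = Counter(compare(e, p, ecetoc_rank) for e, p in pairs)
print("GHS:", dict(ghs_tally))
print("ECETOC:", dict(ecetoc_tally))
```

The same pair can be judged differently under the two schemes because the category boundaries differ, so agreement rates for GHS and ECETOC are not directly interchangeable.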

Figure 4. GHS and ECETOC categories relative to EC3 values.

Figure 5. Validation of EC3 prediction model using an unseen dataset.

Figure 1. Improvement in Derek performance metrics between 2014-2018.

Table 1. Refinement of alert 439 by separating out vinylic or allylic anisoles.

| ITS/DA | Acc (%) | Se (%) | Sp (%) | n | Information used |
| --- | --- | --- | --- | --- | --- |
| Lhasa ITS-1 (5) | 86 | 96 | 57 | 213 | Physchem properties, in silico tools, in chemico/in vitro assays |
| Lhasa ITS-2 (6) | 88 | 91 | 80 | 213 | Physchem properties, in silico tools, in chemico/in vitro assays |
| Urbisch et al (7) | 79 | 82 | 72 | 180 | In chemico/in vitro assays |
| Van der Veen et al (8) | 83 | 93 | 64 | 41 | Bayesian QSAR, in chemico/in vitro assays and gene signature |
| Patlewicz et al (9) | 74 | 74 | 74 | 100 | Physchem properties, in silico tools |
| Takenouchi et al (10) | 84 | 89 | 70 | 139 | In silico tools, in chemico/in vitro assays |

Table 2. Comparison of ITS-1 and ITS-2 against other published ITS/DA.

Figure 6. General workflow of ITS-1 and ITS-2. New features in ITS-2 are highlighted in bold.

Case study: Refinement of substituted phenol alert

The scope of alert 439, substituted phenol or precursor, was investigated for potential improvements in predictivity. It was found that substituted phenols produce mixed positive and negative toxicity data in mouse, guinea pig and human assays, whereas a specific sub-class of vinylic or allylic anisoles is consistently positive in the same assays. Consequently, a new alert with excellent predictivity was created for these anisoles (alert 867), and alert 439 was refined to exclude them, leading to a slight improvement in performance (Table 1).

Figure 2. Improvement in selected Derek alerts between 2014-2018.

Figure 3. Methodology for predicting skin sensitisation potency based on a k-NN model.

The EC3 prediction model was evaluated using an external validation dataset (n = 103), consisting of LLNA data donated to Lhasa Limited by a number of members for the specific purpose of providing a test set of unseen compounds. The experimental EC3 values in this dataset were compared to the EC3s predicted by the model, and a prediction was judged as correct when the experimental (Exp) and predicted (Pred) values fell within the same GHS (2 classes) or ECETOC (4 classes) category (Figure 4). 37/103 (36%) were assigned to the correct ECETOC category, 47/103 (46%) were predicted as more potent, and the remaining 19/103 (18%) were predicted as less potent. GHS categories may be less challenging to predict than ECETOC categories as there are only 2 possible categories compared to ECETOC’s 4.

References:
(1) European Union. (2013) Off. J. Eur. Union 56, 34–66.
(2) http://www.hse.gov.uk/reach/
(3) https://echa.europa.eu/testing-clp
(4) Canipa et al (2017) J. Appl. Toxicol. 37, 985–995.
(5) Macmillan et al (2016) Regul. Toxicol. Pharmacol. 76, 30–38.
(6) Chilton & Macmillan, unpublished.
(7) Urbisch et al (2015) Regul. Toxicol. Pharmacol. 71, 337–351.
(8) van der Veen et al (2014) Regul. Toxicol. Pharmacol. 69, 371–379.
(9) Patlewicz et al (2014) Regul. Toxicol. Pharmacol. 69, 529–545.
(10) Takenouchi et al (2015) J. Appl. Toxicol. 35, 1318–1332.

Abstract no. 68

Lhasa Limited Registered Office, Registered Charity (290866) • Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS • +44 (0)113 394 6020 • www.lhasalimited.org • Company Registration Number 01765239. Registered in England and Wales. VAT Registration Number GB 396 8737 77.