derek nexus and the prediction of human skin sensitisation

1
64 41 8 39 56 36 25 21 59 1A 1B 2 1A 1B 2 Derek Nexus GHS Classification Human GHS Classification Overpredicted Correctly predicted Underpredicted 19% 60% 20% 24% 51% 24% Derek Nexus LLNA Model Performance in predicting human GHS classifications 52% 84% 75% Specificity Sensitivity Accuracy Metric 94% 76% 52% 2 1B 1A Percentage of chemicals with correctly predicted hazard GHS potency classification 45% 48% 94% 88% 82% 78% Specificity Sensitivity Accuracy Metric Derek Nexus LLNA 126 150 113 2 1B 1A GHS potency classification Derek Nexus and the Prediction of Human Skin Sensitisation Potential: An Evaluation Donna S. Macmillan 1 , Martyn L. Chilton 1 , David A. Basketter 2 1. Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS, UK 2. DABMEB Consultancy Ltd, Sharnbrook, UK Hazard performance Hazard predictions from Derek Nexus with a likelihood of equivocal or above were considered positive, whereas explicit negative predictions or those with a likelihood of doubted or below were considered negative . This enabled the accuracy, sensitivity and specificity of the alerts to be calculated (Figure 2 ) . Figure 2. Performance of Derek Nexus when predicting human skin sensitisation. Table 1. Some chemical classes that were commonly mispredicted in Derek Nexus. Figure 3. Performance of Derek Nexus broken down by GHS potency classification. Figure 4. Performance of Derek Nexus and the LLNA in predicting human skin sensitisation. The false positive and false negative predictions were further analysed , and some common chemical classes were identified (Table 1 ) . Furthermore, one third of the false positive predictions (n = 18 ) were found to consist of alerts firing in Derek Nexus with a likelihood of equivocal, indicating that there was a either a lack of evidence or conflicting evidence that these chemicals cause sensitisation in humans . If all equivocal predictions were discounted then the specificity would rise to 62 % , at the cost of a drop in coverage ( 90 % ) . The performance of both Derek Nexus and the LLNA in predicting human sensitisation were compared (Figure 4 ) . This analysis was carried out on the chemicals for which LLNA data were also available ( n = 166 , 43 % of dataset) . Derek Nexus displaye d a high sensitivity when predicting human skin sensitisation ( 84 % ), although the specificity was lower at 52 % . When considering the GHS potency classifications of the chemicals ( Figure 3 ), strong human sensitisers were predicted particularly well (sensitivity = 94 % ) . Abstract number 3068 References 1. (a) OECD, Test No. 442C , 2015 , OECD Publishing, Paris; (b) OECD, Test No. 442D , 2017 , OECD Publishing, Paris; (c) OECD, Test No. 442E , 2017 , OECD Publishing, Paris 2. (a) D. A. Basketter et al., Dermatitis , 2014 , 25 , 11 - 21; (b) A. M. Api et al., Dermatitis , 2017 , 28 , 299 - 307 3. Derek Nexus v6.0.1 (Lhasa Limited), www.lhasalimited.org/products/derek - nexus.htm 4. (a) D. Kayser, E. Schlede (Eds.), Chemikalien und Kontaktallergie Eine bewertende Zusammenstellung , 2001 , Verlag Urban & Vogel, Munchen; (b) E. Schlede et al., Toxicology , 2003 , 193 , 219 - 259 5. S. J. Canipa et al., J. Appl. Toxicol . , 2017 , 37 , 985 - 995 6. (a) I. Kimber, M. A. Pemberton, Regul . Toxicol . Pharmacol . , 2014 , 70 , 24 - 32; (b) A. Lazarov , J. Eur. Acad. Dermatol. Venereol . , 2006 , 21 , 169 - 171 7. A. - T. Karlberg et al., Contact Dermatitis , 2013 , 69 , 323 - 334 Method Two skin sensitisation datasets, both containing expert - derived human potency categorisations, were combined (Figure 1 ) . In the first dataset (“Basketter”), chemicals were placed into one of 6 potency categories based on analysis of their human data alone . 2 This data included No Observed Effect Levels values from human patch tests, diagnostic patch test data from dermatology clinics, and usage information (as a surrogate for exposure) . The second dataset (“BgVV”) contained chemicals placed into one of 3 potency categories, based on analysis of their human clinical and experimental data, alongside animal data . 4 Hazard predictions were made for the human dataset using the skin sensitisation alerts in Derek Nexus v 6 . 0 . 1 (knowledge base = 2018 1 . 1 , species = human) . Potency predictions were also made using Derek Nexus’ alert - based k - nearest neighbours model, which is built upon a dataset of LLNA EC 3 values for > 650 chemicals . 5 Figure 1. Construction and composition of the human skin sensitisation dataset. Potency performance The GHS classifications of the chemicals in the human skin sensitisation dataset were predicted using Derek Nexus (Figure 5 ) . If a chemical was already known to the model, it was removed from its training set prior to a prediction being made . Potency predictions were made for a total of 349 chemicals, and of these 51 % were placed into the correct GHS category . For comparison, the accuracy of the LLNA in predicting human GHS classifications was 60 % , based on analysis of the chemicals which had both sets of in vivo data (n = 166 ) . Figure 5. Performance of Derek Nexus and the LLNA in predicting human GHS classifications . Basketter category BgVV category Hazard classification GHS potency classification 1 or 2 A Sensitiser 1A 3 or 4 B Sensitiser 1B 5 or 6 C Non - sensitiser 2 Where a chemical had data present in both datasets, preference was given to the most potent categorisation Basketter dataset Refs 2a and 2b Combine datasets Human dataset n = 215 n = 389 n = 255 BgVV dataset Refs 4a and 4b Overall classification Standardise structures n = 414 1 Extreme sensitiser 2 Strong sensitiser 3 Moderate sensitiser 4 Weak sensitiser 5 Very weak sensitiser 6 Non - sensitiser A Significant contact allergen B Solid - based indication for contact allergenic effects C Insignificant contact allergen or questionable C contact allergenic effect Remove stereochemistry Neutralise salts Discard undefined structures False positives Comments False negatives Comments 16 acrylate esters All in BgVV category C 11 have + ve GPMT data, 1 is ve Chemicals with this substructure are known to be weak sensitisers in mice, and can sensitise humans given prolonged exposure 6 As such , these predictions could be considered to be true positives Updated performance would be: sensitivity = 85%, specificity = 61 % 5 terpenoids with an allylic H atom All in GHS category 1B 1 has + ve LLNA data, 1 is ve Suspected prehaptens 7 Possible candidates for inclusion in equivocal terpenoid alert 712 3 substituted aromatic aldehydes All in GHS category 1B 2 have + ve animal data, 1 is ve Excluded from aldehyde alert 419 Data are conflicting for this class Conclusions Derek Nexus identified the majority of human skin sensitisers in the dataset, displaying a sensitivity of 84 % , which increased to 94 % when only considering strong sensitisers . However, the lower specificity shows that it can over - predict the sensitising ability of human non - sensitisers . As most of Derek Nexus’ alerts use animal data as supporting evidence, this may reflect a tendency for such data to over - predict human sensitisation potential ; the LLNA also displayed a high sensitivity and a low specificity against the human data in this study . GHS potency classification of the chemicals using Derek Nexus’ EC 3 model proved more challenging, with 51 % of the chemicals being correctly predicted . This performance is again not dissimilar to that of the LLNA ( 60 % correctly classified) . Future work will focus on incorporating more expert - derived human potency data into in silico hazard and potency models in order to make more reliable predictions of skin sensitisation in humans . Introduction Skin sensitisation is an important toxicological endpoint, which has traditionally been assessed using in vivo animal tests conducted on mice or guinea pigs . Recent years have seen much effort put into the development, use and validation of alternative in chemico and in vitro assays, 1 which are typically assessed by comparison to the existing in vivo data . However, for skin sensitisation there is a distinct lack of human in vivo reference data available to validate these methods against, as a result of both ethical considerations and also concerns about the quality of such data when it is generated . Recently, a group of experts in the field have endeavoured to classify the skin sensitisation potency of a number of substances based on analysis of their human data alone, in order to provide a dataset of relevant human reference data to validate non - animal methodologies against . 2 Derek Nexus is an in silico , expert - knowledge based system capable of predicting skin sensitisation hazard and potency, based on a knowledge base containing 90 structural alerts . 3 These alerts are known to perform well against in vivo animal data : e . g . sensitivity = 79 % and specificity = 71 % against an in - house dataset of > 2500 chemicals with murine Local Lymph Node Assay (LLNA) and/or Guinea Pig Maximisation Test (GPMT) data . The aim of this work was to investigate how well the model predicts human skin sensitisation . * Over - predicted: predicted potency is more potent than observed potency * Under - predicted: predicted potency is less potent than observed potency

Upload: others

Post on 24-Mar-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Derek Nexus and the Prediction of Human Skin Sensitisation

64 41 8

39 56 36

25 21 59

1A

1B

2

1A 1B 2

Derek Nexus GHS Classification

Hu

man

GH

S C

lass

ific

atio

n

Over−predicted

Correctly predicted

Under−predicted

19%

60%

20%

24%

51%

24%

Derek Nexus LLNA

Model

Per

form

ance

in p

red

icti

ng

hu

man

GH

S c

lass

ific

atio

ns

52%

84%

75%

Specificity

Sensitivity

Accuracy

Met

ric

94%

76%

52%2

1B

1A

Percentage of chemicals with correctly predicted hazard

GH

S p

ote

ncy

clas

sifi

cati

on

45%

48%

94%

88%

82%

78%

Specificity

Sensitivity

Accuracy

Met

ric Derek Nexus

LLNA

126

150

1132

1B

1A

GH

S p

ote

ncy

clas

sifi

cati

on

Derek Nexus and the Prediction of HumanSkin Sensitisation Potential: An EvaluationDonna S. Macmillan1, Martyn L. Chilton1, David A. Basketter2

1. Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS, UK2. DABMEB Consultancy Ltd, Sharnbrook, UK

� Hazard performance

Hazard predictions from Derek Nexus with a likelihood of equivocal or above were

considered positive, whereas explicit negative predictions or those with a likelihood of

doubted or below were considered negative. This enabled the accuracy, sensitivity and

specificity of the alerts to be calculated (Figure 2).

Figure 2. Performance of Derek Nexus when predicting human skin sensitisation.

Table 1. Some chemical classes that were commonly mispredicted in Derek Nexus.

Figure 3. Performance of Derek Nexus broken down by GHS potency classification.

Figure 4. Performance of Derek Nexus and the LLNA in predicting human skin sensitisation.

The false positive and false negative predictions were further analysed, and some common

chemical classes were identified (Table 1). Furthermore, one third of the false positive

predictions (n = 18) were found to consist of alerts firing in Derek Nexus with a likelihood of

equivocal, indicating that there was a either a lack of evidence or conflicting evidence that

these chemicals cause sensitisation in humans. If all equivocal predictions were discounted

then the specificity would rise to 62%, at the cost of a drop in coverage (90%).

The performance of both Derek Nexus and the LLNA in predicting human sensitisation were

compared (Figure 4). This analysis was carried out on the chemicals for which LLNA data

were also available (n = 166, 43% of dataset).

Derek Nexus displayed a high sensitivity when predicting human skin sensitisation (84%),

although the specificity was lower at 52%. When considering the GHS potency

classifications of the chemicals (Figure 3), strong human sensitisers were predicted

particularly well (sensitivity = 94%).

Abstract number – 3068

� References1. (a) OECD, Test No. 442C, 2015, OECD Publishing, Paris; (b) OECD, Test No. 442D, 2017, OECD Publishing, Paris;

(c) OECD, Test No. 442E, 2017, OECD Publishing, Paris2. (a) D. A. Basketter et al., Dermatitis, 2014, 25, 11-21; (b) A. M. Api et al., Dermatitis, 2017, 28, 299-3073. Derek Nexus v6.0.1 (Lhasa Limited), www.lhasalimited.org/products/derek-nexus.htm4. (a) D. Kayser, E. Schlede (Eds.), Chemikalien und Kontaktallergie – Eine bewertende Zusammenstellung, 2001, Verlag

Urban & Vogel, Munchen; (b) E. Schlede et al., Toxicology, 2003, 193, 219-2595. S. J. Canipa et al., J. Appl. Toxicol., 2017, 37, 985-9956. (a) I. Kimber, M. A. Pemberton, Regul. Toxicol. Pharmacol., 2014, 70, 24-32; (b) A. Lazarov, J. Eur. Acad. Dermatol.

Venereol., 2006, 21, 169-1717. A.-T. Karlberg et al., Contact Dermatitis, 2013, 69, 323-334

� Method

Two skin sensitisation datasets, both containing expert-derived human potency

categorisations, were combined (Figure 1). In the first dataset (“Basketter”), chemicals were

placed into one of 6 potency categories based on analysis of their human data alone.2 This

data included No Observed Effect Levels values from human patch tests, diagnostic patch

test data from dermatology clinics, and usage information (as a surrogate for exposure). The

second dataset (“BgVV”) contained chemicals placed into one of 3 potency categories,

based on analysis of their human clinical and experimental data, alongside animal data.4

Hazard predictions were made for the human dataset using the skin sensitisation alerts in

Derek Nexus v6.0.1 (knowledge base = 2018 1.1, species = human). Potency predictions

were also made using Derek Nexus’ alert-based k-nearest neighbours model, which is built

upon a dataset of LLNA EC3 values for >650 chemicals.5

Figure 1. Construction and composition of the human skin sensitisation dataset.

� Potency performance

The GHS classifications of the chemicals in the human skin sensitisation dataset were

predicted using Derek Nexus (Figure 5). If a chemical was already known to the model, it

was removed from its training set prior to a prediction being made. Potency predictions were

made for a total of 349 chemicals, and of these 51% were placed into the correct GHS

category. For comparison, the accuracy of the LLNA in predicting human GHS classifications

was 60%, based on analysis of the chemicals which had both sets of in vivo data (n = 166).

Figure 5. Performance of Derek Nexus and the LLNA in predicting human GHS classifications.

Baskettercategory

BgVVcategory

Hazardclassification

GHS potencyclassification

1 or 2 A Sensitiser 1A

3 or 4 B Sensitiser 1B

5 or 6 C Non-sensitiser 2

Where a chemical had data present in both datasets,preference was given to the most potent categorisation

Basketterdataset

Refs 2aand 2b

Combinedatasets

Humandataset

n = 215

n = 389

n = 255

BgVVdataset

Refs 4aand 4b

Overallclassification

Standardisestructures

n = 414

1 – Extreme sensitiser2 – Strong sensitiser3 – Moderate sensitiser4 – Weak sensitiser5 – Very weak sensitiser6 – Non-sensitiser

A – Significant contact allergenB – Solid-based indication for contact allergenic effectsC – Insignificant contact allergen or questionableC – contact allergenic effect

• Removestereochemistry

• Neutralise salts• Discard undefined

structures

False positives Comments False negatives Comments

• 16 acrylate esters• All in BgVV category C• 11 have +ve GPMT data, 1 is –ve• Chemicals with this substructure

are known to be weak sensitisersin mice, and can sensitise humansgiven prolonged exposure6

• As such, these predictions couldbe considered to be true positives

• Updated performance would be:sensitivity = 85%, specificity = 61%

• 5 terpenoids with an allylic H atom• All in GHS category 1B• 1 has +ve LLNA data, 1 is –ve• Suspected prehaptens7

• Possible candidates for inclusionin equivocal terpenoid alert 712

• 3 substituted aromatic aldehydes• All in GHS category 1B• 2 have +ve animal data, 1 is –ve• Excluded from aldehyde alert 419• Data are conflicting for this class

� Conclusions

Derek Nexus identified the majority of human skin sensitisers in the dataset, displaying a

sensitivity of 84%, which increased to 94% when only considering strong sensitisers.

However, the lower specificity shows that it can over-predict the sensitising ability of human

non-sensitisers. As most of Derek Nexus’ alerts use animal data as supporting evidence, this

may reflect a tendency for such data to over-predict human sensitisation potential; the LLNA

also displayed a high sensitivity and a low specificity against the human data in this study.

GHS potency classification of the chemicals using Derek Nexus’ EC3 model proved more

challenging, with 51% of the chemicals being correctly predicted. This performance is again

not dissimilar to that of the LLNA (60% correctly classified). Future work will focus on

incorporating more expert-derived human potency data into in silico hazard and potency

models in order to make more reliable predictions of skin sensitisation in humans.

� Introduction

Skin sensitisation is an important toxicological endpoint, which has traditionally been

assessed using in vivo animal tests conducted on mice or guinea pigs. Recent years have

seen much effort put into the development, use and validation of alternative in chemico and

in vitro assays,1 which are typically assessed by comparison to the existing in vivo data.

However, for skin sensitisation there is a distinct lack of human in vivo reference data

available to validate these methods against, as a result of both ethical considerations and

also concerns about the quality of such data when it is generated. Recently, a group of

experts in the field have endeavoured to classify the skin sensitisation potency of a number

of substances based on analysis of their human data alone, in order to provide a dataset of

relevant human reference data to validate non-animal methodologies against.2

Derek Nexus is an in silico, expert-knowledge based system capable of predicting skin

sensitisation hazard and potency, based on a knowledge base containing 90 structural

alerts.3 These alerts are known to perform well against in vivo animal data: e.g. sensitivity =

79% and specificity = 71% against an in-house dataset of >2500 chemicals with murine

Local Lymph Node Assay (LLNA) and/or Guinea Pig Maximisation Test (GPMT) data. The

aim of this work was to investigate how well the model predicts human skin sensitisation.

*Over-predicted: predicted potency ismore potent than observed potency*Under-predicted: predicted potency isless potent than observed potency