Mutagenesis vol.11 no.5 pp.471-484, 1996

Prediction of Salmonella mutagenicity

E.Zeiger1,7, J.Ashby2, G.Bakale3, K.Enslein4, G.Klopman5 and H.S.Rosenkranz6

1Environmental Toxicology Program, NIEHS, Research Triangle Park, NC, USA, 2Zeneca Central Toxicology Laboratory, Macclesfield, Cheshire, UK, 3Department of Radiology, Case Western Reserve University, Cleveland, OH, 4Health Designs, Inc., Rochester, NY, 5Department of Chemistry, Case Western Reserve University, Cleveland, OH and 6Department of Environmental and Occupational Health, University of Pittsburgh, Pittsburgh, PA, USA

7To whom correspondence should be addressed at: WC-05, NIEHS, PO Box 12233, Research Triangle Park, NC 27709, USA

A number of prediction systems were examined to determine how well they could predict Salmonella mutagenicity. The prediction systems included two computer-based systems (CASE and TOPKAT®), the measurement of a physicochemical parameter (ke) and the use of structural alerts by an expert chemist. The operators of the computer-based systems and the chemist were supplied with the structures of 100 chemicals that had been tested for mutagenicity in the Salmonella test; the actual chemicals were needed for the physicochemical measurement. None of the participants was provided with the chemical names or Salmonella test results prior to submitting their predictions. The three systems that predicted mutagenicity from the structure of the chemicals produced equivalent results (71-76% concordance with the Salmonella results); the physicochemical system produced a lower (60-61%) concordance.

Introduction
During the past few years there has been increasing interest in the use of structure-activity relationships (SARs), and/or physicochemical properties, for predicting the biological activity of chemicals. The reasons for the interest in these systems include decreased cost and time per chemical as compared with animal or cell systems for identifying toxicological effects of chemicals; the reduction in the use of animals for toxicological testing; and the acquisition of information that may aid in the elucidation of the mechanisms of action of different classes of chemicals.

In the study presented here, the predictivity for Salmonella mutagenesis of two independent computer-based systems, one physicochemical screening test and one expert (non-computer-based) system was compared using 100 chemicals. These chemicals were among those tested in Salmonella by the US National Toxicology Program (NTP) whose results had not yet been published. The test systems used were CASE, developed by Klopman (1984) and Klopman et al. (1990); TOPKAT®, developed by Enslein (1988); the physicochemical measurement of ke (the electron attachment rate constant), which describes the potential electrophilicity of a chemical, as described by Bakale and McCreary (1987); and the identification of chemical 'structural alerts' (SA) as described by Ashby (1985) and Ashby and Tennant (1988).

These prediction systems were selected because they had been developed to predict Salmonella mutagenicity and were all designed to be used with noncongeneric sets of chemicals; that is, chemicals of diverse structure. They were not limited to predicting within a specific chemical structural class. Additionally, the computer-based test systems were not based on supposed mechanisms of action of the chemicals but were based solely on empirical associations between chemical structure and known biological activities.

The objectives of the exercise were: to measure the performances of the different prediction systems as they were configured at the time of the study; to use the results of this exercise to identify the strengths and weaknesses of each system; and to bring understanding and light to the structure-activity prediction enterprise.

Materials and methods

Study design
The NTP database used for this exercise contained a series of chemicals that had been tested for mutagenicity in Salmonella, but for which the results had not been published by the NTP (although a number of the chemicals had been tested and the results published by other laboratories). Unequivocal (positive, weak positive or negative) results were available for 292 organic chemicals containing a defined structure and no metal atoms. These chemicals were used as the master list, from which 100 chemicals (Table I) were selected using a random number list to create the test set. Inadvertently, 1,2-epoxydodecane was included among these chemicals although its results had been previously published (Canter et al., 1986). This prior publication did not compromise the SAR predictions (see below).
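The random draw of the test set can be illustrated with a short sketch. It only mimics the idea of selecting 100 of 292 chemicals without replacement; the identifiers, the seed and the function name select_test_set are placeholders, and the study itself used a random number list rather than software.

```python
import random

def select_test_set(master_list, n=100, seed=None):
    """Draw n distinct chemicals from the master list without replacement."""
    rng = random.Random(seed)  # seeded only so that the draw is repeatable
    return rng.sample(master_list, n)

# Placeholder identifiers standing in for the 292 chemicals of the master list.
master = [f"chem_{i:03d}" for i in range(1, 293)]
test_set = select_test_set(master, n=100, seed=0)
```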

The structures of the chemicals were drawn and sent for computational or SA analyses without any other information as to the chemicals' identity or biological or chemical properties. The calculations of ke required samples of the chemicals. Of the 100 randomly selected chemicals, 88 were sent coded, and the molecular weights, needed for calculations of the ke values, were provided. The participants were not informed of the identity of the chemicals or the results of the Salmonella tests until all predictions were submitted. However, most of the chemicals were identifiable from their structures.

Descriptions of the test systems
CASE. CASE (Computer Automated Structure Evaluator) is a knowledge-based artificial intelligence program designed for the specific purpose of organizing toxicological data obtained from the evaluation of diverse chemicals. It is based on the premise that a relationship exists between the structure of a chemical and its biological properties. CASE identifies and records molecular substructures that have a high statistical association with observed toxic activity or inactivity. The learning set of chemicals comprises both active and inactive molecules of diverse structures. Substructures found to be linked to activity are called biophores, while structures associated with lack of activity are called biophobes. New, untested molecules can be submitted to the program and a prediction of the potential activity of the new molecules can be obtained based on the presence or absence of these biophores and biophobes (Klopman, 1984; Klopman and Rosenkranz, 1984; Klopman et al., 1990; Rosenkranz and Klopman, 1990a).
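The biophore/biophobe idea can be made concrete with a toy sketch. This is not the CASE program: molecules are reduced to sets of hypothetical fragment labels, the binomial tail test and the 0.05 cut-off are assumptions chosen for illustration, and the final call simply compares the numbers of biophores and biophobes present.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def find_biophores(learning_set, alpha=0.05):
    """learning_set: list of (set of fragment labels, is_active) pairs."""
    p_active = sum(active for _, active in learning_set) / len(learning_set)
    counts = {}  # fragment -> (molecules containing it, active ones among them)
    for fragments, active in learning_set:
        for frag in fragments:
            n, k = counts.get(frag, (0, 0))
            counts[frag] = (n + 1, k + int(active))
    biophores, biophobes = set(), set()
    for frag, (n, k) in counts.items():
        if binom_tail(k, n, p_active) < alpha:            # enriched in actives
            biophores.add(frag)
        elif binom_tail(n - k, n, 1 - p_active) < alpha:  # enriched in inactives
            biophobes.add(frag)
    return biophores, biophobes

def predict(fragments, biophores, biophobes):
    """Call '+' when biophores outnumber biophobes in the molecule."""
    score = len(fragments & biophores) - len(fragments & biophobes)
    return "+" if score > 0 else "-"

# Hypothetical learning set: the fragment labels are placeholders, not CASE output.
learning_set = [({"nitro", "ring"}, True)] * 6 + [({"alkyl", "ring"}, False)] * 6
biophores, biophobes = find_biophores(learning_set)
call = predict({"nitro", "ring"}, biophores, biophobes)
```

In this toy learning set the fragment shared by actives and inactives alike ("ring") is flagged as neither a biophore nor a biophobe, which is the behaviour one would want from a statistical association test.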

In this exercise the CASE program predicted the activity of the test set of molecules based on two learning sets. The first learning set was a compilation of 820 chemicals tested for mutagenicity by the NTP (CASE/n). A subset of this learning set and the conclusions obtained from its study were described by Rosenkranz and Klopman (1990a). The second set consisted of 808 chemicals evaluated by the US-EPA Gene-Tox Program (Klopman et al., 1990) (CASE/e). There is some overlap between the two databases, and also discrepancies between the results of some chemicals. The two learning sets were not combined because the criteria used for establishing the databases were somewhat different.


Table I. Salmonella test results and predictions

Columns: No.; Chemical (structure, name and CAS no.); % Purity; SAL; S.A.; S.A.+ (Ashby); TOPKAT (confid.); CASE/N (confid.); CASE/E (confid.); ke (Bakale).

[Row entries (structures, CAS numbers, purities, test results and predictions) are not legible in this transcript. Chemical names legible in this part of the table: 4-acetylpyridine; p-aminoazobenzene; 2-aminobenzimidazole; 2-amino-6-methoxybenzothiazole; 2-amino-4-(methylsulfonyl)phenol; 2-(4-aminophenyl)-6-methylbenzothiazole; p-azoxyanisole; p-bromotoluene; caprylyl chloride; captan.]


Table I. continued.

[Row entries are not legible in this transcript. Chemical names legible in this part of the table: o-chlorobenzotrifluoride; chloroneb; 1-chloro-1-nitropropane; codeine phosphate; cyclohexyl anthranilate; N-cyclohexyl-4-methylbenzenesulfonamide; dichloran (2,6-dichloro-4-nitroaniline); 3,4-dichloroaniline; dichloromethane (methylene chloride); 1,1-dichloro-1-nitroethane; diethylene glycol monoethyl ether; N,N-diethyl-m-toluamide; 2,2-dimethylbutane; 2,3-dimethylbutane; 1,6-dimethylnaphthalene.]


Table I. continued.

[Row entries are not legible in this transcript. Chemical names legible in this part of the table: 1,4-dimethyl-2-nitrobenzene; dimethyloctadecylbenzylammonium chloride; dimethyl succinate; 2,4-dinitrophenol; 1,2-epoxydodecane; ethyl acetate; N-ethyl-N-nitrosourea; n-heptanal; 2-hydroxybenzamide.]


Table I. continued.

[Row entries are not legible in this transcript. Chemical names legible in this part of the table: medroxyprogesterone acetate; 5-methoxypsoralen; 8-methoxypsoralen; methyl cyclopentane; 2-methyl-2-ethoxypropane (t-butyl ethyl ether); methyl formate; methyl isobutyl ketone; 2-methyl-3-nitroaniline.]


Table I. continued.

[Row entries are not legible in this transcript. Chemical names legible in this part of the table: N-methyl-4-nitroaniline; 2-methyl-6-nitrobenzoic acid; 5-methyl-2-nitrobenzoic acid; 2-methylphenanthrene; α-methylstyrene; m-nitroaniline; 2-nitro-1-butanol; p-nitrophenethyl alcohol.]


Table I. continued.

[Row entries are not legible in this transcript. Chemical names legible in this part of the table: N-nitrosomorpholine; n-nonane; 2-octyl-3-isothiazolone; 1,1'-oxybis(methylene)bisbenzene; 1-pentanol; quercetin; quinethazone (Aquamox); scopolamine hydrobromide; tricaprylin; 2,3,4-trichlorophenol.]

TOPKAT: (), in learning set; n/a, unable to validate; Ind, indeterminate; confid, confidence level: low, mod(erate), high.
CASE: N, NTP chemicals learning set; E, EPA chemicals learning set; (), in learning set; m, marginal; confid, confidence level: *low, **medium, ***high.
Bakale: eq'm, equilibrium observed; n.s., chemical not sent; n.t., chemical could not be tested.


TOPKAT®. The TOPKAT® program's mutagenicity model was developed with a database of 1083 chemicals derived mostly from the US-EPA Gene-Tox Program (Kier et al., 1986; A.Auletta, personal communication) and from the NTP's mutagenicity testing program. After a chemical's structure is entered into the system, a search is made for those features present in the chemical and also in the 'mutagenesis equation'. Values for those features as they apply to the chemical in question are then calculated. The value for each feature is multiplied by its corresponding regression coefficient, and the products are summed. From this sum of products the probability of the chemical being a mutagen is calculated by means of a simple exponential equation.
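The calculation described above (feature values multiplied by regression coefficients, summed, then converted to a probability) can be sketched as follows. The logistic form used for the 'simple exponential equation' is an assumption, since the text does not give the equation, and the feature names and coefficients are hypothetical rather than taken from the TOPKAT model.

```python
import math

def mutagenicity_probability(feature_values, coefficients, intercept=0.0):
    """Sum of (feature value x regression coefficient), mapped to a probability.

    The logistic transform is an assumed form of the 'simple exponential
    equation'; features absent from the equation are ignored.
    """
    score = intercept + sum(
        coefficients[name] * value
        for name, value in feature_values.items()
        if name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical features and coefficients, for illustration only.
coefficients = {"nitro_count": 1.2, "log_p": -0.3}
features = {"nitro_count": 1, "log_p": 2.0, "ring_count": 1}
p = mutagenicity_probability(features, coefficients)
```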

The next step is to determine whether the chemical is adequately 'covered' by the database from which the SAR equation was developed. First, one determines whether the structural features used for the estimate completely cover all aspects of the chemical's structure. If they do not, the database is searched to determine whether features that were not useful for the equation are represented in the training database and cover the rest of the chemical. Only if all features of the chemical structure are adequately covered in the training database does one proceed further. If the chemical is adequately covered, the level of confidence assigned to the estimate is determined. This is done by searching the database from which the model was developed for chemicals with the 'important' features in the target chemical. These features include those in the estimate, as well as those which are represented in chemicals in the database, but which were not of sufficient statistical importance to be included in the equation (Enslein, 1988).

SA; SA+. A compilation was made of chemical substructures associated with electrophilicity and those that have been associated with nonreactivity (Ashby, 1985; Ashby and Tennant, 1988). Chemical structures were assessed for actual or potential electrophilic centers according to the megastructure described by Tennant and Ashby (1991). Chemicals containing corresponding sites were classified as SA. This was the primary call. As a secondary exercise, these classifications were assessed for the likelihood that the Salmonella assay would detect SA-containing agents as mutagens. This secondary expert judgement (SA+) was based on such considerations as prior experience with a wide range of structurally diverse chemicals and mechanistic inferences. This secondary call was made because of the knowledge that not all electrophilic chemicals are mutagenic in the Salmonella test.

ke (Bakale). This procedure is based on using excess electrons in an inert medium to serve as nucleophilic surrogates of the biological targets of mutagenesis and thereby provide a measure of the electrophilic potential of a chemical. Attachment of an excess electron to a test chemical dissolved in an inert medium occurs at every encounter only if the solute has no thermodynamic barrier to attaching an electron. A chemical that meets this criterion of diffusion-controlled attachment is regarded as yielding a positive ke response, and is predicted to be a mutagen. In a typical ke measurement, excess electrons are produced in an inert solvent, cyclohexane, by a 15 ns pulse of 1 MeV electrons, and the rate at which these electrons attach to the test solute is monitored with an oscillograph in the 50-500 ns time regime. The half-life of electrons in a solution containing a 2 µM concentration of an electrophile is ~100 ns. The half-life of electrons observed for the test solution is corrected for attachment to impurities in the 'pure' cyclohexane (Bakale, 1989; Bakale and McCreary, 1987, 1990, 1992a,b, submitted for publication; Ennever and Bakale, 1992).
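The figures quoted above can be turned into an order-of-magnitude rate constant. The sketch below assumes pseudo-first-order decay of the excess electrons, so that ke = ln 2 / (t1/2 [S]), and assumes that the correction for solvent impurities is a simple subtraction of decay rates; the function name and the form of the correction are illustrative and are not taken from the cited procedure.

```python
import math

def attachment_rate_constant(half_life_s, solute_conc_molar, solvent_half_life_s=None):
    """Estimate ke (M^-1 s^-1) from an observed electron half-life.

    Assumes pseudo-first-order kinetics. If a half-life measured in the
    solvent alone is supplied, its decay rate is subtracted as a background
    (an assumed form of the impurity correction).
    """
    k_observed = math.log(2) / half_life_s
    k_background = math.log(2) / solvent_half_life_s if solvent_half_life_s else 0.0
    return (k_observed - k_background) / solute_conc_molar

# Values quoted in the text: ~100 ns half-life at a 2 µM solute concentration.
k_e = attachment_rate_constant(half_life_s=100e-9, solute_conc_molar=2e-6)
```

With these inputs and no background correction the estimate is about 3.5 x 10^12 M^-1 s^-1.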

Salmonella. The Salmonella tests were performed as described by Zeiger et al. (1992). All chemicals were tested in a preincubation procedure without exogenous metabolic activation, and in the presence of Aroclor 1254-induced rat and hamster liver S9. The initial testing was performed using strains TA98 and TA100 without S9 and with 30% S9 in the S9 mix. If a positive result was obtained, the strain/activation conditions giving that positive were repeated and testing was terminated at that point. If no positive responses were obtained, additional strains were used under the same activation conditions. If at least four strains (TA97, TA98, TA100 and TA1535) were negative with and without metabolic activation, the chemical was labeled nonmutagenic. The test data for 68 of the chemicals are presented in Zeiger et al. (1992).

There were 45 mutagens among the chemicals, and all but two were detected in strains TA98 and/or TA100. The two exceptions were p-azoxyanisole (#9), which was mutagenic only in strain TA97, and tricaprylin (#91), which was mutagenic only in strain TA1535. This is consistent with an earlier report (Zeiger et al., 1985) that showed that the combination of strains TA98 and TA100 detected ~95% of the mutagens in the NTP testing program.

Results
The chemical structures and purities, Salmonella results, and the performances of the different prediction systems are compiled in Table I. The performance summaries of the different test systems are presented in Tables II-VI. The predictive performance of each system was evaluated by calculating its sensitivity and specificity, positive and negative predictivity, and the concordance obtained between the predicted and actual activity in Salmonella. Sensitivity is a measure of the proportion of mutagens in the population that were correctly classified, and specificity measures the proportion of nonmutagens correctly classified. Positive and negative predictivity refer to the proportions of chemicals predicted to be mutagens or nonmutagens, respectively, for which the prediction was correct. Concordance was defined as the proportion of correct predictions (positive and negative). For various reasons, several of the systems were unable to handle or make predictions for all the chemicals. In these cases, a direct comparison among concordances may not be valid because chance plays a larger role when fewer chemicals are evaluated. To allow for a fairer comparison, the χ2 values for each of the predictions were also calculated (Bailey, 1971; Klopman and Rosenkranz, 1991) (Table VI). If χ2 < 3.84, the probability that the observed concordance is due to chance is >5%. The χ2 value must be >6.63 for the probability of chance to be <1%.
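These definitions can be written out for a single 2x2 table of Salmonella result versus prediction. The sketch below uses the SA counts reported in the Results (36 of 45 mutagens and 36 of 55 nonmutagens correctly identified). The χ2 statistic is taken to be the Pearson statistic without continuity correction, and the chance probability is the upper tail of a χ2 distribution with one degree of freedom; both choices are assumptions, although they are consistent with the 3.84 and 6.63 thresholds quoted above. The function and key names are illustrative.

```python
import math

def performance(tp, fn, fp, tn):
    """Summary statistics for a 2x2 table of test result versus prediction.

    tp: mutagens predicted positive      fn: mutagens predicted negative
    fp: nonmutagens predicted positive   tn: nonmutagens predicted negative
    Every row and column total must be non-zero.
    """
    n = tp + fn + fp + tn
    # Pearson chi-square for a 2x2 table, no continuity correction (assumed).
    chi2 = n * (tp * tn - fn * fp) ** 2 / (
        (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    )
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "pos_predictivity": tp / (tp + fp),
        "neg_predictivity": tn / (fn + tn),
        "concordance": (tp + tn) / n,
        "chi2": chi2,
        # Upper-tail probability of chi-square with 1 degree of freedom.
        "p_chance": math.erfc(math.sqrt(chi2 / 2)),
    }

sa = performance(tp=36, fn=9, fp=19, tn=36)
for name, value in sa.items():
    print(f"{name}: {value:.4g}")
```

For these counts the sketch gives a sensitivity of 0.80, a specificity of 0.65, a concordance of 0.72 and χ2 = 20.66, matching the SA row of Table VI.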

The prediction systems used here represent two different types: those that are driven solely by the actual structure of the chemical and one (ke) that is driven by a measured physicochemical property of the chemical. Because these two approaches are so different, their results were analyzed separately. There was complete agreement between the Salmonella test results and all predictions for 41 chemicals. Ten chemicals were negative in Salmonella but were predicted as positive or indeterminate by all of the systems; four chemicals were mutagenic in Salmonella but were predicted to be negative by all of the systems. These are discussed below. The remainder of the chemicals showed a lesser degree of agreement among the prediction systems, or were not tested in all systems.

Table II. SA and SA+ results

                     SA result              SA+ result
                     +     -   Total        +     -   Total
SAL result   +      36     9    45         32    13    45
             -      19    36    55         11    44    55
         Total      55    45   100         43    57   100

Table III. ke result

                     ke result              ke result(a)
                     +     -   Total        +     -   Total
SAL result   +      23    16    39         23    16    39
             -      18    27    45         18    30    48
         Total      41    43    84         41    46    87

(a) Chemicals classified as 'equilibrium' (eq'm) included.

Table IV. TOPKAT result

                     +     -   Total
SAL result   +      20     8    28
             -       8    25    33
         Total      28    33    61

Table V. CASE/n and CASE/e results(a)

                     CASE/n                 CASE/e
                     +     -   Total        +     -   Total
SAL result   +      28    14    42         28     8    36
             -       8    43    51         18    35    53
         Total      36    57    93         46    43    89

(a) Chemicals classified as marginal (m) are not included.

In general, the computational and structural alert systems performed similarly, leading to 71-76% concordance (correct predictions), although there were differences in their sensitivities, specificities and positive predictivities (Table VI). The physicochemical (ke) system performed less well, with a concordance of 60-61%, and lower sensitivities and positive predictivities (Table VI). Specific similarities and differences among the different prediction systems are addressed below.

Predictivity of SA by themselves and in combination with the intervention by expert judgement (SA+)
The identification of a potential Salmonella mutagen based solely on the presence or absence in the molecule of one or more 'structural alerts' identified as being associated with the formation of an electrophilic moiety that could react with cellular DNA (Ashby, 1985; Ashby and Tennant, 1988; Tennant and Ashby, 1991) was effective in correctly identifying 36/45 (80%) of the mutagens and 36/55 (65%) of the nonmutagens (Table II).

This system is unlike the TOPKAT and CASE systems, where the chemicals in the learning sets are known and can be eliminated from the test set when overlap occurs. The structural alerts were identified by Ashby (1985) based on the earlier published studies of Miller and Miller (1977), by reviewing the published Salmonella mutagenicity literature, and by the application of expert judgement and mechanism inference. Hence, the learning set in this case consists of previous knowledge and possible exposure to published results for some of the chemicals. The precise amount of overlap that existed between this learning set and the chemicals predicted cannot be determined, because many of the chemicals had been tested by other laboratories and the results published. However, it is believed that only in the case of captan (#16) did prior knowledge influence the call.

Of the four nitrobenzoic acid structural homologs tested, two mutagens (#69, 2-methyl-6-nitrobenzoic acid; and #71, 5-methyl-2-nitrobenzoic acid) were described as negative for structural alerts and called nonmutagenic by SA+. The distinguishing characteristic of these two chemicals is the presence of a nitro group ortho to a carboxyl group, where the latter is believed to critically inhibit enzymic N-oxidation via internal interactions with the carboxylic -OH group.

Table VI. Comparison of performances^a

             Sens.   Spec.   (+)Pre   (-)Pre   Conc.   χ2
SA           0.80    0.65    0.65     0.80     0.72    20.66
SA+          0.71    0.80    0.74     0.77     0.76    26.38
ke           0.59    0.60    0.56     0.63     0.60     3.01
ke^b         0.59    0.63    0.56     0.65     0.61     3.98
TOPKAT       0.71    0.76    0.71     0.76     0.74    13.58
CASE/n^c     0.67    0.84    0.78     0.75     0.76    25.23
CASE/e^c     0.78    0.66    0.61     0.81     0.71    16.48

a Sens., sensitivity; Spec., specificity; (+)Pre, positive predictivity; (-)Pre, negative predictivity; Conc., concordance; χ2 measures the probability that the results are not due to chance (the larger the χ2 value, the higher the probability of the method).
b '=' considered as negative.
c 'm' not included.

There were 12 chemicals where a strict application of the structural alerts led to a prediction of mutagenicity, but where this call was overridden because of other structural considerations in the molecule (the 'SA+' call). The SA+ override was correct for 8/12 chemicals to which it was applied. This override lowered the sensitivity of the prediction to 32/45 (71%) but raised the specificity to 44/55 (80%). Captan (#16) was recognized as a mutagen based on published studies, but was called nonmutagenic because it did not contain a structural alert.
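The effect of the expert override on the SA counts can be checked with a few lines of arithmetic. The sketch below assumes, as described above, that 12 SA-positive calls were changed to negative and that 8 of those changes were correct (the chemical was a nonmutagen). The function name `apply_override` is illustrative only. The result agrees with the SA+ row of Table VI (sensitivity 0.71, specificity 0.80).

```python
def apply_override(tp, tn, n_mutagens, n_nonmutagens, overridden, correct):
    """Move `overridden` positive calls to negative.

    `correct` of them were nonmutagens (false positives become true
    negatives); the remainder were mutagens (true positives become
    false negatives).
    """
    wrong = overridden - correct
    new_tp = tp - wrong
    new_tn = tn + correct
    return new_tp / n_mutagens, new_tn / n_nonmutagens, (new_tp, new_tn)


# SA alone: 36/45 mutagens and 36/55 nonmutagens correctly identified.
sens, spec, (tp, tn) = apply_override(36, 36, 45, 55, overridden=12, correct=8)
print(f"SA+ sensitivity {tp}/45 = {sens:.2f}, specificity {tn}/55 = {spec:.2f}")
# prints "SA+ sensitivity 32/45 = 0.71, specificity 44/55 = 0.80"
```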

Performance of the ke system

A total of 87 chemicals were tested in this system. Seven chemicals were not sent for testing because of inadequate supply, or other reasons, and six that were sent could not be tested for technical reasons, such as solubility or volatility. A more detailed description of these chemicals is provided in Bakale and McCreary (submitted for publication). This system is not directly comparable with the other systems described here because the predictions are based solely on an experimentally measured electron attachment rate constant (ke), calculated from the solution of the test chemical in cyclohexane. Therefore, this is the only system used here that is not driven by the chemical's structure per se but by its physicochemical properties. It is also the only system that does not require a learning set of chemicals.

Because the predictions in this system are based solely on the calculated ke value, the correct and incorrect predictions can be examined as a function of this value. Chemicals were predicted to be positive (mutagenic) when the ke value was >3.1 × 10¹² M⁻¹ s⁻¹; negative when the value was <2.9 × 10¹² M⁻¹ s⁻¹; and equivocal when the value was between these limits. As can be expected, because of its differences with the structure-based systems, this system had more unique conclusions (16) than CASE/e and/or CASE/n (nine), SA and/or SA+ (six) or TOPKAT (two). Of the unique conclusions with ke, three were predictive and 13 were not predictive.
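The decision rule for the ke test can be written as a simple threshold function. This is a sketch only: the function name `classify_ke` is hypothetical, ke is expressed in units of 10¹² M⁻¹ s⁻¹, and values falling exactly on a limit are treated here as equivocal, which is an assumption. The three example values are ke values quoted elsewhere in this paper.

```python
POSITIVE_ABOVE = 3.1  # ke limit, in units of 10^12 M^-1 s^-1
NEGATIVE_BELOW = 2.9


def classify_ke(ke: float) -> str:
    """Map a measured ke (in 10^12 M^-1 s^-1) to a mutagenicity call."""
    if ke > POSITIVE_ABOVE:
        return "positive"
    if ke < NEGATIVE_BELOW:
        return "negative"
    # Assumption: values on or between the two limits are equivocal.
    return "equivocal"


examples = [
    ("caprylyl chloride", 3.5),
    ("dichloran", 5.7),
    ("1,2-epoxydodecane", 0.2),
]
for name, ke in examples:
    print(f"{name}: ke = {ke} -> {classify_ke(ke)}")
```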

The ke system correctly predicted 23/39 (59%) of the mutagens and 27/45 (60%) of the nonmutagens when equivocal responses were not considered, and 30/48 (62.5%) of the nonmutagens when equivocal responses were considered negative (Tables III and VI). The 59% sensitivity found here agrees with the 59% sensitivity found with a different group of 171 NTP-tested chemicals screened with the ke test (Ennever and Bakale, 1992). In that same study, the specificity was 53%. As can be seen in Table VII, with the exception of chemicals with a ke < 2.0 (nonmutagenic), there was no clear relationship between ke and Salmonella mutagenicity.

Performance of the TOPKAT® system

Predictions were made for 73 chemicals; however, 12 of these were part of the TOPKAT learning set and were not included in the computations. Estimates could not be calculated for three chemicals because of parameterization problems, and could not be validated adequately for 13 further chemicals; 11 chemicals fell in the 'indeterminate' (no decision) range. Thus, only 61 chemicals had 'usable' predictions for the purposes of this exercise.
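The counts in this paragraph can be reconciled as follows. This is one reading of the text, stated as an assumption: the three exclusion categories are taken to be subtracted from the full set of 100 chemicals to give the 73 predictions, and the 12 learning-set chemicals are then removed to give the 61 usable predictions. The variable names are illustrative.

```python
TOTAL_CHEMICALS = 100

# Chemicals for which no definitive TOPKAT prediction was available.
no_prediction = {
    "parameterization problems": 3,
    "could not be validated adequately": 13,
    "indeterminate range": 11,
}
in_learning_set = 12  # predicted, but excluded from the computations

predicted = TOTAL_CHEMICALS - sum(no_prediction.values())
usable = predicted - in_learning_set
print(f"predictions made: {predicted}, usable predictions: {usable}")
# prints "predictions made: 73, usable predictions: 61"
```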

TOPKAT correctly predicted 20/28 (71%) of the mutagens and 25/33 (76%) of the nonmutagens (Table IV). Among the predictions assigned a high confidence level, 15/17 (88%) were correct. Similarly, the predictions assigned a low, or low-to-moderate confidence level were correct 7/8 times (88%). The least accurate predictions came among those assigned confidence levels of moderate-to-high (56%) and moderate (64%) (Table VIII). As with other structure-driven prediction systems, the false-positive predictions tended to concentrate among the aromatic amines and aromatic nitro-containing chemicals.

Performance of the CASE® systems

The CASE® predictions were made using two separate learning sets. The NTP learning set (CASE/n) was made up of test results that had previously been published by the NTP and, therefore, did not contain any of the chemicals in the test set. The other learning set (CASE/e) was made up of chemicals in the EPA Gene-Tox compilation and, by chance, included 10 chemicals that were in the test set. These chemicals were not included in the computations, although their predicted mutagenicities did not always match the responses in the learning set.

The CASE/n learning set led to a correct prediction of 28/42 (67%) mutagens and 43/51 (84%) nonmutagens, when the seven 'marginal' calls were not considered (Table V). The CASE/e learning set led to a correct prediction of 28/36 (78%) mutagens and 35/53 (66%) nonmutagens when the one 'marginal' call and the 10 chemicals in the learning set were not considered. The proportions of correct calls were related to the confidence levels assigned to each prediction. These relationships are illustrated in Table IX. There were no significant differences in the proportions of correct predictions at the different confidence levels.

Table VII. Relationship of ke values to SAL responses^a

ke          SAL+    SAL-    Call
0-1         8       20      negative
1-2         1       4       negative
2-3         6       3       negative
3 ± 0.1     0       2       equivalent
3.2-4       8       7       positive
4-5         8       9       positive
5-6         7       3       positive

a Data from Table I.

Table VIII. Correspondence of TOPKAT and SAL results as they relate to the confidence levels of their predictions

Confidence^a    SAL+     SAL-     Overall
High            4/4      11/13    15/17 (0.88)
Mod/high        0/3      5/6      5/9 (0.56)
Mod             13/21    5/7      18/28 (0.64)
Low/mod         1/1      4/4      5/5 (1.00)
Low             2/3      -        2/3 (0.67)

a Assigned confidence levels; 'ind' not included. Data from Table I.

A clear example of how the particular learning set can affect the effectiveness of a prediction system can be seen in a comparison of the CASE/e and CASE/n predictions (Table V). CASE/n had a concordance of 0.76, as compared with 0.71 for CASE/e. However, CASE/n predicted 39% of the chemicals to be mutagenic whereas CASE/e predicted 52%. A total of 69% (34/49) of the CASE/n and CASE/e predictions agreed when the marginal and training set chemicals were eliminated from the test set.

The biggest disparity between the CASE/n and CASE/e predictions can be seen with the mutagens 5- and 8-methoxypsoralen (#56 and 57). Both were predicted by CASE/n to be nonmutagenic, with a high level of confidence. However, CASE/e correctly predicted both to be mutagenic: 5-methoxypsoralen with a high degree of confidence and 8-methoxypsoralen with a moderate degree of confidence. Other examples of disparities can be seen with 2-aminobenzimidazole (#4), 2-aminobenzothiazole (#5) and 2-amino-6-methoxybenzothiazole (#6). CASE/n predicted all three as marginal or nonmutagenic, with low levels of confidence, whereas CASE/e predicted them to be mutagens, with high levels of confidence. Although the CASE/n predictions for 2-aminobenzimidazole and 2-aminobenzothiazole were correct, low levels of confidence are associated with them because the chemical structures being predicted may not have been adequately represented in the learning set. These and other differences between the CASE/e and CASE/n predictions reflect differences in the composition of the two learning sets used, and dramatically illustrate the importance of the selection of chemicals to make up the learning sets.

Detailed evaluations of selected chemicals, and comparisons among prediction systems

Caprylyl chloride (#15) was mutagenic in strain TA100 in the presence of 5% induced hamster S9, giving an ~2-fold increase over background at 666 µg/plate. Weak positive responses were obtained with 10 and 30% induced hamster S9 and 5% induced rat S9. Other levels of rat S9 produced equivocal responses, and it was nonmutagenic without S9 (Zeiger et al., 1992). Caprylyl chloride was predicted to be mutagenic by two tests: CASE/e, with a moderate degree of confidence, and the ke test (ke = 3.5). The structurally similar chemical, butyryl chloride, although not included in this study, was mutagenic in the same Salmonella strains and to the same extent as caprylyl chloride (Zeiger et al., 1992).

Dichloran (#24) was mutagenic in strain TA98 with and without metabolic activation, and equivocal in TA100 with S9 (Zeiger et al., 1992). The chemical contains both a primary amino and a nitro group on a single benzene ring; both groups have been traditionally associated with mutagenicity

Table IX. Correspondence of CASE and SAL results as they relate to the confidence levels of their predictions^a

              CASE/n                             CASE/e
Confidence    SAL+     SAL-     Overall          SAL+     SAL-     Overall
***           19/27    26/31    45/58 (0.78)     22/28    23/34    45/62 (0.73)
**            6/8      6/6      12/14 (0.88)     6/6      5/6      11/12 (0.92)
*             3/4      11/12    14/21 (0.67)     6/9      8/14     14/23 (0.61)

a Data from Table I ('m' not included). Confidence, assigned confidence levels: ***high, **medium, *low.


Table X. Chemicals negative in Salmonella and predicted to be mutagenic by most, or all, systems

No.     Chemical
4       2-aminobenzimidazole
5       2-aminobenzothiazole
25      3,4-dichloroaniline
40      2,4-dinitrophenol
43      1,2-epoxydodecane
52      iodochlorohydroxyquinoline
66      5-methyl-2-nitroaniline
78      o-nitroaniline
80      2-nitrodiphenylamine
100     vat brown 1

Exceptions*

TOPKAT =CASE/n =CASE/n =CASE/n =K= •-

CASE/n =

SA+ = ' - ' ;

'ind';'m', kf = '-''-'

'm'

' - '

kt not tested

*ind: indeterminate; m: marginal.

in Salmonella. It was not unexpected that the structure-based systems correctly predicted it to be mutagenic, with a high degree of confidence. It also yielded a high ke (ke = 5.7), probably as a result of the presence of these two, potentially activating, substituent groups.

1,2-Epoxydodecane (#43) is a member of the class of aliphatic epoxides, which are generally considered to be chemically and biologically active, and mutagenic in the base-pair substitution strains TA100 and TA1535. It was predicted to be mutagenic by all SAR systems. An earlier study (Canter et al., 1986) showed that the mutagenicity of aliphatic epoxides diminishes with increasing chain length. In that study, epoxides with chain lengths of ≥8 (1,2-epoxydecane) were not mutagenic. It is obvious that the SAR and SA predictions were driven by the epoxide ring, and the systems did not have sufficient information about the side chain length to modulate the positive call. This chemical yielded a low ke value (0.2) and was predicted to be nonmutagenic.

5-Methoxypsoralen (#56) and 8-methoxypsoralen (#57) were both mutagenic in Salmonella strain TA100 with S9 under conditions that would not allow photoactivation. The SA system recognized the furan moieties as potential electrophilic sites, but the prediction of mutagenicity was overruled by SA+ because the carcinogens benzofuran and furan are both nonmutagens in Salmonella (Haworth et al., 1983; Mortelmans et al., 1986). TOPKAT predicted both to be nonmutagenic with a moderate level of confidence. CASE/n predicted both to be negative with a high degree of confidence, based on the absence of any biophore in the NTP database previously identified as associated with mutagenicity. In contrast, CASE/e predicted both chemicals to be mutagenic based on the presence of the biophore (O-CH=) which is associated in the EPA Gene-Tox learning database with a 79% probability of mutagenicity in Salmonella. The ke system recognized the potential electrophilicity of the two chemicals (ke = 5.9 and 4.2).

The influence of chemical purity on predictivity

The chemical purities are included in Table I. Purity information was available for 76 chemicals; of these, 46 (61%) were ≥99% pure. Mutagenic responses were seen with 11 of these 46 (24%). Among the chemicals with lower purities, 50% (4/8) of the chemicals with purities of 98.0-98.9% were mutagenic, as were 69% (11/16) of the chemicals with purities of 93.0-97% and 33% (2/6) of the chemicals with purities below 93%. Despite the possible presence of confounding impurities, the majority of positive responses were consistent with the chemical structures and presumed properties.
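The proportions quoted above can be tabulated from the counts given in the text. The short Python sketch below does only that; the band labels and variable names are illustrative, and no data beyond the counts in the paragraph are used.

```python
# (purity band, number of chemicals, number mutagenic), from the text.
bands = [
    (">=99%", 46, 11),
    ("98.0-98.9%", 8, 4),
    ("93.0-97%", 16, 11),
    ("<93%", 6, 2),
]

# Number of chemicals for which purity information was available.
total = sum(n for _, n, _ in bands)

for label, n, mutagenic in bands:
    print(
        f"{label:>11}: {mutagenic}/{n} mutagenic ({mutagenic / n:.0%}), "
        f"{n / total:.0%} of chemicals with purity data"
    )
```

The four bands account for all 76 chemicals with purity information.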

Nitroaromatic chemicals

In a previous survey of the NTP database (Zeiger et al., 1985), 9.1% of the chemicals tested in Salmonella contained at least one nitro (-NO2) group on an aromatic ring. Of these, 76% were judged to be mutagenic, as compared with 36% mutagens in the overall NTP-tested chemical population. In the present set of 100 chemicals, there were 20 containing an -NO2 group on a phenyl ring, and one heterocyclic nitroaromatic chemical. All but five of these chemicals (76%) were mutagenic, as compared with 45% overall.

Any computer algorithm based on the published NTP Salmonella data would assign a high-confidence prediction of mutagenicity to any chemical carrying an aromatic -NO2 group. Such an automatic assignment for -NO2-containing chemicals would be correct ~70-80% of the time. Both CASE systems predicted all the nitroaromatics to be positive with high confidence levels, with few exceptions. Similarly, TOPKAT made positive predictions for 18 of the 20 chemicals, most with moderate-to-high confidence (two chemicals, #34 and 80, were considered indeterminate). Despite the presence of the structurally alerting -NO2 group, which should have led to a positive prediction for all these chemicals, two of the chemicals (#69 and 71) were predicted (incorrectly) by SA+ to be negative. A challenge with this group of chemicals is to identify those factors that interfere with the mutagenicity conferred by the -NO2 moiety.

The low ke responses (i.e. predicting nonmutagenicity) obtained for five of the mutagenic nitroaromatic chemicals (#39, 40, 68, 69, 71 and 75) were unexpected in view of an earlier ke study of nitrobenzene and 47 nitrobenzene derivatives, of which only p-dinitrobenzene was found to have a ke of <3 (Bakale et al., 1977). Excluded from this earlier study were acids and phenols, the subclasses to which four of the six nitroaromatics that were negative in the current study belong.

SAL-positive versus SA, TOPKAT and CASE-negative

This pattern of chemical structure-based responses was seen with four chemicals, and the mutagenic potencies of these chemicals were relatively low. One explanation for the disparity between the prediction and the Salmonella response could be the presence in the tested chemicals of low levels of mutagenic impurities. Alternatively, these chemical structures were not well-represented in the various SAR learning sets.

Chloroneb (#19; 93.7% pure) was predicted to be mutagenic only by CASE/e, but with a low level of confidence. It produced small increases in mutagenicity in TA100 and TA98 (the only strains used) at doses up to 200 µg/plate in the presence of hamster, but not rat, liver S9 (Zeiger et al., 1992). If the chloroneb mutagenicity were due to an impurity, it would be a potent mutagenic impurity with an unusual pattern of effect, although this cannot be ruled out. Chloroneb was predicted by its ke value to be positive. An electron-attaching impurity at a potential concentration of 6.3% would be insufficient to account for a ke of 4.1 if the chloroneb itself did not attach electrons.

Tricaprylin (#91; 94.6% pure) was positive only in TA1535, in the presence of S9, at doses >6666 µg/plate. Unlike other chemicals where the magnitude of response in TA1535, if also seen in TA100, would be considered equivocal or weak, there was no mutagenic response evident in TA100. If an impurity were responsible for the mutagenic response, it would be unusual for it to be mutagenic in TA1535 but not TA100. The ke prediction agreed with the structure-based predictions.


Both triethylene glycols (#96, triethylene glycol diacetate, ~96% pure; and #97, triethylene glycol dimethyl ether, 99% pure) were mutagenic in TA100 and TA98 in the absence of S9, and were equivocal with S9. Neither chemical was toxic at the doses tested, and the magnitudes of the responses in both strains were equivalent. Because of the high purity of the dimethyl ether, and the similarity of the responses between the two glycols, it is unlikely that the mutagenicities of these two chemicals were due to impurities. The ke predictions for these two glycols agreed with the structure-based predictions.

SAL-negative versus SA, TOPKAT and CASE-positive (Table X)

Ten chemicals were not mutagenic in Salmonella but were predicted to be mutagenic by all or most systems. A consideration with these chemicals is whether the particular Salmonella protocol used was optimum for the specific chemical, because most of the other members of these classes that were in the various learning sets are mutagenic.

Four of these chemicals are nitroaromatic; this class is typically mutagenic in Salmonella under the test conditions used here. 5-Methyl-2-nitroaniline (#66) is the only nonmutagenic chemical in the series of methylnitroanilines, which includes others not included in this study. Also, o-nitroaniline (#78) was nonmutagenic, whereas both the m- (#11) and p-nitroanilines (Haworth et al., 1983) were mutagenic. The prediction of mutagenicity by the structure-based systems was obviously driven by the -NO2 moiety.

A number of these chemicals also contain an aromatic -NH2 group. This structure is often associated with mutagenicity; however, the mutagenicity of this class of chemicals is usually evident when there is more than one amino group on a phenyl ring, or the single amino group is on a fused ring structure. It does not appear as if the structure-driven systems made that distinction.

It is easy to understand the predictions that 1,2-epoxydodecane (#43) and vat brown 1 (#100) would be mutagenic. It was noted by Canter et al. (1986) that the mutagenicity of aliphatic epoxides decreased with increasing chain length. Vat brown 1 is a fusion of 3 mol of anthraquinone, which is a Salmonella mutagen itself. Any system that recognizes the anthraquinone moiety, or reactive sites on that molecule, would tend to predict a positive response unless overridden by the molecular size and overall configuration.

Discussion and conclusions

This was an effort to determine the effectiveness of different systems for predicting Salmonella mutagenicity. Although the major interest in using these prediction systems is the prediction of long-term toxicological effects, such as cancer, Salmonella mutagenicity was used as an event of concern by itself, and as a surrogate for the other toxicological effects. Unfortunately, it was not possible to perform this exercise using carcinogenicity data, because there are a limited number of carcinogen databases and a large degree of overlap among them. Many of the carcinogenicity prediction systems have been developed and/or calibrated using these databases, and therefore it would not be a good test of the systems if they were compared using the chemicals in the same databases. Additionally, because there are few 'new' carcinogens and noncarcinogens published every year, and the generation of new test results is a slow process, it is not feasible, for the short duration, to perform the exercise on large numbers of chemicals not yet tested, or currently under test, for carcinogenicity. Subsequent to this exercise, Ashby and Tennant (1994) and Tennant et al. (1990) performed a prospective carcinogenicity prediction exercise using structural alerts and invited others to use their prediction systems against the same chemicals. Among the additional laboratories participating in that exercise were some that used the systems employed here (Bakale and McCreary, 1992b; Enslein et al., 1990; Rosenkranz and Klopman, 1990b).

The following conclusions can be drawn from the predictions of Salmonella mutagenicity by the different systems. It may be assumed that, because the SAR-based prediction systems are not based on presumed mechanisms of action, their abilities to predict Salmonella mutagenicity would be similar to their abilities to predict other toxicological effects, such as carcinogenicity or teratogenicity. Additionally, confounding factors for predicting mutagenicity, such as chemical purity and test reproducibility, would also be present when predicting other toxicological effects.

There is a relationship between chemical structure and Salmonella mutagenicity

Although this has been a conclusion of other studies, it nevertheless should be stated as a conclusion of this study. The goal of SAR systems is to characterize the structure-activity relationship in such a way as to predict the biological activities of chemicals with similar or related structures. The various SAR systems perform these tasks using approaches that have similarities and differences. Basically, they try to identify, either by fragment recognition, by calculation of parameters such as molecular size and/or atomic charges of the molecule, or by a mathematical algorithm based on the various structural and electronic components, the predilection of a chemical to produce a specific effect. SAR systems are developed and refined by using congeneric or noncongeneric 'learning sets' of chemicals which, in theory, are designed to encompass all the molecular features needed to predict other chemicals. Regardless of the system used, they all focus on structural components of the molecule that are associated with the effect of interest, in this case Salmonella mutagenicity.

Among the chemicals with aromatic -NO2 and/or -NH2 groups, 16/21 (76%) and 8/15 (53%), respectively, were mutagenic. On the other hand, only 2/10 (20%) of halogenated aromatics that did not contain either of the above groups were mutagenic. These relationships between these chemical structures and mutagenicity have been addressed elsewhere (Ashby and Tennant, 1988; Rinkus and Legator, 1979; Zeiger, 1987) and the SAR systems mirror those findings. Where a definitive call could be made, TOPKAT and the two CASE systems predicted all ring -NO2-containing chemicals to be mutagenic. The systems appeared to be more selective with regard to the aromatic -NH2 chemicals; a small fraction of them were predicted to be nonmutagenic by TOPKAT and CASE/n. CASE/e appeared to be the least discriminatory; it predicted all aromatic -NO2 and -NH2 chemicals to be mutagenic.

The chemical structure-directed systems were similar in their correct and incorrect predictions

Regardless of whether the system performs chemical fragment recognition (simplistic, as with SA, or highly sophisticated, as with CASE), whether it looks at various weighting factors derived from the chemical structure, or both (TOPKAT), the expert and computer systems are essentially addressing, or weighting, the same substructural components of the molecules.


Therefore it is not surprising that these systems tended to agree on the same chemicals, regardless of whether they were predicted as mutagenic or nonmutagenic. The differences among SA, CASE and TOPKAT are heavily predicated on the particular logic or learning sets used.

As a corollary to the above, structure-directed systems are limited by the learning sets on which they have been 'trained.' None of these systems should be expected to correctly predict the activity of chemicals that are total strangers to the system, in contrast to ke, for which all chemicals can be regarded as strangers. A good example of this can be seen in the different predictions of the CASE/n and CASE/e systems. The only difference between these two systems was the particular learning set used. Yet, although they had similar predictivities (Table VI), they tended to correctly identify different chemicals. The differences between the NTP Salmonella mutagenicity database and other databases in terms of chemical classes have been addressed previously (Zeiger, 1987).

By their very nature, structure-directed prediction systems are designed to predict only those chemicals that contain structures or moieties that were part of their learning systems and cannot, without more data, make the extension to unknown structures. This has implications for the use of SAR systems to replace laboratory testing procedures. It is difficult to identify new enzymes or pathways de novo, whether it be in a single cell or in a complex organism. As a rule, such new pathways are found when the effects of a new chemical are examined, or when an organism reacts in an unexpected way to an 'old' chemical. Until the mechanism of action of a new chemical can be deduced solely from its chemical structure, laboratory testing will be required to define the parameters of the new chemical classes. Measuring the ke of a new chemical would, however, provide an indication of the electrophilic character of the chemical. This system only measures the parent molecule, whereas many of these chemicals require mammalian metabolic activation, and their mutagenicities are a function of primary or secondary metabolites.

The purity of the chemical tested can affect the accuracy of the prediction

SAR is based solely on chemical structure and addresses the question of whether the chemical structure portends mutagenicity, whereas the Salmonella, and other, tests (including the ke test) determine whether the substance in the bottle is mutagenic. Chemical structure-based systems are predicated on the assumption that the biological system responded to the named chemical rather than a minor component (impurity) of the sample. If a chemical sample contains a low level of impurity, it can be presumed that the impurity would be more likely to produce an artifactual positive mutagenic response than an artifactual negative response. If such an event occurred, any attempt to link the structure of the chemical to the observed biologic response would lead to an erroneous conclusion.

In contrast, ke values are calculated from the molecular weight of the chemical and the electron half-life measurement of the substance in the bottle (Bakale and McCreary, submitted for publication). Minor components of the sample (impurities) could affect this measurement, so that the value obtained would be different from that produced by the pure chemical. Thus, if the ke response of the test chemical is extremely weak, the testing of impure chemicals (i.e. chemicals containing impurities that have a high ke) in this system would be expected to lead to artifactual positive responses.

The accuracy of prediction systems should be measured against the reproducibility of the test that is being predicted

The wealth of Salmonella data that were generated in a number of laboratories using standard protocols to test coded chemicals permits the evaluation of the reproducibility of test results for individual chemicals. The accuracy of the different SAR systems for predicting Salmonella mutagenicity approached the ability of the Salmonella test to predict itself, i.e. its reproducibility. A recent analysis of the inter- and intra-laboratory reproducibility of Salmonella test results yielded a strict positive-versus-negative concordance of 84.5% and a pairwise concordance of 86.9% (Piegorsch and Zeiger, 1991). These values refer to the responses obtained when the same chemicals were tested at a later time in the same or different laboratories, without the laboratory knowing that it had been tested earlier. Thus, these values may be considered as an upper bound for prediction of Salmonella mutagenicity. Unlike experimentally based systems, however, the computer-based systems would be expected to yield 100% reproducibility between trials if the learning sets or the computational algorithms are not changed.
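One way to read 'approached' is to express each concordance from Table VI as a fraction of the 84.5% strict reproducibility figure. The following sketch is illustrative only: the ratio is not a statistic reported in this study, and the variable names are hypothetical.

```python
REPRODUCIBILITY = 0.845  # strict +/- concordance of the Salmonella test itself

# Concordance values from Table VI.
concordance = {
    "SA": 0.72,
    "SA+": 0.76,
    "ke": 0.60,
    "TOPKAT": 0.74,
    "CASE/n": 0.76,
    "CASE/e": 0.71,
}

# Illustrative ratio: concordance relative to the reproducibility bound.
relative = {name: conc / REPRODUCIBILITY for name, conc in concordance.items()}

for name, conc in concordance.items():
    print(f"{name:7s} {conc:.2f}  ({relative[name]:.0%} of the upper bound)")
```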

SAR systems continue to evolve, and their accuracy and usefulness will be dependent on the informational content of the databases used

SAR predictions depend on the numbers and types of chemicals used in model development and the relevance of the chemical descriptors to the underlying mechanism(s) of activity. The predictions in this exercise were for a noncongeneric set of chemicals, that is, chemicals that represented a wide range of structural classes, as opposed to a congeneric set, which comprises chemicals within a single structural class. This runs the risk of having relatively few representatives of certain chemical moieties from which to derive prediction criteria. Ideally, if sufficient data were available, congeneric databases would be used to derive SAR models that subsequently would be merged into larger reference sets for further analyses. It should also be recognized that these SAR systems are useful for studying biological and biochemical mechanisms, and defining chemical structure-activity relationships.

This exercise was performed in 1990. Since that time the computer-driven SAR systems used here have evolved and have been improved. It is therefore important to consider the above results in the context of the level of development of the systems that performed the predictions.

ReferencesAshbyJ. (1985) Fundamental structural alerts to potential carcinogenicity or

noncarcinogenicity. Environ. Mutagen., 7, 919-921.AshbyJ. and Tennant,R.W. (1988) Chemical structure, Salmonella

mutagenicity and extent of carcinogenicity as indicators of genotoxiccarcinogenesis among 222 chemicals tested in rodents by the U.S. NCI/NTP. Mutal. /?«., 204, 17-155.

AshbyJ. and Tennant.R.W. (1994) Prediction of rodent carcinogenicity for 44chemicals: results. Mutagcnesis, 9, 7-15.

Bailey.D.E. (1971) Probability and Statistic Models for Research. Wiley &Sons, New York.

Bakale.G. (1989) Detection of mutagens and carcinogens by physiochemicaltechniques. In SaxenaJ. (ed.), Hazard Assessment of Chemicals—CurrentDevelopments. Hemisphere, Washington, DC, pp. 85—124.

Bakale,G. and McCreary,R.D. (1987) A physico-chemical screening test for chemical carcinogens: the ke test. Carcinogenesis, 8, 253-264.

Bakale,G. and McCreary,R.D. (1990) Response of the ke test to NCI/NTP-screened chemicals. I. Non-genotoxic carcinogens and genotoxic noncarcinogens. Carcinogenesis, 11, 1811-1818.

Bakale,G. and McCreary,R.D. (1992a) Response of the ke test to NCI/NTP-screened chemicals. II. Genotoxic carcinogens and nongenotoxic noncarcinogens. Carcinogenesis, 13, 1437-1445.

Bakale,G. and McCreary,R.D. (1992b) Prospective ke screening of potential carcinogens being tested in rodent bioassays by the US National Toxicology Program. Mutagenesis, 7, 91-94.

Bakale,G., Gregg,E.C. and McCreary,R.D. (1977) Electron attachment to nitro compounds in liquid cyclohexane. J. Chem. Phys., 67, 5788-5794.

Canter,D.A., Zeiger,E., Haworth,S., Lawlor,T., Mortelmans,K. and Speck,W. (1986) Comparative mutagenicity of aliphatic epoxides in Salmonella. Mutat. Res., 172, 105-138.

Ennever,F.K. and Bakale,G. (1992) Response of the ke test to NCI/NTP-screened chemicals. III. Complementary value of ke in screening for carcinogens. Carcinogenesis, 13, 2059-2065.

Enslein,K. (1988) An overview of structure-activity relationships as an alternative for carcinogenicity, mutagenicity, dermal and eye irritation, and acute oral toxicity. Toxicol. Ind. Health, 4, 479-498.

Enslein,K., Blake,B.W. and Borgstedt,H.H. (1990) Prediction of probability of carcinogenicity for a set of ongoing NTP bioassays. Mutagenesis, 5, 305-306.

Haworth,S., Lawlor,T., Mortelmans,K., Speck,W. and Zeiger,E. (1983) Salmonella mutagenicity test results for 250 chemicals. Environ. Mutagen., 5 (Suppl. 1), 3-142.

Kier,L.E., Brusick,D.J., Auletta,A.E., Von Halle,E.S., Brown,M.M., Simmon,V.F., Dunkel,V., McCann,J., Mortelmans,K., Prival,M., Rao,T.K. and Ray,V. (1986) The Salmonella typhimurium/mammalian microsomal assay. A report of the U.S. Environmental Protection Agency Gene-Tox Program. Mutat. Res., 168, 69-240.

Klopman,G. (1984) Artificial intelligence approach to structure-activity studies. Computer automated structure evaluation of biological activity of organic molecules. J. Am. Chem. Soc., 106, 7315-7320.

Klopman,G., Frierson,M.R. and Rosenkranz,H.S. (1990) The structural basis of the mutagenicity of chemicals in Salmonella typhimurium: the Gene-Tox Data Base. Mutat. Res., 228, 1-50.

Klopman,G. and Rosenkranz,H.S. (1984) Structural requirements for the mutagenicity of environmental nitroarenes. Mutat. Res., 126, 277-238.

Klopman,G. and Rosenkranz,H.S. (1991) Quantification of the predictivity of some short-term assays for carcinogenicity in rodents. Mutat. Res., 253, 237-240.

Miller,J.A. and Miller,E.C. (1977) Ultimate chemical carcinogens as reactive mutagenic electrophiles. In Hiatt,H.H., Watson,J.D. and Winsten,J.A. (eds), Origins of Human Cancer, Book B. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp. 605-628.

Mortelmans,K., Haworth,S., Lawlor,T., Speck,W., Tainer,B. and Zeiger,E. (1986) Salmonella mutagenicity tests. II. Results from the testing of 270 chemicals. Environ. Mutagen., 8 (Suppl. 7), 1-119.

Piegorsch,W.W. and Zeiger,E. (1991) Measuring intra-assay agreement for the Ames Salmonella assay. In Hothorn,L. (ed.), Lecture Notes in Medical Informatics, Vol. 43. Statistical Methods in Toxicology. Springer-Verlag, Heidelberg, pp. 35-41.

Rinkus,S.J. and Legator,M.S. (1979) Chemical characterization of 465 known or suspected carcinogens and their correlation with mutagenic activity in the Salmonella typhimurium system. Cancer Res., 39, 3289-3318.

Rosenkranz,H.S. and Klopman,G. (1990a) The structural basis of the mutagenicity of chemicals in Salmonella typhimurium: the National Toxicology Program Data Base. Mutat. Res., 228, 51-80.

Rosenkranz,H.S. and Klopman,G. (1990b) Prediction of the carcinogenicity in rodents of chemicals currently being tested by the US National Toxicology Program: structure-activity correlations. Mutagenesis, 5, 425-432.

Tennant,R.W. and Ashby,J. (1991) Classification according to chemical structure, mutagenicity to Salmonella and level of carcinogenicity of a further 39 chemicals tested for carcinogenicity by the U.S. National Toxicology Program. Mutat. Res., 257, 209-227. [Also: Corrigendum, Mutat. Res., 317, 175.]

Tennant,R.W., Spalding,J., Stasiewicz,S. and Ashby,J. (1990) Prediction of the outcome of rodent carcinogenicity bioassays currently being conducted on 44 chemicals by the National Toxicology Program. Mutagenesis, 5, 3-14.

Zeiger,E. (1987) Carcinogenicity of mutagens: predictive capability of the Salmonella mutagenesis assay for rodent carcinogenicity. Cancer Res., 47, 1287-1296.

Zeiger,E., Risko,K.J. and Margolin,B.H. (1985) Strategies to reduce the cost of mutagenicity screening using the Salmonella/microsome assay. Environ. Mutagen., 7, 901-911.

Zeiger,E., Anderson,B., Haworth,S., Lawlor,T. and Mortelmans,K. (1992) Salmonella mutagenicity tests. V. Results from the testing of 311 chemicals. Environ. Molec. Mutagen., 19 (Suppl. 21), 2-141.

Received on November 16, 1995; accepted on March 29, 1996
