P. Gramatica (1), H. Walter (2) and R. Altenburger (2)
(1) QSAR Research Unit - DBSF - University of Insubria - VARESE - ITALY
(2) UFZ Centre for Environmental Research - LEIPZIG - GERMANY
e-mail: [email protected] Web: http://fisio.dipbsf.uninsubria.it/qsar
INTRODUCTION
Environmental exposure situations are often characterized by a multitude of heterogeneous chemicals with different mechanisms of action and type of effect. The EEC priority List 1 (Council Directive 76/464/EEC) consists of heterogeneous environmental chemicals with mostly unknown or unspecific modes of action, so it was used to select components for mixture experiments in the EEC PREDICT (Prediction and Assessment of the Aquatic Toxicity of Mixtures of Chemicals) project. A list of 202 compounds was studied for structural similarity to identify the most representative and dissimilar chemicals and to find an objective method to group them on the basis of their structural aspects.
STRUCTURAL DESCRIPTION OF COMPOUNDS
Molecular descriptors represent the way chemical information contained in the molecular structure is transformed and coded. Among the theoretical descriptors, the best known, obtained simply from the knowledge of the formula, are molecular weight and count descriptors (1D-descriptors, i.e. counts of bonds, atoms of different kinds, presence or counts of functional groups and fragments, etc.). Graph-invariant descriptors (2D-descriptors, including both topological and information indices) are obtained from the knowledge of the molecular topology. WHIM molecular descriptors [1] contain information about the whole 3D-molecular structure in terms of size, symmetry and atom distribution. These indices are calculated from the (x,y,z)-coordinates of a three-dimensional structure of a molecule, usually from a spatial conformation of minimum energy: 37 non-directional (or global) and 66 directional WHIM descriptors are obtained. A complete set of about two hundred molecular descriptors has been obtained [2].
[1] Todeschini R. and Gramatica P., Quant. Struct.-Act. Relat. 1997, 16, 113-119.
[2] Todeschini R. and Consonni V., DRAGON - Software for the calculation of the molecular descriptors, Talete srl, Milan (Italy) 2000. Download: http://www.disat.unimib.it/chm.
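As an illustration of the simplest descriptor class mentioned above (1D count descriptors and molecular weight), the following minimal sketch derives atom counts from a molecular formula. It is not the DRAGON implementation; the function names, the small table of atomic weights and the simple formula parser (no brackets or charges) are assumptions made for this example.

```python
import re
from collections import Counter

# Rounded standard atomic weights (g/mol); only the elements used below.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45}


def atom_counts(formula):
    """Count the atoms of each element in a plain formula such as 'C10H8'."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def molecular_weight(formula):
    """Sum the atomic weights over all atoms of the formula."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in atom_counts(formula).items())


# Two of the compounds named on this poster.
naphthalene = atom_counts("C10H8")
atrazine = atom_counts("C8H14ClN5")

print("naphthalene:", dict(naphthalene), round(molecular_weight("C10H8"), 3))
print("atrazine:", dict(atrazine), round(molecular_weight("C8H14ClN5"), 3))
print("nO (number of O atoms) in atrazine:", atrazine["O"])
```

A count such as `atom_counts(formula)["O"]` corresponds to the kind of descriptor that the regression models on this poster call nO.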
REGRESSION MODELS
QSAR models were developed by the Ordinary Least Squares (OLS) regression method. The best subset of variables for modelling the algal toxicity of the studied compounds was selected by a Genetic Algorithm (GA-VSS) approach. The models were validated by the leave-one-out (LOO) and leave-more-out (LMO) procedures and by scrambling of the responses.
R² = 78; Q²(LOO) = 62.1; Q²(LMO) = 61.7; SDEP = 0.751; SDEC = 0.573
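The validation statistics reported on this poster can be illustrated with a minimal sketch of OLS regression with leave-one-out cross-validation. It assumes the commonly used definitions Q²(LOO) = 1 - PRESS/TSS, SDEC = sqrt(RSS/n) and SDEP = sqrt(PRESS/n), with R² and Q² expressed as percentages; the poster does not state its formulas. The descriptor values and responses below are invented toy data, and the standard-library linear solver is only there to keep the example self-contained.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(X, y):
    """Return OLS coefficients [intercept, b1, ...] from the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def validate(X, y):
    """Fit on all objects, then refit n times leaving one object out."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    coef = ols_fit(X, y)
    rss = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    press = 0.0
    for i in range(n):
        c = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(c, X[i])) ** 2
    return {
        "R2": 100 * (1 - rss / tss),
        "Q2_LOO": 100 * (1 - press / tss),
        "SDEC": (rss / n) ** 0.5,
        "SDEP": (press / n) ** 0.5,
    }


# Toy data: two hypothetical descriptors and a hypothetical response.
X_demo = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.5], [3.0, 0.5], [4.0, 2.0], [5.0, 1.0]]
y_demo = [0.1, 0.9, 2.3, 2.8, 4.4, 4.9]
stats = validate(X_demo, y_demo)
print(stats)
```

Because each left-out object is predicted by a model that never saw it, PRESS is never smaller than RSS, so Q²(LOO) cannot exceed R² and SDEP cannot be smaller than SDEC. The leave-more-out procedure follows the same pattern with several objects removed at a time.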
CONCLUSIONS
The chemometric analyses applied here proved very useful for ranking the studied chemicals according to their structural similarity or dissimilarity.
In the modelling of structurally heterogeneous compounds with unknown modes of action, the QSAR models obtained were not very satisfactory.
The role of specific parameters, such as directional WHIMs, capable of describing particular molecular features relevant for explaining the specific mode of action, is always important in QSAR models for congeneric chemicals. Increasing heterogeneity increases the role of structural and topological descriptors, which account for general molecular features not related to a specific mode of action.
This work was supported by the Environment & Climate programme for the European Commission, Contract EV4-CT96-0319 (PREDICT) and Contract EVK1-CT99-00012 (BEAM)
CHEMOMETRIC METHODS
Several chemometric analyses have been applied to the compounds (represented by molecular descriptors) to group the more similar ones, in accordance with a multivariate structural approach, and with the final aim to highlight the structurally most dissimilar compounds. The analyses performed are:
Hierarchical Cluster Analysis: hierarchical clustering was performed with the aim of finding clusters of the studied compounds in high-dimensional space, using molecular descriptors as variables. Different distance metrics (Euclidean, Manhattan, Pearson) and different linkages (complete, average, single, etc.) were used and compared to find the best way to cluster these compounds.
Principal Component Analysis (PCA): this analysis was used to calculate just a few components from a large number of variables. These components highlight the distribution of the compounds according to structure and reveal the similarity between compounds assigned to the same cluster.
Kohonen Maps: this is an additional way of mapping similar compounds by using the so-called “self-organized topological feature maps”, which are maps that preserve the topology of a multidimensional representation within a toroidal two-dimensional representation. The position of the compounds in this map shows the similarity level of the structure of the EEC List 1 compounds.
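The hierarchical clustering step listed above can be sketched for one of the settings named in the text, Euclidean distance with complete linkage. This is a deliberately naive standard-library implementation for illustration, not the software used by the authors; the function names and the five two-dimensional points (standing in for compounds described by principal component scores) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with complete linkage.

    At each step the two clusters whose farthest pair of members is
    closest are merged, until n_clusters clusters remain.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical scores of five compounds on two components.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (10.0, 0.0)]
clusters = complete_linkage(points, 3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

Swapping `euclidean` for a Manhattan or correlation-based distance, or `max` for `min` (single linkage), gives the other combinations that the text says were compared.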
[Figure: Dendrogram of hierarchical cluster analysis. Euclidean distance - complete linkage. Variables = first 10 structural principal components. Vertical axis: Similarity (0 to 100). Labelled clusters: Benzene derivatives (2), Chloroaliphatic compounds (7), Chlorinated aliphatics (9), Phen.-Triaz. (10), DDT - PCBs (11), Organo-phosphates (12), PAH (15).]
[Figure: PCA on all molecular descriptors for 202 EEC List 1 compounds. Score plot with axes PC 1 and PC 2; Cum. E.V. = 47.4%. Points are labelled with compound numbers; legend: clusters CluPCec = 1 to 16 and Groups 17 to 20.]
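The cumulative explained variance (Cum. E.V.) quoted for the PCA plot can be illustrated with a minimal sketch. It assumes autoscaled descriptors (unit variance per column), which the poster does not state, and it extracts components by power iteration with deflation only to stay within the standard library. The six-row data matrix is invented.

```python
import math


def autoscale(X):
    """Centre each column and scale it to unit standard deviation."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def covariance(Z):
    n, p = len(Z), len(Z[0])
    return [[sum(r[i] * r[j] for r in Z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigen(C, iters=500):
    """Power iteration for the largest eigenvalue of a symmetric matrix."""
    p = len(C)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-uniform start
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    norm_v = math.sqrt(sum(x * x for x in v))
    v = [x / norm_v for x in v]
    lam = sum(v[i] * C[i][j] * v[j] for i in range(p) for j in range(p))
    return lam, v


def pca(X, n_components):
    """Return (scores, cumulative explained variance in percent)."""
    Z = autoscale(X)
    C = covariance(Z)
    p = len(C)
    total = sum(C[i][i] for i in range(p))
    eigvals, loadings = [], []
    for _ in range(n_components):
        lam, v = leading_eigen(C)
        eigvals.append(lam)
        loadings.append(v)
        # Deflation: remove the component just found.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(z * l for z, l in zip(row, v)) for v in loadings] for row in Z]
    cum_ev = [100 * sum(eigvals[:k + 1]) / total for k in range(n_components)]
    return scores, cum_ev


# Invented descriptor matrix: 6 compounds x 3 descriptors.
X_demo = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.1], [3.0, 6.2, 0.9],
          [4.0, 8.1, 0.3], [5.0, 9.8, 0.7], [6.0, 12.1, 0.2]]
scores, cum_ev = pca(X_demo, 2)
print("Cum. E.V. after PC 1 and PC 2 (%):", [round(v, 1) for v in cum_ev])
```

The `scores` rows are the coordinates that a PC 1 versus PC 2 plot displays, and `cum_ev[1]` is the quantity reported as Cum. E.V. for two components.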
[Figure: KOHONEN MAP. Grid with both axes numbered 1 to 20 (one axis labelled ROW). Compounds are placed in the map cells by number; legend: clusters CL = 1 to 20.]
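The toroidal Kohonen map described in the chemometric methods can be sketched as follows. This is a generic self-organizing map with wrap-around grid distances, not the authors' program; the learning-rate and radius schedules, the function names and the four toy input vectors are assumptions. A small 6 x 6 grid is used here, whereas the axes of the map on this poster run from 1 to 20.

```python
import math
import random


def toroidal_distance(a, b, size):
    """Grid distance between cells a and b when opposite edges are joined."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return math.hypot(min(dr, size - dr), min(dc, size - dc))


def best_matching_unit(weights, x):
    """Cell whose weight vector is closest to the input vector x."""
    return min(weights,
               key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)))


def train_som(data, size=20, epochs=50, seed=0):
    """Train a size x size toroidal map on data scaled to the range 0-1."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(size) for c in range(size)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs           # decays from 1 towards 0
        rate = 0.5 * frac                     # assumed learning-rate schedule
        radius = max(1.0, (size / 2) * frac)  # assumed neighbourhood schedule
        for x in data:
            bmu = best_matching_unit(weights, x)
            for cell, w in weights.items():
                d = toroidal_distance(cell, bmu, size)
                if d <= radius:
                    h = rate * math.exp(-(d * d) / (2 * radius * radius))
                    for k in range(dim):
                        w[k] += h * (x[k] - w[k])
    return weights


# Toy descriptor vectors scaled to 0-1 (hypothetical compounds).
demo = [[0.1, 0.2, 0.1], [0.15, 0.25, 0.1], [0.9, 0.8, 0.95], [0.5, 0.1, 0.9]]
som = train_som(demo, size=6, epochs=20)
positions = [best_matching_unit(som, x) for x in demo]
print(positions)
```

On a toroidal map a cell in the first row is a direct neighbour of the cell in the last row of the same column, so clusters that appear split across opposite edges of the plotted map may in fact be adjacent.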
The chemicals selected as the structurally most dissimilar compounds are:
N.   Substance               Chemical Class
1    atrazine                Triazine
2    biphenyl                Aromatic
3    chloral hydrate         Chlorinated aliphatics
4    2,4,5-trichlorophenol   Benzene derivative
5    fluoranthene            PAH
6    lindane                 HCH
7    naphthalene             PAH
8    parathion               Organophosphate
9    phoxime                 Organophosphate
10   tributyltin chloride    Organotin
11   triphenyltin chloride   Organotin
R² = 93.9; Q²(LOO) = 91.8; Q²(LMO) = 87.5; SDEP = 0.342; SDEC = 0.296
R² = 77; Q²(LOO) = 69.7; Q²(LMO) = 69.7; SDEP = 0.709; SDEC = 0.619
HETEROGENEOUS COMPOUNDS
Regression model for 11 selected compounds
Log 1/EC50 = -5.14 - 0.52 nO + 2.12 IDE
nO is the number of O atoms and IDE is the mean information content on the distance equality.
[Figure: predicted versus experimental Log 1/EC50 for the 11 selected compounds; points labelled 1 to 11; both axes from -4 to 2.]
CONGENERIC COMPOUNDS (NITROBENZENES)
Model for 19 nitrobenzenes
Log 1/EC50 = -7.87 - 2.96 nOH + 0.10 Sp + 13.25 Ds
nOH is the number of OH groups, Sp is the sum of polarizabilities and Ds is the 3D-WHIM considering the global electrotopological distribution.
[Figure: predicted versus experimental Log 1/EC50 for the 19 nitrobenzenes; points labelled 1 to 19; both axes from -3.5 to 2.5.]
HETEROGENEOUS + CONGENERIC COMPOUNDS
Model for 19 nitrobenzenes + 11 heterogeneous compounds
Log 1/EC50 = -20.27 - 0.55 nO + 3.87 IDDM + 11.44 E1e
nO is the number of O atoms, IDDM is the mean information content on the distance degree magnitude, while E1e is a directional 3D-WHIM descriptor of atomic distribution weighted on the electronegativity.
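The three regression equations reported on this poster can be evaluated directly once the descriptors are known. The sketch below transcribes the published coefficients; the function names and the descriptor values passed in are hypothetical and serve only to show how a prediction is obtained.

```python
def log_inv_ec50_heterogeneous(nO, IDE):
    """Model for the 11 selected heterogeneous compounds."""
    return -5.14 - 0.52 * nO + 2.12 * IDE


def log_inv_ec50_nitrobenzenes(nOH, Sp, Ds):
    """Model for the 19 nitrobenzenes."""
    return -7.87 - 2.96 * nOH + 0.10 * Sp + 13.25 * Ds


def log_inv_ec50_combined(nO, IDDM, E1e):
    """Model for the 19 nitrobenzenes plus the 11 heterogeneous compounds."""
    return -20.27 - 0.55 * nO + 3.87 * IDDM + 11.44 * E1e


# Hypothetical descriptor values, not taken from the poster.
prediction = log_inv_ec50_heterogeneous(nO=2, IDE=2.5)
print(round(prediction, 2))  # -0.88

# Effect of one additional O atom at fixed IDE: the prediction drops by 0.52.
delta = log_inv_ec50_heterogeneous(3, 2.5) - log_inv_ec50_heterogeneous(2, 2.5)
print(round(delta, 2))  # -0.52
```

The negative nO coefficient in the heterogeneous and combined models means that, with the other descriptors held fixed, each additional oxygen atom lowers the predicted Log 1/EC50.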
[Figure: predicted versus experimental Log 1/EC50 for the 30 compounds; points labelled 1 to 30; both axes from -4 to 2; legend: Nitrobenzenes, Heterogeneous compounds.]
RANKING OF “EEC PRIORITY LIST 1” CHEMICALS FOR STRUCTURAL SIMILARITY AND MODELLING OF ALGAL TOXICITY
D 12