![Page 1: Introduction on QSAR and modelling of physico-chemical and biological properties Alessandra Roncaglioni – IRFMN aroncaglioni@marionegri.it Problems and](https://reader036.vdocuments.site/reader036/viewer/2022081514/551c42df5503467b488b4b9b/html5/thumbnails/1.jpg)
Introduction on QSAR and modelling of physico-chemical and biological properties
Alessandra Roncaglioni – IRFMN
Problems and approaches in computational chemistry
Outline
• History
• QSAR/QSPR steps
  ◦ (Descriptors)
  ◦ Activity data
  ◦ Modelling approaches
  ◦ Validation (OECD principles)
• QSPR (phys-chem properties)
• QSAR (biological activities)
• Example (DEMETRA)
QSAR postulates
◦ The molecular structure is responsible for all the activities.
◦ Similar compounds have similar biological and chemico-physical properties (Meyer 1899).
◦ Hansch analysis ('70s)
◦ Free-Wilson approach ('70s)
H. Kubinyi. From Narcosis to Hyperspace: The History of QSAR. Quant. Struct.-Act. Relat., 21 (2002) 348-356.
Hansch analysis
Applied to congeneric series:
$$\log(1/C) = a\pi + b\sigma + cE_s + \text{const.}$$
where
◦ $C$ = effect concentration
◦ $\pi$ = hydrophobic substituent constant, derived from the octanol-water partition coefficient
◦ $\sigma$ = Hammett substituent constant (electronic)
◦ $E_s$ = Taft's substituent constant (steric)
Linear free-energy related approach
McFarland principle
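Because the Hansch equation is linear in $a$, $b$, $c$ and the constant, these coefficients can be estimated for a congeneric series by ordinary least squares. The sketch below is a minimal pure-Python illustration under stated assumptions, not the original authors' procedure: the substituent constants and activities are hypothetical, generated from known coefficients so that the fit should recover them, and the helper names `solve` and `least_squares` are our own.

```python
"""Minimal sketch: fitting the linear Hansch equation by ordinary least squares.

All substituent constants and activities are hypothetical illustration values.
"""


def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(x, y):
    """Coefficients beta minimising ||y - X beta||^2, via (X^T X) beta = X^T y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical (pi, sigma, Es) values for six substituents of one congeneric series.
substituents = [
    (0.0, 0.0, 0.0),
    (0.5, 0.2, -1.0),
    (1.0, -0.1, -0.5),
    (-0.5, 0.4, -1.5),
    (1.5, 0.3, -0.2),
    (0.8, -0.3, -1.2),
]
# Activities generated from known coefficients: a=0.8, b=1.2, c=0.3, const=2.0.
log_inv_c = [0.8 * p + 1.2 * s + 0.3 * e + 2.0 for p, s, e in substituents]

# One column per substituent constant, plus a column of ones for the constant term.
design = [[p, s, e, 1.0] for p, s, e in substituents]
a, b, c, const = least_squares(design, log_inv_c)
print(f"log(1/C) = {a:.2f} pi + {b:.2f} sigma + {c:.2f} Es + {const:.2f}")
```

With real data the activities carry experimental noise, so the recovered coefficients would only approximate the underlying relationship.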
Free-Wilson analysis
$$\log(1/C) = \sum_i a_i X_i + \mu$$
where
◦ $C$ = effect concentration
◦ $a_i$ = contribution of substituent group $i$
◦ $X_i$ = 1 if group $i$ is present in the compound, 0 otherwise
◦ $\mu$ = activity of the reference compound
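Free-Wilson analysis is a regression on 0/1 indicator variables, one per substituent group. A minimal sketch, assuming a hypothetical scaffold with two substitution positions (hydrogen as the reference at each) and activities constructed from $\mu = 4.0$ and contributions Cl +0.5, CH3 +0.3, OH -0.2, so the least-squares fit should recover those values; the group labels and helper functions are illustrative, not from the source.

```python
"""Minimal sketch of Free-Wilson analysis on a hypothetical congeneric series."""


def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(x, y):
    """Coefficients beta minimising ||y - X beta||^2, via (X^T X) beta = X^T y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


# Non-hydrogen groups at two positions; H at each position is the reference.
GROUPS = ["R1=Cl", "R1=CH3", "R2=OH"]

# (groups present, log 1/C); activities are hypothetical.
compounds = [
    (set(), 4.0),
    ({"R1=Cl"}, 4.5),
    ({"R1=CH3"}, 4.3),
    ({"R2=OH"}, 3.8),
    ({"R1=Cl", "R2=OH"}, 4.3),
    ({"R1=CH3", "R2=OH"}, 4.1),
]


def free_wilson(compounds, groups):
    """Fit log 1/C = sum_i a_i X_i + mu; return mu and {group: a_i}."""
    # X_i = 1 if group i is present, 0 otherwise; the leading 1 estimates mu.
    x = [[1.0] + [1.0 if g in present else 0.0 for g in groups] for present, _ in compounds]
    y = [activity for _, activity in compounds]
    coef = least_squares(x, y)
    return coef[0], dict(zip(groups, coef[1:]))


mu, contributions = free_wilson(compounds, GROUPS)
print(f"mu = {mu:.2f}")
for group, a_i in contributions.items():
    print(f"{group}: {a_i:+.2f}")
```

Unlike Hansch analysis, no physico-chemical constants are needed, but each group contribution can only be estimated if that group occurs in enough compounds of the series.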
The old QSAR paradigm
◦ Compounds in the series must be closely related
◦ Same mode of action
◦ Basic biological activities
◦ Small number of "intuitive" properties
◦ Linear relationship
The old QSAR paradigm
Factors limiting the old paradigm:
◦ Software availability
◦ Calculation of molecular properties
◦ Limited computing power
◦ Costs of hardware and software
The new QSAR paradigm
◦ Heterogeneous compound sets
◦ Mixed modes of action
◦ Complex biological endpoints
◦ Large number of properties
◦ Non-linear modelling
The new QSAR paradigm
Factors enabling the new paradigm:
◦ Increased computing power
◦ Quantum-mechanical (QM) calculations
◦ Thousands of descriptors
◦ Cost drop for hardware and software (freeware)
QSAR flowchart
[Figure: QSAR flowchart]
[Figure: compounds (2D or 3D structures) are encoded as a matrix $D_{(n,m)}$ of descriptors $1, \dots, m$; the activity $A$ is modelled as a function of this matrix:]
$$A = f\left(D_{(n,m)}\right)$$
QSAR/QSPR defined by Y data
◦ Quantitative Structure-Property Relationship (QSPR): physico-chemical or biochemical properties, e.g. boiling point, partition coefficients (log P), receptor binding
◦ Quantitative Structure-Activity Relationship (QSAR): interaction with the biota, e.g. toxicity, metabolism
Activity data
Garbage in, garbage out: both the quality and the quantity of data matter.
◦ Are the data suitable for the purpose?
◦ Intrinsic variability of Y data (particularly for QSAR): examples later on
◦ Chemical domain covered by the experimental data
◦ As many data as possible, especially when using complex models
Data need
◦ Data are one of the pillars of the models: the goal is to extract knowledge from these data. If they are too noisy, it is not possible to extract this knowledge.
◦ Enough training data
◦ Keep data variability low
◦ Large number of compounds
[Figure: quality/accuracy of the data vs. number of compounds]
Modelling steps
◦ Data pre-processing: scaling of the X block and transformation of the Y block
◦ Variable selection
◦ Application of algorithms to search for the relationship
Data pre-processing (I)
◦ Scaling variables: making sure that each descriptor has an equal chance of contributing to the overall analysis (e.g. autoscaling, range scaling)
◦ Y transformation
| No. | TestSub | Name_Jp | CAS nr | MolWeight | ZM1V | HNar | MSD | GMTIV | SPI | TI1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1-001 | Trichlorfon | 52-68-6 | 257.437 | 162.124 | 1.358 | 0.266 | 1336.074 | 9.072 | -13.982 |
| 2 | 1-002 | Dimethoate | 60-51-5 | 229.257 | 148.198 | 1.485 | 0.324 | 1587.741 | 8.253 | -16.476 |
| 3 | 1-003 | Dichlorvos | 62-73-7 | 220.976 | 172.519 | 1.451 | 0.326 | 1380.444 | 7.646 | -13.908 |
| 4 | 1-004 | Malathion | 121-75-5 | 330.358 | 274.198 | 1.551 | 0.26 | 6524.185 | 13.757 | -34.658 |
| 5 | 1-005 | Methoprene | 40596-69-8 | 310.471 | 224 | 1.562 | 0.334 | 9922 | 17.17 | -57.135 |
| 6 | 1-006 | Propylthiourea | 927-67-3 | 118.201 | 50.444 | 1.448 | 0.431 | 235.667 | 4.166 | -6.962 |
| 7 | 1-007 | 2-Butanone oxime | 96-29-7 | 87.1204 | 72 | 1.385 | 0.414 | 224 | 3.484 | -4.909 |
| 8 | 1-008 | Dibromoacetic acid | 631-64-1 | 217.844 | 86.134 | 1.286 | 0.38 | 204.949 | 3.786 | -4.593 |
| 9 | 1-011 | Bis(2-ethylhexyl)adipate | 103-23-1 | 370.566 | 254 | 1.696 | 0.334 | 14655 | 17.037 | -80.091 |
| 10 | 1-013 | Thiram | 137-26-8 | 240.433 | 87.778 | 1.44 | 0.338 | 829.445 | 9.053 | -17.08 |
| 11 | 1-015 | Stannane, tributylfluoro- | 1983-10-4 | 309.051 | 88.008 | 1.6 | 0.314 | 1243.889 | 8.561 | -22.079 |
| 12 | 1-016 | Methomyl | 16752-77-5 | 162.21 | 148.444 | 1.5 | 0.375 | 1128.333 | 6.542 | -12.904 |
| 13 | 1-017 | Aldicarb | 116-06-3 | 190.263 | 158.444 | 1.485 | 0.351 | 1697 | 8.357 | -17.609 |
| 14 | 1-018 | Demeton-s-methyl | 919-86-8 | 230.285 | 124.198 | 1.548 | 0.353 | 1163.185 | 7.554 | -17.685 |
| 15 | 1-019 | Citral | 5392-40-5 | 152.233 | 106 | 1.535 | 0.387 | 1399 | 7.197 | -16.06 |
| 16 | 1-020 | Disulfiram | 97-77-8 | 296.539 | 103.778 | 1.548 | 0.307 | 1801.444 | 11.43 | -28.141 |
| 17 | 1-021 | 2-Ethyl-1,3-hexanediol | 94-96-2 | 146.227 | 86 | 1.5 | 0.338 | 805 | 6.412 | -11.897 |
| 18 | 1-022 | Tributyl phosphate | 126-73-8 | 266.314 | 183.309 | 1.659 | 0.311 | 3441.667 | 10.069 | -32.437 |
| 20 | 1-024 | Tris(2-chloroethyl)phosphate | 115-96-8 | 285.49 | 170.124 | 1.6 | 0.314 | 1985.037 | 8.561 | -22.079 |
| 22 | 1-026 | Ethylene glycol | 107-21-1 | 62.0678 | 58 | 1.333 | 0.527 | 139 | 1.893 | -2.499 |
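Autoscaling centres a descriptor on its mean and divides it by its standard deviation, while range scaling maps it linearly onto [0, 1]. A minimal sketch applied to the first five MolWeight values of the descriptor table (decimal commas written as points); using the sample standard deviation (n - 1) is an assumption, since some tools divide by n instead.

```python
"""Minimal sketch: autoscaling and range scaling of one descriptor column."""
import statistics


def autoscale(values):
    """Centre on the mean and divide by the sample standard deviation (z-scores)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def range_scale(values):
    """Map values linearly onto [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# MolWeight of Trichlorfon, Dimethoate, Dichlorvos, Malathion, Methoprene (from the table).
mol_weight = [257.437, 229.257, 220.976, 330.358, 310.471]

z = autoscale(mol_weight)
r = range_scale(mol_weight)
print([round(v, 3) for v in z])
print([round(v, 3) for v in r])
```

In a real X block each descriptor column would be scaled separately, so that, for example, GMTIV (values in the thousands) does not dominate HNar (values around 1.5).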
Data pre-processing (II)
Variable pruning:
◦ Detecting constant variables
◦ Detecting quasi-constant variables, to distinguish informative from non-informative variables
◦ Detecting correlated variables: variables can be grouped into correlation groups, and the variable most correlated with the response is retained
◦ Variables with missing values
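The pruning steps above can be sketched as a simple filter. This is a minimal illustration, not a reference implementation: the quasi-constant rule (a single value covering at least 80% of the compounds) and the 0.95 correlation cut-off are assumptions, and the descriptor names and values are hypothetical.

```python
"""Minimal sketch of descriptor pruning; thresholds are illustrative assumptions."""
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def prune(descriptors, response, mode_fraction=0.8, max_abs_corr=0.95):
    """Return names of descriptors kept after pruning.

    descriptors: dict mapping name -> list of values, with None for missing values.
    """
    candidates = {}
    for name, values in descriptors.items():
        if any(v is None for v in values):
            continue  # variable with missing values
        if len(set(values)) == 1:
            continue  # constant variable
        if max(values.count(v) for v in set(values)) / len(values) >= mode_fraction:
            continue  # quasi-constant: a single value dominates
        candidates[name] = values
    # Rank by |correlation with the response|; keep a descriptor only if it is not
    # highly correlated with one already kept (one representative per correlated group).
    ranked = sorted(candidates, key=lambda n: abs(pearson(candidates[n], response)), reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(candidates[name], candidates[k])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept


y = [1.0, 2.0, 3.0, 4.0, 5.0]
X = {
    "constant": [2.0, 2.0, 2.0, 2.0, 2.0],
    "quasi_constant": [0.0, 0.0, 0.0, 0.0, 1.0],
    "missing": [1.0, None, 3.0, 4.0, 5.0],
    "d_a": [1.1, 2.0, 2.9, 4.2, 5.0],
    "d_a_doubled": [2.2, 4.0, 5.8, 8.4, 10.0],  # 2 * d_a: fully redundant
    "d_b": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(prune(X, y))
```

Only one of `d_a` and `d_a_doubled` survives, together with `d_b`; which of the redundant pair is kept depends on floating-point ties, since both are equally correlated with the response.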
Variable selection
◦ Reduces dimensions, facilitating data visualization and interpretation
◦ Likely improves prediction performance
◦ Hypothesis-driven or statistically driven
◦ Wrappers: use the chosen prediction method to score subsets of features according to their predictive power
◦ Filters: a pre-processing step, independent of the choice of the predictor
Variable selection techniques
◦ Principal component analysis (PCA)
◦ Clustering
◦ Self-organizing maps (SOM)
◦ Stepwise procedures:
  • Forward selection: features are progressively incorporated into larger and larger subsets
  • Backward elimination: starts with the set of all features and progressively eliminates the least promising ones
◦ Genetic algorithms
◦ Variable importance/sensitivity
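Forward selection is a typical wrapper: the model itself scores each candidate subset. A minimal sketch, assuming multiple linear regression as the model, training $R^2$ as the score and a stopping rule of "gain below 0.001"; all three are illustrative choices (in practice a cross-validated score is preferable, because training $R^2$ never decreases when a variable is added). The descriptors and response are hypothetical.

```python
"""Minimal sketch of forward stepwise selection (a wrapper method)."""


def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(x, y):
    """Coefficients beta minimising ||y - X beta||^2, via (X^T X) beta = X^T y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_r2(columns, y):
    """Training R^2 of a least-squares model on the given columns plus an intercept."""
    x = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    beta = least_squares(x, y)
    y_hat = [sum(b * v for b, v in zip(beta, row)) for row in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def forward_selection(descriptors, y, min_gain=1e-3):
    """Add, one at a time, the descriptor giving the highest R^2; stop when the gain is small."""
    selected, best_r2 = [], 0.0
    remaining = list(descriptors)
    while remaining:
        scores = {
            name: fit_r2([descriptors[s] for s in selected + [name]], y)
            for name in remaining
        }
        best = max(scores, key=scores.get)
        if scores[best] - best_r2 < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        best_r2 = scores[best]
    return selected, best_r2


descriptors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [5.0, 3.0, 6.0, 2.0, 7.0, 1.0],
    "x3": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
}
# Hypothetical response that depends only on x1 and x3.
y = [2 * a + 0.5 * c + 1 for a, c in zip(descriptors["x1"], descriptors["x3"])]
print(forward_selection(descriptors, y))
```

Backward elimination is the mirror image: start from all descriptors and repeatedly drop the one whose removal costs the least.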
Principal component analysis
◦ Keep only those components that possess the largest variation
◦ PCs are orthogonal to each other
[Figure: loadings plot]
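The principal components are the eigenvectors of the descriptor covariance matrix, ordered by eigenvalue (explained variance). A minimal pure-Python sketch using power iteration with deflation, which is one simple way to obtain the leading components (libraries normally use SVD instead); the data are hypothetical points lying close to the direction (1, 1).

```python
"""Minimal sketch of PCA by power iteration with deflation (pure Python)."""


def covariance_matrix(rows):
    """Sample covariance matrix of a data matrix given as a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)] for i in range(p)]


def power_iteration(matrix, iterations=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    p = len(matrix)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0, v  # nothing left to explain
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


def pca(rows, n_components):
    """Return (eigenvalues, loading vectors) of the leading principal components."""
    cov = covariance_matrix(rows)
    p = len(cov)
    eigenvalues, loadings = [], []
    for _ in range(n_components):
        lam, v = power_iteration(cov)
        eigenvalues.append(lam)
        loadings.append(v)
        # Deflation: remove this component so the next one found is orthogonal to it.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return eigenvalues, loadings


rows = [[-2.1, -1.9], [-0.9, -1.1], [0.1, -0.1], [1.1, 0.9], [1.9, 2.1]]
eigenvalues, loadings = pca(rows, n_components=2)
print("explained variance ratio:", eigenvalues[0] / sum(eigenvalues))
print("PC1 loadings:", loadings[0])
```

The PC1 loadings are about (0.707, 0.707): both descriptors contribute equally, which is what a loadings plot would show.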
Cluster analysis
◦ Process of putting objects into classes, based on similarity: descriptors in the same cluster assume similar values for the molecules of the dataset
◦ Many different methods and algorithms:
  • different clustering methods will result in different clusters, with different relationships between them
  • different algorithms can be used to implement the same method (some may be more efficient than others)
Hierarchical and non-hierarchical
A basic distinction is between clustering methods that organise clusters hierarchically and those that do not.
[Figure: the same eight objects (1-8) grouped by a hierarchical and by a non-hierarchical method]
Hierarchical agglomerative
◦ The hierarchy is built from the bottom upwards
◦ Several different methods and algorithms
◦ The basic Lance-Williams algorithm (common to all methods) starts with a table of similarities between all pairs of items:
  • at each step the most similar pair of molecules (or previously formed clusters) is merged
  • until everything is in one big cluster
  • methods differ in how they determine the similarity between clusters
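The Lance-Williams scheme updates the distance between a newly merged cluster and every other cluster from the old distances alone, and the choice of coefficients is what distinguishes one method from another. A minimal sketch, working with Euclidean distances (dissimilarities, so the "most similar pair" is the pair with the smallest distance) and showing only the single- and complete-linkage coefficient sets; the points are hypothetical.

```python
"""Minimal sketch: agglomerative clustering with the Lance-Williams update."""
import itertools
import math

# Lance-Williams coefficients (alpha_i, alpha_j, beta, gamma) for two linkage methods.
METHODS = {
    "single": (0.5, 0.5, 0.0, -0.5),   # reduces to min(d_ki, d_kj)
    "complete": (0.5, 0.5, 0.0, 0.5),  # reduces to max(d_ki, d_kj)
}


def agglomerate(points, method="single"):
    """Merge the closest pair of clusters until one remains; return the merge history."""
    ai, aj, beta, gamma = METHODS[method]
    clusters = {i: (i,) for i in range(len(points))}
    dist = {
        frozenset((i, j)): math.dist(points[i], points[j])
        for i, j in itertools.combinations(clusters, 2)
    }
    history, next_id = [], len(points)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # most similar pair = smallest distance
        i, j = sorted(pair)
        d_ij = dist.pop(pair)
        merged = clusters.pop(i) + clusters.pop(j)
        history.append((merged, d_ij))
        for k in clusters:
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            dist[frozenset((k, next_id))] = (
                ai * d_ki + aj * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)
            )
        clusters[next_id] = merged
        next_id += 1
    return history


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for members, height in agglomerate(points):
    print(members, round(height, 3))
```

The merge history, with the distance at which each merge happens, is exactly the information drawn as a dendrogram.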
Hierarchical divisive
◦ The hierarchy is built from the top downwards
◦ At each step a cluster is chosen to divide, until each cluster has only one member
◦ Various ways of choosing the next cluster to divide:
  • the one with most members
  • the one with the least similar pair of members
  • etc.
◦ Various ways of dividing it
Non-hierarchical methods
Usually faster than hierarchical methods, e.g. nearest-neighbour methods:
◦ the best-known example is the Jarvis-Patrick method:
  • identify the top k (e.g. 20) nearest neighbours for each object
  • two objects join the same cluster if they have at least kmin of their top k nearest neighbours in common
◦ it tends to produce a few large heterogeneous clusters and a lot of singletons (single-member clusters)
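A minimal sketch of Jarvis-Patrick clustering, using Euclidean distance and a union-find structure to collect linked objects into clusters. The `require_mutual` flag adds the condition, used in the original formulation of the method, that the two objects must also appear in each other's neighbour lists; the points, k and kmin are hypothetical.

```python
"""Minimal sketch of Jarvis-Patrick clustering."""
import math


def nearest_neighbours(points, k):
    """Set of indices of the k nearest neighbours of each point (excluding itself)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in others[:k]})
    return result


def jarvis_patrick(points, k, k_min, require_mutual=True):
    """Return a cluster label (root index) for each point."""
    nn = nearest_neighbours(points, k)
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if require_mutual and not (j in nn[i] and i in nn[j]):
                continue
            if len(nn[i] & nn[j]) >= k_min:  # enough shared nearest neighbours
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # group A
    (10, 10), (10, 11), (11, 10), (11, 11),  # group B
    (50, 50),                                # isolated object
]
print(jarvis_patrick(points, k=3, k_min=2))
```

The isolated object ends up as a singleton, the behaviour noted above.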
Self-organizing maps
A SOM is an unsupervised neural network (NN) that condenses the input space into a low-dimensional representation.
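A minimal sketch of a one-dimensional SOM (a chain of nodes), trained without any response values: for each sample the best-matching node and, more weakly, its neighbours on the chain are pulled towards the sample. The linearly decaying learning rate, the Gaussian neighbourhood and all parameter values are illustrative assumptions, and the data are hypothetical.

```python
"""Minimal sketch of a one-dimensional self-organizing map (Kohonen map)."""
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(data, n_nodes, epochs=50, lr0=0.5, seed=0):
    """Train a chain of n_nodes weight vectors on the data (unsupervised)."""
    rng = random.Random(seed)
    weights = [list(x) for x in rng.sample(data, n_nodes)]  # initialise from samples
    radius0 = n_nodes / 2
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1 - frac)                    # decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for node, w in enumerate(weights):
                # Gaussian neighbourhood on the chain: nodes near the BMU move more.
                h = math.exp(-((node - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
weights = train_som(data, n_nodes=4)
print([[round(v, 2) for v in w] for w in weights])
print([best_matching_unit(weights, x) for x in data])
```

After training, each compound can be represented by the index of its best-matching node, which is the low-dimensional representation referred to above.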
Genetic algorithms
Based on Darwinian evolutionary theory: individuals in a population of models are crossed over and mutated, then iteratively evaluated against a fitness function, which gives a statistical evaluation of the model's performance.
[Figure: GA flowchart with the steps initial population, evaluation of individuals, individual selection, cross-over and mutations, and a fitness check (Y: end; N: repeat); individuals are encoded as bit strings]
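A minimal sketch of the loop in the flowchart, applied to variable selection: each individual is a bit string in which bit d = 1 means "descriptor d is used". Truncation selection, one-point cross-over, bit-flip mutation and elitism are illustrative choices, and `toy_fitness` is a hypothetical placeholder; in a QSAR setting the fitness would be, for example, a cross-validated $Q^2$ of the model built on the selected descriptors.

```python
"""Minimal sketch of a genetic algorithm searching over descriptor subsets."""
import random

USEFUL = {0, 2, 5}  # hypothetical: descriptors that "help" the model


def toy_fitness(bits):
    """Placeholder fitness: reward useful descriptors, penalise the others."""
    return sum(1.0 if i in USEFUL else -0.5 for i, b in enumerate(bits) if b)


def evolve(fitness, n_bits, pop_size=20, generations=40, p_mut=0.05, target=None, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)  # evaluation
        history.append(fitness(ranked[0]))
        if target is not None and history[-1] >= target:
            break                                               # fitness reached: end
        parents = ranked[: pop_size // 2]                       # individual selection
        children = [ranked[0][:]]                               # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)                      # one-point cross-over
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, history


best, history = evolve(toy_fitness, n_bits=8, target=3.0)
print(best, history[-1])
```

Keeping the best individual unchanged (elitism) guarantees that the best fitness never decreases from one generation to the next.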
Modelling approaches
◦ SAR: categorical Y → classification
◦ Quantitative SAR: continuous Y → regression
Modelling techniques
◦ Regression: Multiple Linear Regression, PLS, …
◦ Neural Networks
◦ Classification: classification trees, discriminant analysis, fuzzy classification, …
Multiple Regression
◦ Linear relationship between Y and several descriptors $X_i$:
$$Y = aX_1 + bX_2 + \dots + cX_m + \text{const.}$$
◦ The error is minimized by least squares
◦ May include polynomial terms
◦ Goodness of fit:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \qquad (1)$$
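Equation (1) translates directly into code: one minus the ratio of the residual sum of squares to the total sum of squares around the mean. A minimal sketch with hypothetical observed and fitted values.

```python
"""Minimal sketch of the coefficient of determination R^2, equation (1)."""


def r_squared(y, y_hat):
    """R^2 = 1 - sum (y_i - y_hat_i)^2 / sum (y_i - y_mean)^2."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y_obs, y_fit), 4))  # → 0.98
```

Here the residual sum of squares is 0.1 and the total sum of squares is 5, hence $R^2 = 1 - 0.02 = 0.98$.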
Partial Least Squares
PLS, similarly to PCA, uses orthogonal components of linearly correlated variables, but the components are chosen to be more closely related to the Y response.
[Figure: projection of the scores t1 and t2]
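A minimal sketch of PLS with a single response (PLS1) using the NIPALS algorithm. The weight vector is proportional to $X^T y$, so each component follows the direction of largest covariance with the response, and deflating X after each component makes the score vectors $t_1, t_2, \dots$ mutually orthogonal. The descriptor columns and response are hypothetical.

```python
"""Minimal sketch of PLS1 (one response) using the NIPALS algorithm."""


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def centred(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pls1(x_columns, y, n_components):
    """Return the score vectors t_1..t_A for descriptor columns x_columns and response y."""
    x = [centred(col) for col in x_columns]  # X block, stored column by column
    f = centred(y)                           # y residual
    n = len(f)
    scores = []
    for _ in range(n_components):
        w = [dot(col, f) for col in x]       # weights proportional to X^T y
        norm = dot(w, w) ** 0.5
        w = [wj / norm for wj in w]
        t = [sum(w[j] * x[j][i] for j in range(len(x))) for i in range(n)]  # t = X w
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in x]  # X loadings
        q = dot(f, t) / tt                   # y loading
        # Deflation: remove what this component explains from X and y.
        x = [[x[j][i] - t[i] * p[j] for i in range(n)] for j in range(len(x))]
        f = [f[i] - q * t[i] for i in range(n)]
        scores.append(t)
    return scores


X_cols = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [0.5, 0.4, 0.9, 1.2, 1.0],
]
y = [1.2, 1.9, 3.4, 3.9, 5.3]
t1, t2 = pls1(X_cols, y, n_components=2)
print(f"t1 . t2 = {dot(t1, t2):.2e}")  # scores are mutually orthogonal
```

Plotting the compounds by their `t1` and `t2` values gives the score projection shown on the slide.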
Neural networks
Inspired by biological neural networks, NNs are a set of connected nonlinear elements that transform the input I into the output O:
$$O = f(I)$$
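A minimal sketch of the function $O = f(I)$ computed by a small feed-forward network: one hidden layer of sigmoid units followed by a sigmoid output unit. The architecture and the weights are hypothetical; in practice the weights are learned from the training set (for example by back-propagation), which is not shown here.

```python
"""Minimal sketch of a feed-forward neural network computing O = f(I)."""
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_bias, output_weights, output_bias):
    """One hidden layer of sigmoid units followed by one sigmoid output unit."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(hidden_weights, hidden_bias)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical weights for 3 inputs (descriptors) and 2 hidden units.
W_hidden = [[0.4, -0.6, 0.1], [-0.3, 0.8, 0.5]]
b_hidden = [0.0, -0.2]
W_out = [1.5, -1.0]
b_out = 0.1

print(forward([0.2, 0.7, -1.0], W_hidden, b_hidden, W_out, b_out))
```

The nonlinearity of each element (here the sigmoid) is what lets the network model non-linear structure-activity relationships.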
The problem of overfitting
[Figure: the same data fitted by a linear model, $y = 0.979x + 0.344$ ($R^2 = 0.956$), and by a fourth-degree polynomial, $y = -0.062x^4 + 1.293x^3 - 9.472x^2 + 29.24x - 27.37$ ($R^2 = 0.999$)]
Solution: validation
[Figure: performance of training and validation predictions as a function of model complexity]
Validation criteria
◦ Internal validation (robustness):
  • Cross-validation (LOO, LSO)
  • Bootstrap
  • Y-scrambling
◦ External validation (prediction ability):
  • Test set representative of the training set
  • Tropsha criteria
◦ Applicability domain
Cross validation
◦ Leave One Out (LOO):
  • all the data but one compound are used for fitting
  • the excluded compound is predicted
  • this is repeated for all compounds
  • Q² (or R²cv) is calculated similarly to R², on the basis of these predictions
  • Problem: too optimistic if there are many samples
◦ Leave Many Out: larger groups are left out to obtain a more realistic outcome
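A minimal sketch of leave-one-out and leave-many-out cross-validation for a one-descriptor linear model, computing $Q^2 = 1 - \text{PRESS} / \sum_i (y_i - \bar{y})^2$, where PRESS is the sum of squared errors of the predictions for the left-out compounds. The interleaved assignment of compounds to groups and the use of the mean of all responses in the denominator are simplifying assumptions; the data are hypothetical.

```python
"""Minimal sketch: leave-one-out and leave-many-out Q^2 for a one-descriptor linear model."""


def fit_line(x, y):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def q_squared(x, y, n_groups=None):
    """Q^2 = 1 - PRESS / SS_tot. n_groups=None means leave-one-out."""
    n = len(y)
    n_groups = n if n_groups is None else n_groups
    # Interleaved assignment of compounds to groups (an arbitrary, deterministic choice).
    groups = [list(range(g, n, n_groups)) for g in range(n_groups)]
    press = 0.0
    for held_out in groups:
        train = [i for i in range(n) if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (slope * x[i] + intercept)) ** 2 for i in held_out)
    mean_y = sum(y) / n  # mean of all responses, a common simplification
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_tot


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1]
print("LOO Q2:", round(q_squared(x, y), 3))
print("Leave-2-out Q2 (4 groups):", round(q_squared(x, y, n_groups=4), 3))
```

With larger held-out groups each model is trained on fewer compounds, which is why leave-many-out usually gives a less optimistic estimate than LOO.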
Bootstrapping
◦ Bootstrapping simulates what happens by randomly resampling the data set of n objects
◦ K groups of size n are generated by random sampling with replacement, so some objects are repeated and others are excluded
◦ The model obtained on each set is used to predict the values of the excluded samples
◦ From each bootstrap sample the statistical parameter of interest is calculated
◦ The estimate of accuracy is the average of all calculated statistics
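A minimal sketch of the bootstrap procedure above for a one-descriptor linear model. The statistic computed on each bootstrap sample is assumed here to be the RMSE of the predictions for the excluded (out-of-bag) objects; the number of bootstrap samples, the seed and the data are illustrative.

```python
"""Minimal sketch of bootstrap validation with prediction of the excluded objects."""
import random


def fit_line(x, y):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def bootstrap_rmse(x, y, n_boot=200, seed=0):
    """Average RMSE on excluded objects over n_boot bootstrap samples."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        in_bag = set(sample)
        excluded = [i for i in range(n) if i not in in_bag]
        if not excluded or len({x[i] for i in sample}) < 2:
            continue  # nothing to predict, or the line cannot be fitted
        slope, intercept = fit_line([x[i] for i in sample], [y[i] for i in sample])
        sq_err = [(y[i] - (slope * x[i] + intercept)) ** 2 for i in excluded]
        stats.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(stats) / len(stats), len(stats)


x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2, 15.9, 18.1, 20.0]
mean_rmse, n_used = bootstrap_rmse(x, y)
print(f"mean out-of-bag RMSE = {mean_rmse:.3f} over {n_used} bootstrap samples")
```

On average about 37% of the objects are left out of each bootstrap sample, so every model is always tested on compounds it has not seen.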
Y-scrambling
The Y responses are randomly permuted while the X variables are kept in the same order, and this is repeated several times. Models built on scrambled responses should perform much worse than the original model; otherwise the original correlation may be due to chance.
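A minimal sketch of Y-scrambling for a one-descriptor linear model: the training $R^2$ of the real model is compared with the $R^2$ values obtained after shuffling the responses. The number of rounds, the seed and the data are illustrative assumptions.

```python
"""Minimal sketch of Y-scrambling (response permutation) for a one-descriptor model."""
import random


def r_squared_line(x, y):
    """Training R^2 of the least-squares line; equals r^2 for simple linear regression."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_scrambling(x, y, n_rounds=50, seed=0):
    rng = random.Random(seed)
    original = r_squared_line(x, y)
    scrambled = []
    for _ in range(n_rounds):
        y_perm = y[:]
        rng.shuffle(y_perm)  # X order unchanged, Y order permuted
        scrambled.append(r_squared_line(x, y_perm))
    return original, scrambled


x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2, 15.9, 18.1, 20.0]
original, scrambled = y_scrambling(x, y)
print(f"original R2 = {original:.3f}, mean scrambled R2 = {sum(scrambled) / len(scrambled):.3f}")
```

With many descriptors or flexible models the scrambled $R^2$ values can be surprisingly high, which is exactly the chance correlation this test is designed to reveal.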
Tropsha criteria*
a) $Q^2 > 0.5$; b) $R^2 > 0.6$;
c) $(R^2 - R_0^2)/R^2 < 0.1$ and $0.85 < k < 1.15$, or $(R^2 - R_0'^2)/R^2 < 0.1$ and $0.85 < k' < 1.15$
   ($k$ = slope of the regression line through the origin; $R_0^2$ = $R^2$ of the regression $y = kx$)
d) if (c) is not fulfilled, then $|R_0^2 - R_0'^2| < 0.3$
* A. Golbraikh, M. Shen, Z. Xiao, Y.D. Xiao, K.-H. Lee, A. Tropsha, Rational selection of training and test sets for the development of validated QSAR models, JCAMD, 17 (2003) 241-253.
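A minimal sketch of the statistics behind these criteria for a test set, under one common formulation (definitions of $R_0^2$ and $R_0'^2$ vary between papers, so check the cited reference before relying on it): $k = \sum y_i \hat{y}_i / \sum \hat{y}_i^2$ and $k' = \sum y_i \hat{y}_i / \sum y_i^2$ are the slopes of the regressions through the origin of observed vs. predicted and predicted vs. observed values, and $R^2$ is the squared correlation between observed and predicted values. The decision logic mirrors (a) to (d) as listed above; the data are hypothetical.

```python
"""Minimal sketch of the Golbraikh-Tropsha external-validation statistics."""


def tropsha_statistics(y_obs, y_pred):
    n = len(y_obs)
    mo, mp = sum(y_obs) / n, sum(y_pred) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    soo = sum((o - mo) ** 2 for o in y_obs)
    spp = sum((p - mp) ** 2 for p in y_pred)
    r2 = sop * sop / (soo * spp)  # squared correlation, observed vs predicted
    # Slopes of the regression lines through the origin.
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    k_prime = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(o * o for o in y_obs)
    # R^2 of the through-origin fits (observed = k * predicted, predicted = k' * observed).
    r2_0 = 1 - sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred)) / soo
    r2_0_prime = 1 - sum((p - k_prime * o) ** 2 for o, p in zip(y_obs, y_pred)) / spp
    return r2, k, k_prime, r2_0, r2_0_prime


def passes_tropsha(q2_training, y_obs, y_pred):
    """Criteria (a) to (d) as listed above: a and b and (c or d)."""
    r2, k, kp, r2_0, r2_0p = tropsha_statistics(y_obs, y_pred)
    c = ((r2 - r2_0) / r2 < 0.1 and 0.85 < k < 1.15) or (
        (r2 - r2_0p) / r2 < 0.1 and 0.85 < kp < 1.15
    )
    d = abs(r2_0 - r2_0p) < 0.3
    return q2_training > 0.5 and r2 > 0.6 and (c or d)


y_test = [1.0, 2.0, 3.0, 4.0, 5.0]
y_test_pred = [1.1, 1.9, 3.2, 3.8, 5.1]
print(tropsha_statistics(y_test, y_test_pred))
print(passes_tropsha(0.7, y_test, y_test_pred))
```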
Applicability domain
The applicability domain of a (Q)SAR model is the response and chemical structure space in which the model makes predictions with a given reliability.*
* Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. ATLA, 33:1-19, 2005.
[Figure: training data and new compounds in descriptor space]
AD assessment
Similarity measures:
◦ Response range (span of activity data)
◦ Chemometric treatment of the descriptor space
◦ Fragment-based approaches
Chemometric methods
◦ Descriptor range-based
◦ Geometric methods
◦ Distance-based
◦ Probability density distribution
[Figure: training compounds plotted in the plane Descr. 1 vs. Descr. 2]
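A minimal sketch of two of the chemometric approaches listed above. The range-based check accepts a new compound only if every descriptor lies within the training minimum and maximum; the distance-based check compares the mean distance from the compound to its k nearest training compounds with the same quantity computed for the training compounds themselves. The Euclidean distance, k = 3 and the 95th-percentile threshold are assumptions, and the training grid is hypothetical.

```python
"""Minimal sketch of two applicability-domain checks in descriptor space."""
import math


def in_range_box(train, query):
    """Descriptor range-based AD: every descriptor within its training min-max range."""
    return all(
        min(row[d] for row in train) <= query[d] <= max(row[d] for row in train)
        for d in range(len(query))
    )


def knn_distance(train, query, k=3):
    """Mean Euclidean distance from query to its k nearest training compounds."""
    dists = sorted(math.dist(row, query) for row in train)
    return sum(dists[:k]) / k


def in_distance_domain(train, query, k=3, percentile=0.95):
    """Distance-based AD: threshold taken from the training compounds' own k-NN distances."""
    own = sorted(
        knn_distance(train[:i] + train[i + 1:], row, k) for i, row in enumerate(train)
    )
    threshold = own[min(len(own) - 1, int(percentile * len(own)))]
    return knn_distance(train, query, k) <= threshold


# Hypothetical training compounds described by two descriptors.
train = [(a, b) for a in (2, 6, 10, 14, 18) for b in (2, 6, 10)]
for query in [(8, 4), (19, 5), (40, 30)]:
    print(query, in_range_box(train, query), in_distance_domain(train, query))
```

The compound (19, 5) falls just outside the descriptor ranges but is close to several training compounds, so the two approaches can disagree near the border of the domain.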
AMBIT software
http://ambit.acad.bg/main.php
Example of AD assessment
[Figure: bar charts for test set 1 and test set 2 showing the % of compounds predicted within 1 log unit and within 2 log units]
% of all compounds in the test set predicted within one or two log units without assessing the AD
Further aspects in AD
Including the model's characteristics:

| Terminal node | Node assignment | Misclassification ratio: training set | Misclassification ratio: validation set | Misclassification ratio: test set |
|---|---|---|---|---|
| 4 | 0 | 0.04 | 0.02 | 0 |
| 6 | 1 | 0.13 | 0.16 | 0.14 |
| 8 | 0 | 0.1 | 0.26 | 0.17 |
| 11 | 1 | 0.31* | 0 | 0.25 |
| 12 | 0 | 0.05 | 0.14 | 0.25 |
| 14 | 1 | 0.47* | 0.67* | 0.25 |
| 15 | 0 | 0.14 | 0.19 | 0.13 |
| 18 | 1 | 0.32* | 0.33* | 0.55* |
| 19 | 0 | 0.2 | 0 | 0 |
| 20 | 0 | 0.06 | 0.2 | 0.17 |
| 21 | 1 | 0.44* | 0.2 | 0.17 |
Why?
◦ Large number of existing and new chemicals without a complete (eco)toxicological characterization
[Screenshot: Skin Deep database (http://www.ewg.org/reports/skindeep/), ingredient search for tocopheryl acetate: found in 3144 products (moisturizers, facial cleansers, body washes, lip gloss, …), ingredient score 0.6 on a scale where 5 is the highest concern, with the database's list of concern categories, from cancer hazard to "insufficient safety data"]
◦ Constraints: time-consuming, expensive, ethical issues
REACH (Registration, Evaluation and Authorisation of CHemicals)
◦ Enterprises that manufacture or import more than one tonne of a chemical substance per year would be required to register it in a central database
◦ It is estimated that testing the approximately 30,000 existing substances would result in total costs of about €2.1 billion over the next 11 years
◦ Promotion of non-animal testing
REACH (Registration, Evaluation and Authorisation of CHemicals)

| Use of (Q)SARs, read-across | Additional cost |
|---|---|
| Minimal use | €2.3 billion |
| Average use (likely scenario) | €1.5 billion |
| Maximal use | €1.1 billion |

Cost-saving potential: €800-1130 million
Pedersen et al. (2003). Assessment of additional testing needs under REACH.
![Page 57: Introduction on QSAR and modelling of physico-chemical and biological properties Alessandra Roncaglioni – IRFMN aroncaglioni@marionegri.it Problems and](https://reader036.vdocuments.site/reader036/viewer/2022081514/551c42df5503467b488b4b9b/html5/thumbnails/57.jpg)
REACH

| Use of (Q)SARs, read-across | Additional animals |
|---|---|
| Minimal use | 3.9 million |
| Average use (likely scenario) | 2.6 million |
| Maximal use | 2.1 million |

Animal-saving potential: 1.3-1.9 million animals
Van der Jagt et al. (2004). Alternative approaches can reduce the use of test animals under REACH.
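The savings quoted on the two REACH slides are differences between scenarios. A minimal sketch of that arithmetic, assuming the minimal-use scenario is the baseline and using the rounded table values exactly as printed on the slides:

```python
# Rounded scenario figures from the two REACH tables above
# (Pedersen et al. 2003; Van der Jagt et al. 2004).
SCENARIOS = ("minimal", "average", "maximal")
additional_cost_bn_eur = {"minimal": 2.3, "average": 1.5, "maximal": 1.1}
additional_animals_mn = {"minimal": 3.9, "average": 2.6, "maximal": 2.1}


def savings_vs_minimal(values):
    """Saving of each scenario relative to minimal (Q)SAR/read-across use."""
    baseline = values["minimal"]
    # round() removes floating-point noise such as 0.7999999999999998
    return {s: round(baseline - values[s], 2) for s in SCENARIOS if s != "minimal"}


cost_savings = savings_vs_minimal(additional_cost_bn_eur)
animal_savings = savings_vs_minimal(additional_animals_mn)
print("Cost savings (billion EUR):", cost_savings)
print("Animals saved (million):", animal_savings)
```

With the rounded figures this gives €0.8-1.2 billion and 1.3-1.8 million animals. The upper ends quoted on the slides (€1130 million, 1.9 million animals) presumably come from the unrounded estimates in the cited studies.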
OECD principles for QSAR validation
Efforts to improve the transparency and acceptability of in silico methods:
A defined endpoint
An unambiguous algorithm
A defined domain of applicability
Appropriate measures of goodness-of-fit, robustness and predictivity
A mechanistic interpretation, if possible
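The fourth principle does not prescribe specific statistics. A minimal sketch, assuming the commonly used choices of R² for goodness-of-fit, leave-one-out Q² for robustness, and R² plus RMS error on an external test set for predictivity, for a one-descriptor linear model on hypothetical data:

```python
import math


def fit_ols(x, y):
    """Ordinary least squares for a one-descriptor model y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def predict(model, x):
    a, b = model
    return [a + b * xi for xi in x]


def r_squared(y_obs, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot, around the mean of y_obs."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rmse(y_obs, y_pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


def q2_loo(x, y):
    """Leave-one-out Q^2: refit without each compound, then predict that compound."""
    loo_pred = []
    for i in range(len(x)):
        model = fit_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        loo_pred.append(predict(model, [x[i]])[0])
    return r_squared(y, loo_pred)


# Hypothetical data: one descriptor (e.g. logP) and a toxicity endpoint
x_train, y_train = [1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 2.3, 2.9, 4.2, 4.8]
x_test, y_test = [1.5, 3.5], [1.8, 3.7]

model = fit_ols(x_train, y_train)
r2_train = r_squared(y_train, predict(model, x_train))  # goodness-of-fit
q2 = q2_loo(x_train, y_train)                           # robustness
r2_test = r_squared(y_test, predict(model, x_test))     # predictivity
rmse_test = rmse(y_test, predict(model, x_test))
print(f"R2 = {r2_train:.3f}, Q2(LOO) = {q2:.3f}, "
      f"R2(test) = {r2_test:.3f}, RMS(test) = {rmse_test:.3f}")
```

Here the external R² is computed around the test-set mean; other conventions use the training-set mean, so reported values are only comparable when the definition is stated.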
QSPR
Physico-chemical properties
◦ Boiling point
◦ Solubility
◦ Partition coefficients
◦ Viscosity
◦ Hydrophobicity
Biochemical assays
Specific aspects of QSPR
In general, you can expect more precise models and lower experimental variability than for biological endpoints
Many properties are important for drug design:
◦ Biochemical assay: target property
◦ Bioavailability: LogP
◦ Side effects
Many others are important for REACH
QSAR
Biological activities
◦ Ecotoxicity
◦ Mammalian toxicity (as a surrogate for human health)
◦ Carcinogenicity and mutagenicity
◦ … and many more
Specific aspects of QSAR
Biological variability
Moles vs. weight data
Role of LogP
Mechanistic interpretation
Biological variability
Intrinsic variability of toxicological data (LC50)
Mole vs. weight units
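Mass-based (mg/L) and molar concentrations can rank compounds differently, because the same mass of a heavier molecule contains fewer molecules. A minimal sketch of the conversion and of the -log transform used for LC50 values, applying a hypothetical LC50 of 1 mg/L to triadimefon (MW 293.76, as in the KowWin output in this section) and to a hypothetical compound of MW 60:

```python
import math


def mg_per_l_to_mol_per_l(conc_mg_l, mol_weight_g_mol):
    """mg/L -> g/L (divide by 1000) -> mol/L (divide by molar mass in g/mol)."""
    return conc_mg_l / 1000.0 / mol_weight_g_mol


def p_conc(conc_mol_l):
    """-log10 of a molar concentration, as in -log(LC50)."""
    return -math.log10(conc_mol_l)


# Hypothetical LC50 of 1 mg/L for two compounds of different size
for name, mw in [("triadimefon", 293.76), ("hypothetical compound", 60.0)]:
    molar = mg_per_l_to_mol_per_l(1.0, mw)
    print(f"{name}: {molar:.3e} mol/L, -log(LC50) = {p_conc(molar):.2f}")
```

At equal mg/L, the heavier compound gets the higher -log(mol/L), i.e. it appears more toxic per molecule on the molar scale.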
Role of LogP
Used to model penetration into the phospholipid membrane
Extremely common because it is easy to interpret
Role of LogP
Which is your favourite option?

$$\mathrm{Tox} = 1.32\,\log P + 0.23$$

$$\mathrm{Tox} = 0.55\,\mathrm{des}_1 + 0.36\,\mathrm{des}_2 + 0.29\,\mathrm{des}_3 + 0.64\,\mathrm{des}_4 - 0.47\,\mathrm{des}_5 - 1.56\,\mathrm{des}_6 - 0.53\,\mathrm{des}_7 + 0.27\,\mathrm{des}_8 + 0.55\,\mathrm{des}_9 + 0.50\,\mathrm{des}_{10} + 0.23$$
Role of LogP
KowWin (LogKow) log P calculation
SMILES: CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n2cncn2
CHEM: Triadimefon
MOL FOR: C14 H16 CL1 N3 O2
MOL WT: 293.76

| Type | Num | LOGKOW v1.66 fragment description | Coeff | Value |
|---|---|---|---|---|
| Frag | 3 | -CH3 [aliphatic carbon] | 0.5473 | 1.6419 |
| Frag | 1 | -CH [aliphatic carbon] | 0.3614 | 0.3614 |
| Frag | 8 | Aromatic Carbon | 0.2940 | 2.3520 |
| Frag | 1 | -CL [chlorine, aromatic attach] | 0.6445 | 0.6445 |
| Frag | 1 | -O- [oxygen, one aromatic attach] | -0.4664 | -0.4664 |
| Frag | 1 | -C(=O)- [carbonyl, aliphatic attach] | -1.5586 | -1.5586 |
| Frag | 3 | Aromatic Nitrogen [5-member ring] | -0.5262 | -1.5786 |
| Frag | 1 | -tert Carbon [3 or more carbon attach] | 0.2676 | 0.2676 |
| Factor | 1 | -N-C-O- structure correction | 0.5494 | 0.5494 |
| Factor | 1 | -C-CO-C-O- structure correction | 0.5000 | 0.5000 |
| Const | | Equation Constant | | 0.2290 |

Log Kow = 2.9422
LogKow Estimated Log P: 2.94
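The KowWin estimate is additive: each fragment coefficient is multiplied by its count, then the structure correction factors and an equation constant are added. A minimal sketch that reproduces only this arithmetic from the triadimefon printout; it does not perform KowWin's fragment recognition, and the data structures and function name are hypothetical:

```python
# (description, count, coefficient), copied from the triadimefon KowWin output
FRAGMENTS = [
    ("-CH3 [aliphatic carbon]", 3, 0.5473),
    ("-CH [aliphatic carbon]", 1, 0.3614),
    ("Aromatic Carbon", 8, 0.2940),
    ("-CL [chlorine, aromatic attach]", 1, 0.6445),
    ("-O- [oxygen, one aromatic attach]", 1, -0.4664),
    ("-C(=O)- [carbonyl, aliphatic attach]", 1, -1.5586),
    ("Aromatic Nitrogen [5-member ring]", 3, -0.5262),
    ("-tert Carbon [3 or more carbon attach]", 1, 0.2676),
]
CORRECTION_FACTORS = [
    ("-N-C-O- structure correction", 1, 0.5494),
    ("-C-CO-C-O- structure correction", 1, 0.5000),
]
EQUATION_CONSTANT = 0.2290


def fragment_log_kow(fragments, factors, constant):
    """log Kow = sum(n_i * f_i) + sum(n_j * c_j) + constant."""
    total = sum(n * coeff for _, n, coeff in fragments)
    total += sum(n * coeff for _, n, coeff in factors)
    return total + constant


log_kow = fragment_log_kow(FRAGMENTS, CORRECTION_FACTORS, EQUATION_CONSTANT)
print(f"Log Kow = {log_kow:.4f}")  # prints "Log Kow = 2.9422", as in the table
```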
From LogP to direct descriptors
From LogP: structure → descriptors → logP → activity
To direct descriptors: structure → descriptors → activity
Mechanistic interpretation
A priori (experimentally determined, even more complex than the studied endpoint itself), postulated, or a posteriori
Different classification schemes for mode of action (MOA) exist, e.g. narcosis and specific reactive modes
Global models …
Training set: n = 422, d = 5, R² = 69.9%, R²cv = 68.0%, RMS = 0.77
Test set: n = 141, R² = 71.7%, RMS = 0.70
[Figure: predicted vs. observed -log(LC50) for training and test sets; labelled compounds: N-Vinylcarbazole, Acrolein, 2-propyn-1-ol, 2-propen-1-ol, Malononitrile]
Global models …

| | Model 1 | Model 2 |
|---|---|---|
| n | 563 | 563 |
| d | Log P, ELUMO, MW, Kier & Hall (order 0), molecular surface area | Log P, ELUMO |
| R² (%) | 71.1 | 69.5 |
| Q² (%) | 70.7 | 69.3 |
| RMS | 0.74 | 0.76 |
vs mechanistic models …
[Figure: predicted vs. observed -log(LC50) for reactive compounds]
Reactive compounds: n = 141, d = 2, R² = 58.6%, R²cv = 58.2%, RMS = 0.83

| MOA class | n | d | R² (%) | R²cv (%) |
|---|---|---|---|---|
| Narcosis I | 238 | 2 | 90.1 | 89.9 |
| Narcosis II | 38 | 3 | 82.9 | 81.1 |
| Narcosis III | 26 | 4 | 91.7 | 90.6 |
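The comparison above contrasts one global model with separate models per mode-of-action class. A minimal sketch of that idea, with hypothetical logP/toxicity values for two invented classes (not the descriptors or data behind the table), shows why one regression line across classes with different slopes can fit worse than one line per class:

```python
from statistics import mean


def r_squared_simple(x, y):
    """R^2 of a one-descriptor least-squares line (equal to the squared Pearson r)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical (logP, -log LC50) data for two invented MOA classes
DATA = {
    "narcosis": ([1.0, 2.0, 3.0, 4.0, 5.0], [1.8, 2.6, 3.5, 4.2, 5.0]),
    "reactive": ([0.5, 1.5, 2.5, 3.5], [3.2, 3.4, 3.8, 4.0]),
}

# Global model: pool all compounds regardless of MOA
all_x = [v for xs, _ in DATA.values() for v in xs]
all_y = [v for _, ys in DATA.values() for v in ys]
global_r2 = r_squared_simple(all_x, all_y)

# Local (mechanistic) models: one regression per MOA class
local_r2 = {moa: r_squared_simple(xs, ys) for moa, (xs, ys) in DATA.items()}

print(f"global R2 = {global_r2:.2f}")
for moa, r2 in local_r2.items():
    print(f"{moa} R2 = {r2:.2f}")
```

The price of local models is that a new compound must first be assigned to the right MOA class before it can be predicted.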
Activity data collection
Identification of the endpoints that can benefit most from QSAR: costs, test severity, feasibility, etc.
Identification of data sources: quality, guidelines, protocols
Refinement of the data: comparison of multiple sources, precautionary selection
[Figure: endpoints and dataset sizes: trout (282), water flea (263), oral quail (116), dietary quail (123), bee (105); whole dataset (398)]
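Precautionary selection can be read as keeping, for each compound, the most conservative value among the sources, i.e. the lowest LC50. A minimal sketch under that assumption, with hypothetical compounds, sources and values, plus a hypothetical ratio threshold for flagging compounds whose sources disagree strongly:

```python
from collections import defaultdict

# Hypothetical records: (compound, data source, LC50 in mg/L)
RECORDS = [
    ("compound_A", "source_1", 4.2),
    ("compound_A", "source_2", 1.9),
    ("compound_B", "source_1", 0.35),
    ("compound_C", "source_2", 12.0),
    ("compound_C", "source_3", 150.0),
]
MAX_RATIO = 10.0  # hypothetical threshold for flagging inconsistent sources


def precautionary_values(records, max_ratio=MAX_RATIO):
    """Return the lowest LC50 per compound and the compounds with discordant sources."""
    by_compound = defaultdict(list)
    for compound, _source, lc50 in records:
        by_compound[compound].append(lc50)
    selected, flagged = {}, []
    for compound, values in by_compound.items():
        # Lowest LC50 = most toxic value = most conservative choice
        selected[compound] = min(values)
        # Sources differing by more than max_ratio deserve a manual check
        if max(values) / min(values) > max_ratio:
            flagged.append(compound)
    return selected, flagged


selected, flagged = precautionary_values(RECORDS)
print(selected)
print(flagged)
```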
Modelling process
Endpoints: trout, Daphnia, oral quail, dietary quail, bee
• Individual models
Linear models, ANN models
• Hybrid system
Combining model results
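The slides do not specify how the hybrid system combines the individual linear and ANN models. A minimal sketch of one simple option, a (weighted) mean of the individual predictions; the model names, predictions and weights are hypothetical, and this is not presented as DEMETRA's actual scheme:

```python
def combine_predictions(predictions, weights=None):
    """Weighted mean of individual model predictions (equal weights by default)."""
    if weights is None:
        weights = {name: 1.0 for name in predictions}
    total_weight = sum(weights[name] for name in predictions)
    return sum(weights[name] * value for name, value in predictions.items()) / total_weight


# Hypothetical -log(LC50) predictions for one compound from two individual models
preds = {"linear_model": 3.4, "ann_model": 3.9}
print(combine_predictions(preds))  # unweighted consensus
print(combine_predictions(preds, {"linear_model": 1.0, "ann_model": 3.0}))  # favours the ANN
```

Weights could, for instance, reflect each model's validation performance; that choice is an assumption here.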
Validation
Validation of the hybrid model for Daphnia with new data subsequently identified in the literature
◦ Real "blind" test set
Comparison with expert systems
◦ ECOSAR
◦ Topkat
DEMETRA Hs - training set
[Figure: Daphnia magna, training set; predicted vs. experimental value, -log(mg/l)]
NC = 193, R² = 0.80
DEMETRA Hs - results on test sets
[Figure: Daphnia magna, test sets; predicted vs. experimental value, -log(mg/l)]
EPA test set: NC = 36, R² = 0.80
D-BBA test set: NC = 101, R² = 0.70
US EPA ECOSAR predictions
[Figure: Daphnia magna, ECOSAR, all data; predicted vs. experimental value, -log(mg/l)]
NC = 432, R² = 0.20
Topkat predictions
[Figure: predicted vs. experimental value, -log(mg/l)]
NC = 176
NC (training test) = 31
R² = 0.20