chapter 1 introduction - shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/6512/6/06_chapter...
CHAPTER 1
INTRODUCTION
1.1 GENERAL INTRODUCTION
An integrated and insightful approach to successful drug synthesis
depends upon the ability to identify new chemical entities that have the potential
to treat diseases safely and efficiently. Many of the drugs in use over the last
fifty years or more have been of synthetic or semisynthetic origin, whereas the
pharmacopoeias before that period were of natural origin. Finding effective
drugs is difficult. Many have been discovered by chance observation, by the
scientific analysis of folk medicines or by noting the side effects of other drugs.
During the early stages of drug discovery, scientists were
primarily concerned with the isolation of medicinal agents found in plants.
For example, salicylic acid, the precursor of Aspirin, was obtained from willow
bark; Morphine, a powerful painkiller, from the opium poppy; Quinine from
cinchona bark; and Digitalis from the purple foxglove. Synthetic chemistry
grew rapidly in sophistication during the first half of the 20th century and first
proved its value to the pharmaceutical industry by enabling the discovery of the
sulfa drugs [1].
Medicinal chemistry received a further boost in the 1940s as
pharmacology, which until then had been dominated by physiology, became
increasingly biochemical in character with a new understanding of the role of
enzymes and cell receptors. Medicinal chemistry, which operates at the
intersection of chemistry and pharmacology, is the application of chemical
research techniques to identify, synthesize and develop chemical entities for
therapeutic use. It also includes the study of existing drugs, their biological
properties and their quantitative structure activity relationships (QSAR).
The seeds of the concept of rational drug design were sown in the
1940s and 1950s by George Hitchings and Gertrude Elion in their work on
DNA-based antimetabolites, which led to the discovery of modified purines
with anticancer activity [2]. However, the era of DNA and medicine was
largely stimulated by the elucidation of the double-helical structure of DNA by
Watson and Crick in 1953 [3]. The ramifications of this discovery for DNA
replication, transcription and translation led to a much better understanding of
viral replication. This laid the foundation for antiviral drug discovery in
subsequent decades, as molecular targets in the viral replication cycle began to
be identified. The 1950s also saw the discovery of Vancomycin, a glycopeptide
antibiotic that was developed much later for use against infections caused by
methicillin-resistant staphylococci. The era of recombinant DNA technology and
molecular cloning began around the mid-1970s. During the 1970s, chemists
turned increasingly to rational drug discovery based on structural knowledge of
the target and/or ligand. This movement began a strategic shift away from
natural products towards purely synthetic compounds and synthetic compounds
based on natural products. Rational drug design therefore requires significant
knowledge of chemistry as well as biology, because it is the chemical
interactions between drugs and their targets that determine whether a drug is
biologically active or not.
One of the first drugs developed using the rational approach was the
antiviral drug Zanamivir [4]. This drug was designed to interact with
neuraminidase, a virus-encoded enzyme that is required to release newly
formed virus particles from infected cells. Other rationally designed drugs
include the HIV drugs Ritonavir and Indinavir, both of which interact with viral
proteins.
More recently, automated high-throughput screening (HTS) systems
using cell culture systems with linked enzyme assays and receptor molecules
derived from gene cloning have greatly increased the efficiency of random
screening. It is now practical to screen enormous libraries of peptides and
nucleic acids produced by combinatorial chemistry. With significant advances
in X-ray crystallography and NMR, it is possible to obtain a detailed
representation of enzymes and other drug receptors. The techniques of
molecular graphics and computational chemistry have suggested novel
chemical structures that have led to new drugs with potent medicinal activities.
The development of human immunodeficiency virus (HIV) protease inhibitors
and angiotensin-converting enzyme (ACE) inhibitors came from an
understanding of the geometric and chemical character of the active site of the
receptor enzyme [5]. Even if the receptor structure is not known in detail,
rational approaches based on the physicochemical properties of a lead
compound can provide new drugs. Despite this progress, there remains an
increasing need for novel, innovative therapeutic agents. The aim is to improve
success in drug development by devising better methods for synthesizing lead
molecules from readily accessible and inexpensive raw materials. The majority
of drugs used today are products of synthetic organic chemistry, and most of
them are heterocyclic derivatives.
1.2 NUCLEUS INTRODUCTION
Compounds classified as heterocycles probably constitute the largest
and most varied family of organic compounds. They are rich sources of diverse
physical, chemical and biological properties. In medicinal chemistry they are
commonly used as templates to design biologically active agents. A number of
compounds having heterocyclic nuclei such as thiadiazole, triazole,
benzothiazole, benzoxazole and oxadiazole, and their derivatives, have been
associated with a broad spectrum of biological activities [7]. The synthesis of
triazoles fused with another heterocyclic ring has attracted widespread attention
due to their diverse applications. Among them, s-triazole (1,2,4-triazole) fused
with thiadiazole represents an interesting class of compounds, since
1,2,4-triazole and 1,3,4-thiadiazole both possess a wide spectrum of activities
(Figure 1.1).
Figure 1.1 3D Structural representation of 1,2,4-triazolo-[3,4-b]-1,3,4-thiadiazole (ring atoms numbered 1 to 8)
Molecular formula - C3H2N4S
Molecular Weight - 126.14
Log P - 0.82
MR - 33.7 [cm3/mol]
Henry's Law - 1.63
CLogP - 0.932756
CMR - 2.9819
Thiadiazole is a versatile moiety, and many drugs containing the
thiadiazole nucleus are available on the market, such as the diuretics
Acetazolamide and Methazolamide, the antibacterial Sulfamethizole and the
antibiotic Cefazolin.
[Structures of Methazolamide, Sulfamethizole, Acetazolamide and Cefazolin]
The review of literature showed that thiadiazole derivatives
possess antimicrobial [8-10], anti-inflammatory [11], anticancer [12-14],
anticonvulsant, antidepressant [15], carbonic anhydrase inhibitory [16] and
antioxidant [17] activities.
Derivatives of 1,2,4-triazole possess potent biological activities
such as antifungal [18], antibacterial [19], antiviral [20] and antimigraine [21]
activities. Some therapeutically important medicines containing the triazole
nucleus are the hypnotics Triazolam and Estazolam, the antifungal drugs
Fluconazole and Voriconazole, and the antiviral drug Ribavirin [22].
[Structures of Triazolam, Estazolam, Fluconazole, Voriconazole and Ribavirin]
The fused triazole-thiadiazole ring system shows various biological
effects and is viewed as a cyclic analogue of two very important components,
thiosemicarbazide and biguanide [23], which often display diverse biological
activities. A literature survey revealed that s-triazolo-[3,4-b]-1,3,4-thiadiazoles
have received much attention in recent years on account of their
anti-inflammatory [24], antiviral [25], anthelmintic [26] and antimicrobial
[27-29] activities. The anticancer activity of the ring system is reported to be
due to its similarity to the purine ring [28]: the thiadiazole ring (A) of
triazolo-thiadiazole takes the place of the pyrimidine ring in the purine ring
system (B).
[Structures of Purine and Triazolo-thiadiazole, with rings labelled A and B]
1.3 BIOLOGICAL ACTIVITIES
Biological screening of synthesized compounds is an important step
in drug design. A pharmacophore or lead moiety can be selected only after
determining the functional groups that are responsible for the various
biological activities.
1.3.1 Cytotoxic Activity
Cancer remains a major public health issue at the beginning of the 21st
century. It is a disease in which the control of growth is lost in one or more
cells, leading to a solid mass of cells known as a tumor. A growing tumor often
becomes life-threatening by obstructing vessels or organs; however, death is
usually caused by the spread of the primary tumor to one or more other sites in
the body, which makes surgical intervention impossible. It is thought that
exposure to the ever-increasing number of carcinogenic chemicals in the
environment and the diet may be a significant contributing factor. Cancer is
now known to be a disease that involves changes to the genome caused both by
internal factors, such as mutation, loss of genetic material and altered gene
expression, and by external factors, such as viruses, chemicals and radiation.
Cancer treatment often encompasses more than one approach, and the
strategy adopted depends largely on the nature of the cancer. Chemotherapy is
one way to fight cancer. Significant side effects such as nausea, vomiting,
diarrhoea, hair loss and serious infections often accompany chemotherapy [2].
Therefore, a need has arisen for the accelerated development of new
chemotherapeutic agents that are more effective as well as less toxic. For this
purpose, both in vitro and in vivo models are employed to screen for anticancer
activity. Although animal models provide significant results, in vitro testing is
still preferred to in vivo testing for the initial evaluation of a potential
chemotherapeutic agent. Thiadiazoles have been of great interest as antitumor
agents, and 1,2,3-triazole derivatives have been reported to act against tumor
proliferation, invasion and metastasis [30].
1.3.2 Antioxidant Activity
It is commonly accepted that, in a situation of oxidative stress,
reactive oxygen species such as superoxide, hydroxyl and peroxyl radicals are
generated. Reactive oxygen species play an important role in the degenerative
or pathological processes of various serious diseases such as cancer, coronary
heart disease, Alzheimer’s disease, neurodegenerative disorders,
atherosclerosis, cataract, inflammation and ageing [31].
In many infections and disorders, phagocytes produce excessive
amounts of superoxide (O2·−) and hydroxyl radicals (·OH), as well as
non-radical species such as hydrogen peroxide (H2O2). These can harm the
surrounding tissue either directly, through their powerful oxidizing action, or
indirectly, through the H2O2 and ·OH formed from superoxide, resulting in
membrane destruction. Free radicals can be formed in three ways:
· by homolytic cleavage of a covalent bond of a normal molecule, with each fragment retaining one of the paired electrons
· by the addition of a single electron to a normal molecule
· by the removal of a single electron from a normal molecule
Environmental sources such as ultraviolet irradiation, ionizing
radiation and pollutants also produce reactive free radical species. Enzymes
such as lipoxygenase can generate free radicals; lipoxygenase acts on fatty
acids released in free form from phospholipids by phospholipase A2. Injured
cells and tissues can also stimulate the generation of free radicals. Although
the human body produces free radicals continuously, it possesses several
defence systems against them, comprising enzymes and radical scavengers.
These are called the “first line antioxidant defence system”, but they are not
completely effective. The “second line defence system” is the repair system for
biomolecules damaged by free radical attack. These limitations have led to the
increased use of antioxidants in therapy; Ascorbic acid, α-Tocopherol,
Probucol, Silybin and Gnaphalin have been shown to possess antioxidant
activity [32].
An antioxidant is a molecule capable of slowing or preventing the
oxidation of other molecules. Oxidation is a chemical reaction that transfers
electrons from a substance to an oxidizing agent. Oxidation reactions can
produce free radicals, which start chain reactions that damage cells.
Antioxidants terminate these chain reactions by removing free radical
intermediates, and inhibit other oxidation reactions by being oxidized
themselves. Hence, agents that can scavenge these reactive species can be
beneficial in the treatment of various disorders.
1.3.3 Antifungal Activity
Research on antifungal agents has recently come to play a vital role
because immunocompromised patients are very susceptible to invasive fungal
infections. The onset of acquired immunodeficiency syndrome (AIDS),
combined with the increased use of powerful immunosuppressive drugs for
organ transplantation and cancer chemotherapy, has created a demand for new
antifungals. In the 1980s and 1990s a number of safer antifungal drugs, such as
the azoles, were introduced into the clinic. However, the widespread use of
newer antifungal agents has been accompanied by increasing reports of
resistance. Azole antifungals are the largest class of antimycotics available
today, with over 20 drugs on the market. Azoles are five-membered rings
containing two or three nitrogen atoms, that is, imidazoles or triazoles. All the
azoles act by inhibiting ergosterol biosynthesis [33]. The main target of azole
antifungals is the cytochrome P450-dependent 14α-demethylation of
lanosterol. Inhibition of sterol 14α-demethylase results in the depletion of
ergosterol and the accumulation of sterol precursors, including lanosterol,
which alters the structure and function of the plasma membrane.
The ideal antifungal agent should be fungicidal with a broad spectrum
of activity, be suitable for oral or intravenous administration and possess good
pharmacodynamic properties, without the development of resistance during
therapy. At present, none of the clinically used drugs satisfies all these
criteria.
1.4 DRUG DESIGN
Quantitative structure activity relationship (QSAR) analysis is an
important area of chemometrics that has been widely used to study the
relationship between chemical structure and biological or other functional
activities [34]. QSAR models are widely used for the prediction of
physicochemical properties and biological activities. The success of the QSAR
approach can be explained by the insight it offers into the structural
determinants of chemical properties and by the possibility of estimating the
properties of new chemical compounds without the need to synthesize and test
them. The attempt to transform qualitative beliefs into a quantitative method of
activity assessment is known as QSAR. It began with the work of Hansch and
was further developed by others.
QSAR is the process by which chemical structure is quantitatively
correlated with a well-defined process, such as biological activity or chemical
reactivity. For example, biological activity can be expressed quantitatively as
the concentration of a substance required to give a certain biological response.
When physicochemical properties or structures are also expressed as numbers,
a mathematical relationship, or quantitative structure activity relationship, can
be formed between the two. The most general mathematical form of a QSAR is
Activity = f (physicochemical properties and/or structural properties)
Types of QSAR are based on the dimensionality [35] of the molecular
descriptors used (Figure 1.2):
§ 0D-These are descriptors derived from the molecular formula, e.g.
molecular weight, number and type of atoms, etc.
§ 1D-A substructure list representation of a molecule can be
considered a one-dimensional (1D) molecular representation
and consists of a list of molecular fragments (e.g. functional
groups, rings, bonds, substituents, etc.).
§ 2D-A molecular graph contains topological or two-dimensional
(2D) information. It describes how the atoms are bonded in a
molecule, both the type of bonding and the interaction of
particular atoms (e.g. total path count, molecular connectivity
indices, etc.).
§ 3D-These are calculated starting from a geometrical or 3D
representation of a molecule. These descriptors include
molecular surface, molecular volume and other geometrical
properties. There are different types of 3D descriptors, e.g.
electronic, steric, shape, etc.
§ 4D-In addition to the 3D descriptors, the fourth dimension is
generally expressed in terms of different conformations or other
experimental conditions.
Figure 1.2 Dimensionality of molecular descriptors (panels: 1D, 2D, 3D and 4D representations)
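As a minimal illustration of a 0D descriptor, the sketch below computes the molecular weight directly from a molecular formula string. It assumes conventional IUPAC standard atomic weights and a simple formula syntax (element symbols with optional counts, no brackets or hydrates); the function name `molecular_weight` is hypothetical. Applied to C3H2N4S it reproduces the value listed for Figure 1.1.

```python
import re

# Conventional standard atomic weights (IUPAC), limited to the elements needed here.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def molecular_weight(formula: str) -> float:
    """Return the molecular weight of a simple formula such as 'C3H2N4S'.

    Each match is (element symbol, optional count); a missing count means 1.
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_weight("C3H2N4S"), 2))  # 126.14
```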
There are two main objectives for the development of QSAR:
· Development of predictive and robust QSAR models, with a
specified chemical domain, for predicting the activity of untested
molecules.
· Use as an informative tool, by extracting significant patterns in the
descriptors related to the measured biological activity, leading to
an understanding of the mechanisms of a given biological activity.
This could help in suggesting the design of novel molecules with
an improved activity profile.
Determination of QSAR generally proceeds as follows [36]:
a) Identify the training set
b) Enter biological activity data
c) Calculate descriptors
d) Generate a QSAR equation
e) Validate the equation
f) Predict the biological activity.
a) Identify the training set: The first step is to choose the molecular structures
to be used as the training set and to build any new structures with the tools
provided by the modelling software.
b) Enter biological activity data: For each molecule in the training set, the
biological activity measured experimentally in the laboratory is entered
(dependent parameter).
c) Calculate descriptors: Express the ligand in some quantitative manner, that
is, select a collection of numbers that characterize the ligand. These numbers
are called molecular descriptors. A wide variety of molecular descriptors
(independent parameters) are calculated using suitable software. A descriptor
is a measure of the potential contribution of its group to a particular property of
the ligand or parent structure [37]. The molecular descriptors normally used are
electronic, steric, thermodynamic and topological indices [38-42]. The various
parameters used in the present QSAR studies are given in the table.
The QSAR approach uses parameters that have been assigned to
the various chemical groups that can be used to modify the structure of the
drug. The selection of parameters is an important step in a QSAR study. The
various parameters used in QSAR studies are:
a) Thermodynamic Parameters
(i) Heat of Formation: The enthalpy of formation of a molecule from its
constituent atoms is a measure of the relative thermal stability of the molecule.
It is calculated by quantum-chemical techniques and has a wide range of
applicability in conformational analysis, intermolecular modeling and chemical
reaction modeling. The calculation is limited to 300 atoms or 300 atomic
orbitals (whichever is less) per molecule.
(ii) Partition Coefficient (Log P): Log P (the logarithm of the octanol/water
partition coefficient) and molar refractivity are molecular descriptors that can be
used to relate chemical structure to observed chemical behavior. Log P is related
to the hydrophobic character of the molecule. The molar refractivity of a
substituent is a combined measure of its size and polarizability.
$$ \log P_{\text{oct/wat}} = \log \left( \frac{[\text{Solute}]_{\text{octanol}}}{[\text{Solute}]_{\text{water}}^{\text{un-ionized}}} \right) \qquad (1.1) $$
The partition coefficient is the ratio of the concentrations of the
un-ionized compound in the two immiscible phases. To measure the partition
coefficient of ionizable solutes, the pH of the aqueous phase is adjusted so that
the predominant form of the compound is un-ionized. The logarithm of the
ratio of the concentrations of the un-ionized solute in the two solvents is called
log P.
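Equation (1.1) reduces to a base-10 logarithm of a concentration ratio. The minimal sketch below assumes that both concentrations refer to the un-ionized solute and are given in the same units. The function name `log_p` and the shake-flask numbers are hypothetical.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """log P = log10([solute]_octanol / [solute]_water) for the un-ionized solute."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# Hypothetical result: ten times more solute in the octanol phase than in water.
print(log_p(5.0, 0.5))  # 1.0
```

A positive log P indicates a preference for octanol (hydrophobic character); a negative value indicates a preference for water.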
(iii) Melting Point: The melting point of a solid is the temperature at which the
vapor pressures of the solid and the liquid are equal. At the melting point, the
solid and liquid phases exist in equilibrium. When considered as the
temperature of the reverse change, from liquid to solid, it is referred to as the
freezing point. When the "characteristic freezing point" of a substance is
determined, the actual methodology is in fact almost always "the principle of
observing the disappearance rather than the formation of ice", that is, the
melting point is measured.
(iv) Molar Refractivity (MR): The molar refractivity is a measure of both the
volume of a compound and how easily it is polarized. It is expressed as:
$$ MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{M}{d} \qquad (1.2) $$
where n is the refractive index
M is the molecular weight
d is the density.
The term M/d defines a volume, while the term (n² − 1)/(n² + 2) provides
a correction factor by defining how easily the substituent can be polarized.
This is particularly significant if the substituent has π electrons or lone pairs of
electrons. A positive coefficient for MR in a QSAR equation suggests that the
substituent binds to a polar surface, while a negative coefficient or a nonlinear
relationship indicates steric hindrance at the binding site.
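A minimal sketch of Equation (1.2) follows. The inputs are approximate handbook values for benzene (n ≈ 1.501, M ≈ 78.11 g/mol, d ≈ 0.8765 g/cm³ near 20 °C) and serve only to illustrate the arithmetic. The function name is hypothetical.

```python
def molar_refractivity(n: float, molar_mass: float, density: float) -> float:
    """MR = (n^2 - 1)/(n^2 + 2) * M/d, in cm^3/mol when d is in g/cm^3."""
    polarizability_factor = (n ** 2 - 1) / (n ** 2 + 2)  # how easily the molecule polarizes
    molar_volume = molar_mass / density                  # the M/d volume term
    return polarizability_factor * molar_volume


# Approximate values for benzene, for illustration only
print(round(molar_refractivity(1.501, 78.11, 0.8765), 2))  # about 26.25
```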
(v) Energy Stretching: E stretching is the bond-stretching energy. For a pair of
atoms joined by a single bond, it can be estimated by treating the bond as a
mechanical spring that obeys Hooke’s law. If r is the stretched length of the
bond and r0 is the ideal bond length, then
$$ E_{\text{stretching}} = \frac{1}{2} K (r - r_0)^2 \qquad (1.3) $$
where r0 is the ideal bond length
r is the stretched bond length
K is the force constant, in other words, a measure of the strength of
the bond.
If a molecule consists of three atoms (a-b-c), then
$$ E_{\text{stretching}} = E_{a\text{-}b} + E_{b\text{-}c} = \frac{1}{2} K_{a\text{-}b} \left[ r_{a\text{-}b} - r_{0,a\text{-}b} \right]^2 + \frac{1}{2} K_{b\text{-}c} \left[ r_{b\text{-}c} - r_{0,b\text{-}c} \right]^2 \qquad (1.4) $$
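Equations (1.3) and (1.4) amount to summing one harmonic term per bond. The minimal sketch below uses hypothetical force constants and bond lengths. The units are left to whichever force field supplies K and r0, and the function names are illustrative.

```python
def stretch_energy(k, r, r0):
    """Hooke's-law bond-stretching term: E = 1/2 * k * (r - r0)**2."""
    return 0.5 * k * (r - r0) ** 2


def total_stretch_energy(bonds):
    """Sum of stretching terms over all bonds, e.g. E = E(a-b) + E(b-c).

    bonds: iterable of (force_constant, actual_length, ideal_length) tuples.
    """
    return sum(stretch_energy(k, r, r0) for k, r, r0 in bonds)


# Hypothetical a-b-c fragment: bond a-b stretched, bond b-c compressed.
bonds = [(600.0, 1.55, 1.53), (600.0, 1.50, 1.53)]
print(round(total_stretch_energy(bonds), 3))  # 0.39
```

Because the term is quadratic, stretching and compressing a bond by the same amount cost the same energy.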
(vi) Torsion Energy: E Torsion is the energy due to changes in the
conformation about a bond (rotation about the torsion angle) and is given by
$$ E_{\text{Torsion}} = \frac{1}{2} k_{\phi} \left[ 1 + \cos(m\phi + \phi_{\text{offset}}) \right] \qquad (1.5) $$
where kφ is the energy barrier to rotation about the torsion angle φ
m is the periodicity of the rotation
φoffset is the ideal torsion angle relative to a staggered arrangement
of the two atoms.
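The periodic form of Equation (1.5) is easiest to see with a torsion scan. The minimal sketch below follows the sign convention written in Equation (1.5); force fields differ in this convention. It assumes a hypothetical threefold barrier (m = 3) of height 2.9 in arbitrary energy units.

```python
import math


def torsion_energy(k_phi, m, phi_deg, offset_deg=0.0):
    """E = 1/2 * k_phi * [1 + cos(m*phi + phi_offset)], angles in degrees.

    k_phi is the barrier height: E ranges from 0 to k_phi.
    """
    angle = math.radians(m * phi_deg + offset_deg)
    return 0.5 * k_phi * (1.0 + math.cos(angle))


# Hypothetical threefold barrier scanned in 30-degree steps
for phi in range(0, 181, 30):
    print(phi, round(torsion_energy(2.9, 3, phi), 3))
```

With m = 3 and no offset, the energy is highest at 0° and 120° and falls to zero at 60° and 180°, the threefold pattern expected for rotation about an sp3-sp3 bond.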
(vii) Energy VDW: The van der Waals interaction energy of the molecule with
the receptor. E vdW is the total energy contribution due to van der Waals forces
and is calculated from the Lennard-Jones potential equation
$$ E_{\text{vdW}} = \varepsilon \left[ \left( \frac{r_{\min}}{r} \right)^{12} - 2 \left( \frac{r_{\min}}{r} \right)^{6} \right] \qquad (1.6) $$
The (r_min/r)^6 term in this equation represents the attractive forces, while the
(r_min/r)^12 term represents the short-range repulsive forces between the atoms.
r_min is the distance between the two atoms at which the energy is at its
minimum, −ε, and r is the actual distance between the atoms.
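A minimal sketch of Equation (1.6) follows. The atom-pair parameters (r_min = 3.5 Å, ε = 0.1 in arbitrary energy units) are hypothetical. It shows that the energy is positive (repulsive) inside r_min, reaches −ε at r_min and tends to zero at long range.

```python
def lj_energy(r, r_min, epsilon):
    """E = epsilon * [(r_min/r)**12 - 2*(r_min/r)**6]; minimum of -epsilon at r = r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)


# Hypothetical atom pair: r_min = 3.5 Angstrom, epsilon = 0.1
for r in (3.0, 3.5, 4.0, 6.0):
    print(r, round(lj_energy(r, 3.5, 0.1), 4))
```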
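b) Electronic Parameters
(i) Energy Bend: E bend is the energy due to changes in the bond angle and is
estimated as:
$$ E_{\text{bend}} = \frac{1}{2} k_{\theta} (\theta - \theta_0)^2 \qquad (1.7) $$
where θ is the actual bond angle, kθ is the bending force constant and θ0 is the
ideal bond angle, that is, the minimum-energy arrangement of the three atoms.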
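Equation (1.7) has the same harmonic form as the stretching term, applied to an angle. The minimal sketch below converts the angular deviation to radians. Some force fields instead define kθ per degree squared, so the units of kθ must match the convention of the force field in use. The force constant and angles are hypothetical.

```python
import math


def bend_energy(k_theta, theta_deg, theta0_deg):
    """E = 1/2 * k_theta * (theta - theta0)**2, with the deviation in radians."""
    delta = math.radians(theta_deg - theta0_deg)
    return 0.5 * k_theta * delta ** 2


# Hypothetical tetrahedral centre (ideal 109.5 degrees) distorted to 115 degrees
print(round(bend_energy(100.0, 115.0, 109.5), 3))
```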
(ii) Highest Occupied Molecular Orbital Energy (HOMO): The HOMO (highest
occupied molecular orbital) is the highest-energy orbital in the molecule that
contains electrons. It is crucially important in governing molecular reactivity
and properties. When a molecule acts as a Lewis base (an electron-pair donor)
in bond formation, the electrons are supplied from the molecule's HOMO. How
readily this occurs is reflected in the energy of the HOMO. Molecules with
high-energy HOMOs are more able to donate their electrons and are hence
relatively reactive compared with molecules with low-lying HOMOs; thus the
HOMO descriptor measures the nucleophilicity of a molecule.
(iii) Lowest Unoccupied Molecular Orbital Energy (LUMO): The LUMO (lowest unoccupied molecular orbital) is the lowest-energy orbital in the molecule that contains no electrons. It is important in governing molecular reactivity and properties. When a molecule acts as a Lewis acid (an electron-pair acceptor) in bond formation, incoming electron pairs are received in its LUMO. Molecules with low-lying LUMOs are better able to accept electrons than those with high LUMOs; thus the LUMO descriptor should measure the electrophilicity of a molecule.
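HOMO and LUMO energies used as QSAR descriptors are normally obtained from semi-empirical or ab initio calculations. As a self-contained illustration of the frontier-orbital ideas above, the sketch below substitutes the much simpler Hückel model for linear polyenes, whose π orbital energies have a closed form. This textbook model is not the method used to compute descriptors in this work, and the function name is hypothetical.

```python
import math


def huckel_frontier_orbitals(n_carbons, alpha=0.0, beta=-1.0):
    """Simple Hückel HOMO and LUMO energies for a linear polyene C_N H_(N+2), N even.

    Uses the closed-form levels of a linear chain,
    E_k = alpha + 2*beta*cos(k*pi/(N + 1)), k = 1..N,
    with one pi electron per carbon and two electrons per orbital.
    beta < 0, so energies are in units of |beta| relative to alpha.
    """
    if n_carbons < 2 or n_carbons % 2:
        raise ValueError("n_carbons must be an even number >= 2")
    levels = sorted(alpha + 2.0 * beta * math.cos(k * math.pi / (n_carbons + 1))
                    for k in range(1, n_carbons + 1))
    n_occupied = n_carbons // 2
    return levels[n_occupied - 1], levels[n_occupied]


for n in (2, 4, 6):  # ethylene, butadiene, hexatriene
    homo, lumo = huckel_frontier_orbitals(n)
    print(n, round(homo, 3), round(lumo, 3), round(lumo - homo, 3))
```

As the conjugated chain lengthens, the HOMO rises and the LUMO falls. In the terms used above, this makes the molecule both a better electron donor and a better electron acceptor.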
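c) Steric Parameters
(i) Ovality: Ovality, or non-circularity, is the degree of deviation of a cross-section from perfect circularity. Quantitatively, the ovality of an elliptical cross-section is expressed as
$$ \text{Ovality} = \frac{2(a - b)}{a + b} \qquad (1.8) $$
where a is the length of the major axis and b is the length of the minor axis, so that the ovality of a perfect circle is zero. For molecules, modelling programs commonly compute ovality in three dimensions, as the ratio of the molecular surface area to the surface area of a sphere of the same volume, which equals 1 for a perfect sphere.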
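The minimal sketch below computes the two common forms of ovality: the cross-section form 2(a − b)/(a + b), and the three-dimensional form often used for molecules (surface area divided by the area of a sphere of equal volume). It assumes that the surface area and volume come from a separate calculation, such as a Connolly surface. The function names and numbers are hypothetical.

```python
import math


def ellipse_ovality(a, b):
    """Cross-section ovality 2(a - b)/(a + b); 0 for a perfect circle."""
    return 2.0 * (a - b) / (a + b)


def molecular_ovality(surface_area, volume):
    """Surface area divided by that of a sphere of equal volume; 1 for a sphere."""
    sphere_radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return surface_area / (4.0 * math.pi * sphere_radius ** 2)


print(ellipse_ovality(3.0, 2.0))
# A sphere of radius 2: area 16*pi, volume 32*pi/3, so the ratio should be 1
print(round(molecular_ovality(16.0 * math.pi, 32.0 * math.pi / 3.0), 6))
```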
(ii) Dipole Moment: The dipole moment descriptor is a 3D electronic descriptor that indicates the strength and orientation behavior of a molecule in an electrostatic field. Both the magnitude and the components (X, Y and Z) of the dipole moment are calculated. It is estimated using partial atomic charges and atomic coordinates, and is expressed in debye units. Dipole properties have been correlated with long-range ligand-receptor recognition and subsequent binding.
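Estimating the dipole moment from partial atomic charges and coordinates is a point-charge sum, μ = Σ qᵢ rᵢ. The minimal sketch below assumes charges in units of e, coordinates in ångström and the conversion 1 e·Å ≈ 4.8032 D. The charges should sum to about zero so that the result does not depend on the origin. The two-atom fragment is hypothetical.

```python
import math

DEBYE_PER_E_ANGSTROM = 4.80320  # 1 e*Angstrom expressed in debye


def dipole_moment(charges, coords):
    """Point-charge dipole moment mu = sum_i q_i * r_i.

    Returns (magnitude, (mu_x, mu_y, mu_z)), all in debye.
    """
    components = [
        DEBYE_PER_E_ANGSTROM * sum(q * xyz[axis] for q, xyz in zip(charges, coords))
        for axis in range(3)
    ]
    magnitude = math.sqrt(sum(c * c for c in components))
    return magnitude, tuple(components)


# Hypothetical fragment: +0.5 e and -0.5 e separated by 1.0 Angstrom along x
mag, (mx, my, mz) = dipole_moment([0.5, -0.5], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(round(mag, 3), round(mx, 3), my, mz)
```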
(iii) Balaban Index: The Balaban index J is a graph index defined for a graph with n nodes and m edges. It is a highly discriminating descriptor whose values do not increase substantially with molecular size or the number of rings present. Its evaluation begins with the distance matrix (D-matrix), modified as follows:
· Each edge contributes a length of 1/b to overall path lengths, where b is the edge (bond) order.
· For aromatic bonds, b is set to 1.5 by definition (thus contributing 2/3 to overall path lengths).
$$ J = \frac{m}{\mu + 1} \sum_{\text{edges } (i,j)} (D_i D_j)^{-1/2} \qquad (1.9) $$
where μ = m − n + 1 is the circuit rank (cyclomatic number) of the graph
Di is the sum of all entries in the ith row (or column) of the graph distance matrix.
The Balaban index helps to differentiate molecules according to their shape.
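The minimal sketch below computes the Balaban index for a hydrogen-suppressed molecular graph given as a list of bonds between heavy atoms. Topological distances come from breadth-first search. For simplicity it treats every bond as order 1 and so omits the 1/b weighting for multiple and aromatic bonds described above. It also assumes a connected graph. The function names are hypothetical.

```python
from collections import deque


def distance_matrix(n_atoms, bonds):
    """Topological (bond-count) distances between heavy atoms via breadth-first search."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dist = []
    for start in range(n_atoms):
        d = [-1] * n_atoms
        d[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def balaban_j(n_atoms, bonds):
    """J = m/(mu + 1) * sum over bonds (i, j) of (D_i * D_j)**-0.5, all bond orders 1."""
    row_sums = [sum(row) for row in distance_matrix(n_atoms, bonds)]
    m = len(bonds)
    mu = m - n_atoms + 1  # cyclomatic number (number of rings)
    total = sum((row_sums[i] * row_sums[j]) ** -0.5 for i, j in bonds)
    return m / (mu + 1) * total


butane = [(0, 1), (1, 2), (2, 3)]                         # C0-C1-C2-C3
cyclohexane = [(i, (i + 1) % 6) for i in range(6)]        # six-membered ring
print(round(balaban_j(4, butane), 4))       # 1.9747
print(round(balaban_j(6, cyclohexane), 4))  # 2.0
```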
(iv) Connolly Solvent Accessible Area (Å²): The locus of the center of a spherical probe as it is rolled over the molecular model.
(v) Connolly Molecular Surface Area (Å²): The contact surface created when a spherical probe is rolled over the molecular model.
(vi) Connolly Solvent Excluded Volume (Å³): The volume contained within the contact molecular surface.
(vii) Principal Moment of Inertia (X, Y, Z): The moment of inertia of the whole body with respect to one of the principal axes is known as a principal moment of inertia. The moments of inertia are computed for a series of straight lines through the center of mass and are given by:
$$ I = \sum_{i} m_i d_i^2 \qquad (1.10) $$
where mi is the mass of atom i and di is its perpendicular distance from the axis.
If all three principal moments are equal, the molecule is a spherical top; if two
of them are equal, it is a symmetric top.
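Equation (1.10) can be evaluated for any axis through the center of mass. The minimal sketch below does this for a chosen axis direction. Obtaining the three principal moments themselves requires diagonalizing the 3 × 3 inertia tensor, which this sketch does not attempt. The masses (amu) and coordinates (Å) are hypothetical.

```python
import math


def moment_about_axis(masses, coords, axis):
    """I = sum(m_i * d_i**2), d_i = distance of atom i from a line through the center of mass."""
    total_mass = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total_mass for k in range(3)]
    norm = math.sqrt(sum(a * a for a in axis))
    u = [a / norm for a in axis]  # unit vector along the axis
    inertia = 0.0
    for m, c in zip(masses, coords):
        r = [c[k] - com[k] for k in range(3)]
        along = sum(r[k] * u[k] for k in range(3))
        d2 = sum(x * x for x in r) - along ** 2  # squared perpendicular distance
        inertia += m * d2
    return inertia


# Hypothetical diatomic: two 12-amu atoms 1.2 Angstrom apart along x
masses = [12.0, 12.0]
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
print(round(moment_about_axis(masses, coords, (0.0, 0.0, 1.0)), 3))  # about z
print(round(moment_about_axis(masses, coords, (1.0, 0.0, 0.0)), 3))  # along the bond
```

For a linear molecule the moment about the molecular axis is zero, and the other two moments are equal.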
(viii) Wiener Index (W): The Wiener index is the sum of the numbers of chemical
bonds between all pairs of heavy atoms in the molecule. In graph-theoretical
terms, it is the sum of the lengths of the minimal paths between all pairs of
vertices representing heavy atoms. This is equal to half the sum of all D-matrix
entries:
$$ W = \frac{1}{2} \sum_{i} \sum_{j} d_{ij} \qquad (1.11) $$
where dij is the topological distance (number of bonds on the shortest path) between atoms i and j.
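The minimal sketch below evaluates Equation (1.11) with breadth-first search over a hydrogen-suppressed graph. It assumes a connected molecule given as a list of bonds between heavy atoms, and the function name is hypothetical. The two C4 isomers show how the index decreases with branching.

```python
from collections import deque


def wiener_index(n_atoms, bonds):
    """W = 1/2 * sum_i sum_j d_ij, with d_ij the shortest bond path between heavy atoms."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    total = 0
    for start in range(n_atoms):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was counted twice


print(wiener_index(4, [(0, 1), (1, 2), (2, 3)]))  # n-butane: 10
print(wiener_index(4, [(0, 1), (0, 2), (0, 3)]))  # isobutane: 9
```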
d) Generate a QSAR equation
Determine the functional relationship between activity and the
selected descriptors, that is, search for a mathematical function f such that
activity = f(descriptors) to a suitably high level of accuracy. After identifying
the dependent and independent variables, a suitable statistical method is used
to generate a QSAR equation [43]. The statistical methods can be broadly
divided into two types: linear and non-linear methods. In either case, a
correlation is established between the dependent variable(s) (biological
activity) and the independent variable(s) (molecular descriptors).
A linear method fits a line between the selected descriptors and the
activity, whereas a non-linear method fits a curve. The statistical method used
to build a QSAR model is chosen according to the type of biological activity
data.
Following are a few commonly used statistical methods:
· Categorical Dependent Variable - Discriminant analysis, Logistic
regression, k-Nearest Neighbour classification, Decision Trees.
· Continuous Dependent Variable - Multiple regression, Principal
Component Regression, Continuum Regression, Partial Least
Squares Regression, Canonical Correlation Analysis, k-Nearest
Neighbour method, Neural Networks.
Multiple regression is the most widely used method for building QSAR
models. A regression model is simple to interpret, because the contribution of
each descriptor can be seen from the magnitude and sign of its regression
coefficient. Multiple linear regression attempts to maximize the fit of the data
to a regression equation for the biological activity by adjusting the coefficient
of each parameter up or down. Successive regression equations are derived in
which parameters are either added or removed until the r2 and S values are
optimized. The magnitudes of the coefficients derived in this manner indicate
the relative contributions of the associated parameters to bioactivity.
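The minimal sketch below shows ordinary least-squares multiple linear regression, solving the normal equations by Gauss-Jordan elimination in pure Python. The training set is hypothetical: two descriptors (log P and MR) for six compounds, with activities constructed to follow activity = 1.0 + 0.8 log P − 0.05 MR exactly, so the fit should recover those coefficients. Real QSAR packages add descriptor selection and diagnostics on top of this core step. The function names are illustrative.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination,
    where A is the descriptor matrix with a leading column of ones (intercept).
    Returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    aug = [ata[i] + [aty[i]] for i in range(p)]  # augmented matrix [A^T A | A^T y]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry in this column to the diagonal.
        pivot = max(range(col, p), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the model cannot be fitted")
        for i in range(p):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coefs, descriptors):
    """Apply a fitted QSAR equation to one compound's descriptor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], descriptors))


# Hypothetical training set: (log P, MR) for six compounds
X = [(1.0, 30.0), (2.0, 35.0), (1.5, 40.0), (3.0, 32.0), (2.5, 45.0), (0.5, 38.0)]
y = [1.0 + 0.8 * lp - 0.05 * mr for lp, mr in X]

coefs = fit_mlr(X, y)
print([round(c, 3) for c in coefs])         # intercept, log P and MR coefficients
print(round(predict(coefs, (2.0, 36.0)), 3))  # predicted activity of a new compound
```

The sign and size of each fitted coefficient correspond to the interpretation described above.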
There are two important caveats in applying multiple regression
analysis. The first is that, given enough parameters, any data set can be fitted
to a regression line. Consequently, regression analysis generally requires
significantly more compounds than parameters, typically three to six times as
many. The second is that regression analysis is most effective for
interpolation, whereas it is extrapolation that is most useful in a synthesis
campaign.
There are various statistical measures available for evaluating the
significance of a model; the most commonly used are [44]:
n - number of molecules
k - number of descriptors in a model
df - degrees of freedom (n - k - 1) (higher is better)
r2 - coefficient of determination (> 0.7)
Q2 - cross-validated r2 (> 0.5)
pred_r2 - r2 for the external test set (> 0.5)
SEE - standard error of estimate (smaller is better)
F-test - F-test for statistical significance of the model (higher is
better, for the same set of descriptors and compounds)
Z score - Z score calculated by the randomization test (higher
is better)
SDEP - standard deviation of the error of prediction.
Correlation Coefficient (r) and Coefficient of Determination (r2): The
quantity r, called the linear correlation coefficient, measures the strength and
direction of a linear relationship between two variables. The coefficient of
determination, r2, is useful because it gives the proportion of the variance
(fluctuation) of one variable that is predictable from the other variable. It
indicates how much confidence can be placed in predictions made from a
given model. It can be calculated as:
$$ r^2 = 1 - \frac{\text{Sum of squares of the deviations from the regression line}}{\text{Sum of squares of the deviations from the mean}} = \frac{\text{Regression variance}}{\text{Original variance}} \qquad (1.12) $$
Regression variance is defined as the original variance minus the
variance around the regression line. The original variance is the sum of the
squared distances of the original data from the mean. If
0 < r2 < 1, the model explains part of the variance (partial correlation)
r2 = 0, there is no linear correlation
r2 = 1, there is perfect correlation.
The higher the r2 value, the less likely it is that the relationship is due
to chance.
F or Variance Ratio: The F-statistic is the ratio of explained to
unexplained variance for a given number of degrees of freedom. The larger the
value of F, the greater the probability that the QSAR model is significant.
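The minimal sketch below computes r2 (Equation 1.12), the standard error of estimate and the F ratio from observed activities and fitted values, using the degrees of freedom df = n − k − 1 listed above. The F expression assumes a least-squares model with an intercept, for which the explained sum of squares equals the total minus the residual sum of squares. The data are hypothetical: five compounds, one descriptor, with fitted values from the least-squares line 1.21 + 1.93x.

```python
import math


def regression_statistics(observed, predicted, k):
    """Return (r2, SEE, F) for a model with k descriptors fitted to n compounds.

    r2  = 1 - SS_residual / SS_total
    SEE = sqrt(SS_residual / (n - k - 1))
    F   = (r2 / k) / ((1 - r2) / (n - k - 1))
    """
    n = len(observed)
    df = n - k - 1
    if df <= 0:
        raise ValueError("need more compounds than descriptors + 1")
    mean = sum(observed) / n
    ss_total = sum((yo - mean) ** 2 for yo in observed)                      # about the mean
    ss_resid = sum((yo - yp) ** 2 for yo, yp in zip(observed, predicted))    # about the line
    r2 = 1.0 - ss_resid / ss_total
    see = math.sqrt(ss_resid / df)
    f_value = (r2 / k) / ((1.0 - r2) / df)
    return r2, see, f_value


observed = [3.2, 4.8, 7.1, 9.3, 10.6]
predicted = [3.14, 5.07, 7.0, 8.93, 10.86]  # least-squares line 1.21 + 1.93*x, x = 1..5
r2, see, f_value = regression_statistics(observed, predicted, k=1)
print(round(r2, 4), round(see, 4), round(f_value, 1))
```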
Z-Score: The Z score can be defined as the absolute difference between the
value predicted by the model and the observed activity, divided by the square
root of the mean squared error of the data set. Any compound with a Z score
higher than 2.5 in a QSAR model is considered an outlier.
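The outlier rule above translates directly into code. The minimal sketch below divides each absolute residual by the root-mean-square error and flags scores above 2.5. The ten observed and predicted activities are hypothetical, with one compound deliberately far from its prediction.

```python
import math


def z_score_outliers(observed, predicted, threshold=2.5):
    """Flag compounds whose |observed - predicted| / RMSE exceeds the threshold."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    scores = [abs(e) / rmse for e in residuals]
    outliers = [i for i, z in enumerate(scores) if z > threshold]
    return outliers, scores


observed = [5.1, 4.9, 5.1, 4.9, 5.1, 4.9, 5.1, 4.9, 5.1, 7.0]
predicted = [5.0] * 10
outliers, scores = z_score_outliers(observed, predicted)
print(outliers)  # [9]
print(round(scores[9], 2))
```

With n compounds, no Z score can exceed √n, so this rule can only flag outliers in data sets of at least seven compounds.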
e) Validate the equation
Validation techniques are used to identify outliers (data that are not
modelled well by the equation). Graphical analysis and cross-validation are used
to characterize the robustness of the QSAR. No single method works best for
predictiveness, interpretability and computational efficiency.
Cross Validation Technique: As opposed to traditional regression methods,
cross-validation [45] evaluates the validity of a model by how well it predicts
data rather than how well it fits data. The analysis uses the Leave-One-Out
(LOO) scheme: each compound in turn is left out of the model derivation, and
its activity is then predicted by the model built from the remaining compounds.
An indication of the performance of the model is obtained from the
cross-validated r2 (Q2), which is defined as
$$ Q^2 = \frac{SD - PRESS}{SD} \qquad (1.13) $$
where SD is the sum of squared deviations of each activity from the mean
PRESS is the predictive sum of squares, that is, the sum of the squared
differences between the actual and predicted values.
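The minimal sketch below implements the Leave-One-Out procedure and Equation (1.13) for a one-descriptor linear model, using pure Python. A real study would refit whatever multi-descriptor model is under evaluation. The data are hypothetical. For an ordinary least-squares model, each left-out residual is at least as large as the corresponding fitted residual, so Q2 cannot exceed r2.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def loo_q2(xs, ys):
    """Q2 = (SD - PRESS) / SD using Leave-One-Out predictions."""
    mean = sum(ys) / len(ys)
    sd = sum((y - mean) ** 2 for y in ys)
    press = 0.0
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])  # model without compound i
        press += (ys[i] - (b0 + b1 * xs[i])) ** 2                    # squared prediction error
    return (sd - press) / sd


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.2, 4.8, 7.1, 9.3, 10.6]  # hypothetical activities
print(round(loo_q2(xs, ys), 3))
```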
The model with the highest cross-validated r2 is then used to derive the
conventional QSAR equation and the conventional r2 and S values. In 3D-QSAR
methods, the final model results are then visualized as contour maps of the
coefficients.
f) Predict Activity
From the QSAR equation obtained, the biological activity of new
compounds can be predicted.
QSAR methods are useful in elucidating the mechanisms of
chemical-biological interactions in various biomolecules, particularly enzymes,
membranes, organelles and cells. They have also been used to evaluate
absorption, distribution, metabolism and excretion phenomena in organisms and
in whole-animal studies. The potential use of QSAR models to screen chemical
databases or virtual libraries before synthesis appears equally attractive to
chemical manufacturers and pharmaceutical companies.