chapter 1 introduction - shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/6512/6/06_chapter...


CHAPTER 1

INTRODUCTION

1.1 GENERAL INTRODUCTION

Successful drug synthesis depends upon the ability to identify new chemical entities that have the potential to treat diseases in a safe and efficient manner. Many of the drugs in use over the last fifty years or more have been of synthetic or semisynthetic origin, whereas the pharmacopoeias prior to that period were largely of natural origin. Finding effective drugs is difficult. Many are discovered through chance observations, the scientific analysis of folk medicines or by noting the side effects of other drugs.

During the early stages of drug discovery, scientists were primarily concerned with the isolation of medicinal agents found in plants. For example, salicylic acid, the precursor of aspirin, was isolated from willow bark; morphine, a powerful painkiller, from the opium poppy; quinine from cinchona bark; and digitalis from the purple foxglove. Synthetic chemistry grew rapidly in sophistication during the first half of the 20th century and first proved its value to the pharmaceutical industry by enabling the discovery of the sulfa drugs [1].

Medicinal chemistry received a further boost in 1940 as pharmacology, which until then had been dominated by physiology, became increasingly biochemical in character with a new understanding of the role of enzymes and cell receptors. Medicinal chemistry is the application of chemical research techniques to identify, synthesize and develop chemical entities for therapeutic use; it operates at the intersection of chemistry and pharmacology. It also includes the study of existing drugs, their biological properties and their quantitative structure activity relationships (QSAR).

The seeds for the concept of rational drug design were laid in the

1940s and 1950s by George Hitchings and Gertrude Elion in their work on

DNA-based antimetabolites, which led to the discovery of modified purines

with anticancer activity [2]. However, the era of DNA and medicine was

largely stimulated by the elucidation of the double-helical structure of DNA by

Watson and Crick in 1953 [3]. The ramifications of this discovery in DNA

replication, transcription and translation led to a much better understanding of

viral replication. This laid the foundation for antiviral drug discovery in

subsequent decades as molecular targets in the viral replication cycle began to

be identified. The 1950s also saw the discovery of Vancomycin, a glycopeptide

which was developed much later for use against Methicillin-resistant

staphylococci infections. The era of recombinant DNA technology and

molecular cloning began around the mid-1970s. During the 1970s chemists

turned increasingly to rational drug discovery based on structural knowledge of

target and/or ligand, a movement that began a strategic shift away from natural products towards purely synthetic or natural-product-based compounds. Rational drug design therefore requires significant knowledge of chemistry as well as biology, because the chemical interactions between drugs and their targets determine whether a drug is biologically active or not.

The first drug developed using the rational approach was the antiviral drug Zanamivir [4]. This drug was designed to interact with neuraminidase, a virus-produced enzyme that is required to release newly formed virus from infected cells. Other rationally designed drugs include HIV drugs such as Ritonavir and Indinavir, both of which interact with viral proteins.


More recently, automated high-throughput screening (HTS) systems utilizing cell culture systems with linked enzyme assays and receptor molecules derived from gene cloning have greatly increased the efficiency of random screening. It is now practical to screen enormous libraries of peptides and nucleic acids produced by combinatorial chemistry procedures. With significant advances in X-ray crystallography and NMR, it is possible to obtain a detailed representation of enzymes and other drug receptors. The techniques of molecular graphics and computational chemistry provide novel chemical structures that have led to new drugs with potent medicinal activities. The development of human immunodeficiency virus (HIV) protease inhibitors and angiotensin-converting enzyme inhibitors came from an understanding of the geometric and chemical character of the receptor or enzyme active site [5]. Even if the receptor structure is not known in detail, rational approaches based on the physicochemical properties of a lead compound can provide new drugs. Despite this progress, there remains an increasing need for novel, innovative therapeutic agents. The aim is to improve success in drug development by devising better methods for the synthesis of lead molecules from easily accessible and inexpensive raw materials. The majority of drugs used today are products of synthetic organic chemistry, and most of them are heterocyclic derivatives.

1.2 NUCLEUS INTRODUCTION

Compounds classified as heterocycles probably constitute the largest and most varied family of organic compounds. They are rich sources of diverse physical, chemical and biological properties. In medicinal chemistry they are commonly used as templates to design biologically active agents. A number of compounds having a heterocyclic nucleus such as thiadiazole, triazole, benzothiazole, benzoxazole or oxadiazole, and their derivatives, have been associated with a broad spectrum of biological activities [7]. The synthesis of triazoles fused with another heterocyclic ring has attracted widespread attention due to their diverse applications. Among them, symmetrical triazoles fused with thiadiazole represent an interesting class of compounds, since 1,2,4-triazole and 1,3,4-thiadiazole both possess a wide spectrum of activities (Figure 1.1).

Figure 1.1 3D structural representation of 1,2,4-triazolo-[3,4-b]-1,3,4-thiadiazole (fused ring system with atoms numbered 1 to 8)

Molecular formula - C3H2N4S

Molecular Weight - 126.14

Log P - 0.82

MR - 33.7 [cm3/mol]

Henry's Law - 1.63

CLogP - 0.932756

CMR - 2.9819

Thiadiazole is a versatile moiety, and many drugs containing the thiadiazole nucleus are available on the market, such as the diuretics Acetazolamide and Methazolamide, the antibacterial Sulphamethizole and the antibiotic Cefazolin.

[Structures: Methazolamide, Sulphamethizole, Acetazolamide, Cefazolin]

A review of the literature showed that thiadiazole derivatives possess antimicrobial [8-10], anti-inflammatory [11], anticancer [12-14], anticonvulsant, antidepressant [15], carbonic anhydrase inhibitory [16] and antioxidant [17] activities.

Derivatives of 1,2,4-triazole possess potent biological activities such as antifungal [18], antibacterial [19], antiviral [20] and antimigraine [21] activities. Some therapeutically important medicines containing the triazole nucleus are the hypnotics Triazolam and Estazolam, the antifungal drugs Fluconazole and Voriconazole, and the antiviral drug Ribavirin [22].

[Structures: Triazolam, Estazolam, Fluconazole, Voriconazole, Ribavirin]

The fused triazole and thiadiazole ring system shows various biological effects and is viewed as a cyclic analogue of two very important components, thiosemicarbazide and biguanide [23], which often display diverse biological activities. A literature survey revealed that s-triazolo-[3,4-b]-1,3,4-thiadiazole rings have received much attention in recent years on account of their prominent use as anti-inflammatory [24], antiviral [25], anthelmintic [26] and antimicrobial [27-29] agents. The anticancer activity of the ring system is reported to be due to its similarity to the purine ring [28]. The thiadiazole ring system (A) of triazolo-thiadiazole takes the place of the pyrimidine ring in the purine ring (B).

[Structures: purine and triazolo-thiadiazole, each with its two rings labelled A and B]

1.3 BIOLOGICAL ACTIVITIES

Biological screening of synthesized compounds is an important step in drug design. The pharmacophore or lead moiety can be selected only after determining the functional groups that are responsible for the various biological activities.

1.3.1 Cytotoxic Activity

Cancer remains a major public health issue at the beginning of the 21st century. It is a disease in which control of growth is lost in one or more cells, leading to a solid mass of cells known as a tumor. A growing tumor will often become life threatening by obstructing vessels or organs; however, death is most often caused by the spread of the primary tumor to one or more other sites in the body, making surgical intervention impossible. It is thought that exposure to the ever-increasing number of carcinogenic chemicals in the environment and the diet may be a significant contributing factor. Cancer is now known to be a disease that involves changes to the genome caused by both internal factors, such as mutation, loss of genetic material and altered gene expression, and external factors, such as viruses, chemicals and radiation.

Cancer treatment often encompasses more than one approach, and the strategy adopted is largely dependent on the nature of the cancer. Chemotherapy is one way to fight cancer. Significant side effects such as nausea, vomiting, diarrhoea, hair loss and serious infections often accompany chemotherapy [2]. Therefore, a need has arisen for the accelerated development of new chemotherapeutic agents that are more effective as well as less toxic. For this purpose, both in vitro and in vivo models are employed for screening anticancer activity. Though animal models provide significant results, in vitro testing is still preferred to in vivo testing of a potential chemotherapeutic agent. Thiadiazoles have been of great interest as antitumor agents, and 1,2,3-triazole derivatives have been reported to act against tumor proliferation, invasion and metastasis [30].

1.3.2 Antioxidant Activity

It is commonly accepted that, in a situation of oxidative stress, reactive oxygen species such as superoxide, hydroxyl and peroxyl radicals are generated. Reactive oxygen species play an important role in the degenerative or pathological processes of various serious diseases such as cancer, coronary heart disease, Alzheimer’s disease, neurodegenerative disorders, atherosclerosis, cataract, inflammation and ageing [31].

In many infections or disorders there is excessive production by phagocytes of superoxide (O2·−) and hydroxyl (·OH) radicals as well as non-radical species such as hydrogen peroxide (H2O2). These can harm surrounding tissue either by a powerful direct oxidizing action or indirectly, through the hydrogen peroxide and ·OH radicals formed from the oxygen free radical, which results in membrane destruction. Free radicals can be formed in three ways:

· by homolytic cleavage of a covalent bond of a normal molecule, with each fragment retaining one of the paired electrons

· by the addition of a single electron to a normal molecule

· by the removal of a single electron from a normal molecule

Environmental sources such as ultraviolet irradiation, ionizing radiation and pollutants also produce reactive free radical species. Peroxidase enzymes such as lipoxygenase can generate free radicals. Lipoxygenase can react with the free form of Phospholipids A2. Injured cells and tissues can stimulate the generation of free radicals. Although the human body produces free radicals continuously, it possesses several defence systems against them, comprising enzymes and radical scavengers. These are called the “first line antioxidant defence system”, but they are not completely effective. The “second line defence system” is the repair system for biomolecules that have been damaged by the attack of free radicals. This has led to the increased use of antioxidants in therapy. Ascorbic acid, α-Tocopherol, Probucol, Silybin and Gnaphalin have been shown to possess antioxidant activity [32].

An antioxidant is a molecule capable of slowing or preventing the oxidation of other molecules. Oxidation is a chemical reaction that transfers electrons from a substance to an oxidizing agent. Oxidation reactions can produce free radicals, which start chain reactions that damage cells. Antioxidants terminate these chain reactions by removing free radical intermediates and inhibit other oxidation reactions by being oxidized themselves. Hence, agents that can scavenge these reactive species can be beneficial in the treatment of various disorders.


1.3.3 Antifungal Activity

Research on antifungal agents has recently come to play a vital role because immunocompromised patients are very susceptible to invasive fungal infections. The onset of acquired immunodeficiency syndrome (AIDS), combined with the increased use of powerful immunosuppressive drugs for organ transplants and cancer chemotherapy, has resulted in a demand for new antifungals. In the 1980s and 1990s a number of safer antifungal drugs such as the azoles were introduced into the clinic. However, the widespread use of newer antifungal agents has been accompanied by increasing reports of resistance. Azole antifungals are the largest class of antimycotics available today, with over 20 drugs on the market. Azoles are five-membered rings containing two or three nitrogen atoms, that is, imidazoles or triazoles. All the azoles act by inhibiting ergosterol biosynthesis [33]. The main target of azole antifungals is the cytochrome P450-dependent 14α-demethylation of lanosterol. Inhibition of sterol 14α-demethylase results in the depletion of ergosterol and the accumulation of sterol precursors, including lanosterol, which alters the structure and function of the plasma membrane.

The ideal antifungal agent should be fungicidal with a broad spectrum of activity, be suitable for oral or intravenous administration and possess good pharmacodynamic properties without the development of resistance during therapy. At present, none of the clinically used drugs satisfies all these criteria.

1.4 DRUG DESIGN

Quantitative structure activity relationship (QSAR) is an important area of chemometrics that has been widely utilized to study the relationship between chemical structure and biological or other functional activities [34]. QSAR models are widely used for the prediction of physicochemical properties and biological activities. The success of the QSAR approach can be explained by the insight it offers into the structural determinants of chemical properties and by the possibility of estimating the properties of new chemical compounds without the need to synthesize and test them. QSAR is an attempt to transform qualitative beliefs into a quantitative method of activity assessment. It began with the work of Hansch and was further developed by others.

QSAR is the process by which chemical structure is quantitatively correlated with a well-defined process, such as biological activity or chemical reactivity. For example, biological activity can be expressed quantitatively as the concentration of a substance required to give a certain biological response. Additionally, when physicochemical properties or structures are expressed by numbers, one can form a mathematical relationship, or quantitative structure activity relationship, between the two. The most general mathematical form of QSAR is

Activity = f (physicochemical properties and/or structural properties)

Types of QSAR are based on the dimensionality [35] of the molecular descriptors used (Figure 1.2):

· 0D - These are descriptors derived from the molecular formula, e.g. molecular weight, number and type of atoms, etc.

· 1D - A substructure list representation of a molecule can be considered as a one-dimensional (1D) molecular representation and consists of a list of molecular fragments (e.g. functional groups, rings, bonds, substituents, etc.).

· 2D - A molecular graph contains topological or two-dimensional (2D) information. It describes how the atoms are bonded in a molecule, both the type of bonding and the interaction of particular atoms (e.g. total path count, molecular connectivity indices, etc.).

· 3D - These are calculated starting from a geometrical or 3D representation of a molecule. These descriptors include molecular surface, molecular volume and other geometrical properties. There are different types of 3D descriptors, e.g. electronic, steric, shape, etc.

· 4D - In addition to the 3D descriptors, the fourth dimension is generally expressed in terms of different conformations or any other experimental condition.

Figure 1.2 Dimensionality of molecular descriptors (panels: 1D, 2D, 3D and 4D representation)
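As an illustration of a 0D descriptor, the following minimal Python sketch computes atom counts and molecular weight directly from a molecular formula. The function names and the small table of rounded atomic weights are assumptions introduced for this example only, and the parser handles only simple formulae without brackets. Applied to the formula C3H2N4S listed under Figure 1.1, it should give a molecular weight of about 126.14.

```python
import re

# Rounded standard atomic weights (g/mol); only the elements needed here.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

def atom_counts(formula):
    """Parse a simple formula such as 'C3H2N4S' into {element: count}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """0D descriptor: sum of atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in atom_counts(formula).items())

if __name__ == "__main__":
    print(atom_counts("C3H2N4S"))
    print(round(molecular_weight("C3H2N4S"), 2))
```

Both quantities depend only on the formula, which is why they are classed as 0D: no bonding or geometric information is required.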


There are two main objectives for the development of QSAR:

· Development of predictive and robust QSAR, with a specified

chemical domain, for prediction of activity of untested

molecules.

· It acts as an informative tool by extracting significant patterns

in descriptors related to the measured biological activity leading

to understanding of mechanisms of given biological activity.

This could help in suggesting design of novel molecules with

improved activity profile.

Determination of QSAR generally proceeds as follows [36]:

a) Identify the training set

b) Enter biological activity data

c) Calculate descriptors

d) Generate a QSAR equation

e) Validate the equation

f) Predict the biological activity.

a) Identify the training set: The first step is to choose the molecular structures to use as the training set and to build new structures with the tools provided.

b) Enter biological activity data: For each molecule in the training set, the observed biological activity measured in the laboratory is entered (dependent parameter).


c) Calculate descriptors: Express the ligand in some quantitative manner; that is, select a collection of numbers that characterize the ligand. These numbers are called molecular descriptors. Using suitable software, a wide variety of molecular descriptors (independent parameters) are calculated. A descriptor is a measure of the potential contribution of its group to a particular property of the ligand or parent structure [37]. The molecular descriptors normally used are electronic, steric, thermodynamic and topological indices [38-42]. The various parameters used in this QSAR study are given in the table.

The QSAR approach uses parameters which have been assigned to the various chemical groups that can be used to modify the structure of the drug. Each parameter is a measure of the potential contribution of its group to a particular property of the parent drug. The selection of parameters is an important step in a QSAR study. The various parameters used in QSAR studies are:

a) Thermodynamic Parameters

(i) Heat of Formation: The enthalpy for forming a molecule from its constituent atoms is a measure of the relative thermal stability of the molecule. It is calculated by quantum-chemical techniques and has a wide range of applicability in conformational analysis, intermolecular modeling and chemical reaction modeling. The atom limit is 300 atoms or 300 atomic orbitals (whichever is less) per molecule.

(ii) Partition Coefficient Log P: Log P (the octanol/water partition coefficient) and molar refractivity are molecular descriptors that can be used to relate chemical structure to observed chemical behavior. Log P is related to the hydrophobic character of the molecule. The molar refractivity of a substituent is a combined measure of its size and polarizability.


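$$\log P_{\mathrm{oct/wat}} = \log\left(\frac{[\mathrm{Solute}]_{\mathrm{octanol}}^{\mathrm{un\text{-}ionized}}}{[\mathrm{Solute}]_{\mathrm{water}}^{\mathrm{un\text{-}ionized}}}\right) \qquad (1.1)$$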

The partition coefficient is a ratio of concentrations of un-ionized

compound between the two solutions. To measure the partition coefficient of

ionizable solutes, the pH of the aqueous phase is adjusted such that the

predominant form of the compound is un-ionized. The logarithm of the ratio of

the concentrations of the un-ionized solute in the solvents is called log P.
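The definition of log P can be checked numerically with a short sketch such as the one below. The function name `log_p` and the example concentrations are hypothetical and serve only to illustrate Equation 1.1; both concentrations are assumed to refer to the un-ionized solute and to be in the same units.

```python
import math

def log_p(conc_octanol, conc_water):
    """Base-10 log of the octanol/water concentration ratio (un-ionized solute)."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)

if __name__ == "__main__":
    # Hypothetical measurement: ten times more solute in octanol than in water.
    print(log_p(10.0, 1.0))
    # Conversely, a log P of 0.82 corresponds to an octanol/water ratio of 10**0.82.
    print(round(10 ** 0.82, 1))
```

A positive log P indicates a preference for the octanol phase (a more hydrophobic compound), while a negative value indicates a preference for water.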

(iii) Melting Point: The melting point of a solid is the temperature at which the vapor pressures of the solid and the liquid are equal. At the melting point the solid and liquid phases exist in equilibrium. When considered as the temperature of the reverse change from liquid to solid, it is referred to as the freezing point. When the "characteristic freezing point" of a substance is determined, the actual methodology is almost always "the principle of observing the disappearance rather than the formation of ice", that is, the melting point.

(iv) Molar Refractivity (MR): The molar refractivity is a measure of both the

volume of a compound and how easily it is polarized. It is expressed as:

$$MR = \frac{(n^2 - 1)}{(n^2 + 2)} \cdot \frac{M}{d} \qquad (1.2)$$

where n is the refractive index

M is the molecular weight

d is the density.

The term molecular weight/density (M/d) defines a volume, while the term (n² – 1)/(n² + 2) provides a correction factor by defining how easily the substituent can be polarized. This is particularly significant if the substituent has π electrons or lone pairs of electrons. A positive sign for MR in a QSAR equation indicates that the substituent binds to a polar surface, while a negative sign or a nonlinear relationship indicates steric hindrance at the binding site.
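A minimal sketch of Equation 1.2 in Python is given below. The function name is an assumption made for this example, and the values used for water (n ≈ 1.333, M ≈ 18.015 g/mol, d ≈ 0.997 g/cm3) are approximate literature values chosen only to show the order of magnitude.

```python
def molar_refractivity(n, mol_weight, density):
    """Molar refractivity in cm3/mol, with M in g/mol and d in g/cm3."""
    polarizability_term = (n ** 2 - 1) / (n ** 2 + 2)  # correction factor
    volume_term = mol_weight / density                 # molar volume
    return polarizability_term * volume_term

if __name__ == "__main__":
    # Approximate values for liquid water near room temperature (assumed).
    print(round(molar_refractivity(1.333, 18.015, 0.997), 2))
```

Because the refractive-index term is dimensionless, MR carries the units of the molar volume, which is consistent with the cm3/mol value quoted for MR under Figure 1.1.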

(v) Energy Stretching: Energy stretching is the bond stretching energy. The value of the bond stretching energy, E stretching, for a pair of atoms joined by a single bond can be estimated by considering the bond to be a mechanical spring that obeys Hooke’s law. If r is the stretched length of the bond and r0 is the ideal bond length, then

$$E_{\mathrm{stretching}} = \tfrac{1}{2} K (r - r_0)^2 \qquad (1.3)$$

where r0 is the ideal bond length

r is the stretched bond length

K is the force constant, in other words a measure of the strength of the bond.

If a molecule consists of three atoms (a-b-c), then

$$E_{\mathrm{stretching}} = E_{a\text{-}b} + E_{b\text{-}c} = \tfrac{1}{2} K_{(a\text{-}b)} \left[r_{(a\text{-}b)} - r_{0(a\text{-}b)}\right]^2 + \tfrac{1}{2} K_{(b\text{-}c)} \left[r_{(b\text{-}c)} - r_{0(b\text{-}c)}\right]^2 \qquad (1.4)$$
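The Hooke's law expressions in Equations 1.3 and 1.4 can be sketched in Python as follows. The function names and the numerical force constants and bond lengths are hypothetical, chosen only to make the arithmetic easy to follow; they are not parameters from any particular force field.

```python
def stretch_energy(k, r, r0):
    """Harmonic bond stretching energy for one bond (Equation 1.3)."""
    return 0.5 * k * (r - r0) ** 2

def total_stretch_energy(bonds):
    """Sum of stretching energies over bonds given as (k, r, r0) tuples (Equation 1.4)."""
    return sum(stretch_energy(k, r, r0) for k, r, r0 in bonds)

if __name__ == "__main__":
    # Hypothetical three-atom molecule a-b-c:
    # bond a-b is stretched by 0.10 length units, bond b-c sits at its ideal length.
    bonds = [(600.0, 1.10, 1.00), (400.0, 1.50, 1.50)]
    print(total_stretch_energy(bonds))
```

Only the stretched bond contributes here, since a bond at its ideal length has zero stretching energy.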

(vi) Torsion Energy: E Torsion is the bond energy due to changes in the conformation of the bond and is given by

$$E_{\mathrm{Torsion}} = \tfrac{1}{2} k_{\varphi} \left[1 + \cos\left(m(\varphi + \varphi_{\mathrm{offset}})\right)\right] \qquad (1.5)$$

where kφ is the energy barrier to rotation about the torsion angle φ

m is the periodicity of the rotation

φ offset is the ideal torsion angle relative to a staggered arrangement of the two atoms.

(vii) Energy VDW: The van der Waals interaction energy of the molecule with the receptor. E vdW is the total energy contribution due to van der Waals forces, and it is calculated from the Lennard-Jones potential equation

$$E_{\mathrm{vdW}} = \varepsilon \left[\left(\frac{r_{\min}}{r}\right)^{12} - 2\left(\frac{r_{\min}}{r}\right)^{6}\right] \qquad (1.6)$$

The (r min/r)^6 term in this equation represents the attractive forces, while the (r min/r)^12 term represents the short-range repulsive forces between the atoms. Here r min is the distance between the two atoms at which the energy is at a minimum, ε is the depth of that minimum, and r is the actual distance between the atoms.
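The behaviour of the Lennard-Jones expression in Equation 1.6 can be examined with a short Python sketch. The function name and the values of r min and ε below are hypothetical; the sketch assumes the form in which the energy at r = r min equals −ε.

```python
def lennard_jones(r, r_min, epsilon):
    """Lennard-Jones energy: epsilon * [(r_min/r)^12 - 2 (r_min/r)^6]."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)

if __name__ == "__main__":
    r_min, epsilon = 3.8, 0.2   # hypothetical parameters
    for r in (3.0, 3.8, 5.0, 10.0):
        print(r, lennard_jones(r, r_min, epsilon))
```

At distances shorter than r min the repulsive twelfth-power term dominates and the energy rises steeply; at r min the energy reaches its minimum; at long range the energy approaches zero from below as the attractive term fades.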

b) Electronic Parameter

(i) Energy Bend: E bend is the bond energy due to changes in the bond angle and is estimated as:

$$E_{\mathrm{bend}} = \tfrac{1}{2} k_{\theta} (\theta - \theta_0)^2 \qquad (1.7)$$

where θ0 is the ideal bond angle, that is, the minimum energy position of the three atoms.

(ii) Highest Occupied Molecular Orbital Energy (HOMO): HOMO (highest

occupied molecular orbital) is the highest energy level in the molecule that

contains electrons. It is crucially important in governing molecular reactivity

and properties. When a molecule acts as a Lewis base (an electron-pair donor)

in bond formation, the electrons are supplied from the molecule's HOMO. How


readily this occurs is reflected in the energy of the HOMO. Molecules with

high HOMOs are more able to donate their electrons and are hence relatively

reactive when compared to molecules with low-lying HOMOs; thus the HOMO

descriptor measures the nucleophilicity of a molecule.

(iii) Lowest Unoccupied Molecular Orbital Energy (LUMO): LUMO (lowest unoccupied molecular orbital) is the lowest energy level in the molecule that contains no electrons. It is important in governing molecular reactivity and properties. When a molecule acts as a Lewis acid (an electron-pair acceptor) in bond formation, incoming electron pairs are received in its LUMO. Molecules with low-lying LUMOs are more able to accept electrons than those with high LUMOs; thus the LUMO descriptor should measure the electrophilicity of a molecule.

c) Steric Parameter

(i) Ovality: Ovality or non-circularity is the degree of deviation from perfect circularity of the cross section of the core or cladding of the fibre. Quantitatively, the ovality of either the core or the cladding is expressed as

$$\frac{2(a - b)}{(a + b)} \qquad (1.8)$$

where a is the length of the major axis

b is the length of the minor axis.

(ii) Dipole Moment: The dipole moment descriptor is a 3D electronic descriptor that indicates the strength and orientation behavior of a molecule in an electrostatic field. Both the magnitude and the components (X, Y, and Z) of the dipole moment are calculated. It is estimated by utilizing partial atomic charges and atomic coordinates. The descriptor uses Debye units. Dipole properties have been correlated to long-range ligand-receptor recognition and subsequent binding.


(iii) Balaban Index: The Balaban Index ‘J’ is a graph index defined for a graph on n nodes and m edges. This is a highly discriminating descriptor, whose values do not substantially increase with molecule size and the number of rings present. Its evaluation begins with the D-matrix modified as follows:

· Each edge contributes length 1/b to overall path lengths, where b is the edge (bond) order.

· For aromatic bonds, the number b is set to 1.5 by definition (thus contributing 2/3 to overall path lengths).

$$J = \frac{m}{\gamma + 1} \sum_{i=1}^{n} \sum_{j=1}^{n} (D_i D_j)^{-1/2} \qquad (1.9)$$

where γ = m – n + 1 is the circuit rank of the graph

Di is the sum of all entries in the ith row (or column) of the graph distance matrix.

The Balaban Index helps to differentiate molecules according to their shape.
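A rough Python sketch of the Balaban index is given below for a hydrogen-suppressed molecular graph supplied as an adjacency list. It rests on several assumptions that are not stated in the text above: all bonds are treated as single bonds (b = 1), the graph is connected, and the summation is taken over bonded atom pairs with each bond counted once, which is the usual convention for this index. The function names are introduced for illustration only.

```python
from collections import deque
from math import sqrt

def distance_sums(adjacency):
    """Row sums D_i of the topological distance matrix, via BFS from each atom."""
    sums = []
    for start in range(len(adjacency)):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        sums.append(sum(dist.values()))
    return sums

def balaban_j(adjacency):
    """Balaban J for a connected graph with all bond orders taken as 1 (assumed)."""
    n = len(adjacency)
    edges = [(i, j) for i in range(n) for j in adjacency[i] if i < j]
    m = len(edges)
    gamma = m - n + 1  # circuit rank
    d = distance_sums(adjacency)
    return m / (gamma + 1) * sum(1.0 / sqrt(d[i] * d[j]) for i, j in edges)

if __name__ == "__main__":
    propane = [[1], [0, 2], [1]]             # C-C-C chain
    cyclopropane = [[1, 2], [0, 2], [0, 1]]  # three-membered ring
    print(round(balaban_j(propane), 4), round(balaban_j(cyclopropane), 4))
```

Handling bond orders other than 1 would require replacing the breadth-first search with a weighted shortest-path calculation using edge lengths of 1/b.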

(iv) Connolly Solvent Accessible Area (Angstrom2): The locus of the center of a spherical probe as it is rolled over the molecular model.

(v) Connolly Molecular Surface Area (Angstrom2): The contact surface created when a spherical probe is rolled over the molecular model.

(vi) Connolly Solvent Excluded Volume (Angstrom3): The volume contained within the contact molecular surface.

(vii) Principal Moment of Inertia (X, Y, Z): The moment of inertia of the whole body with respect to one of the principal axes is known as a Principal Moment of Inertia. The moments of inertia are computed for a series of straight lines through the center of mass. The moments of inertia are given by:

$$I = \sum_{i} m_i d_i^2 \qquad (1.10)$$

where mi is the mass of atom i and di is its distance from the axis. If all three moments are equal, the molecule is considered to be a spherical top; if only two are equal, it is a symmetrical top.

(viii) Wiener Index (W): The Wiener index is the sum of the numbers of chemical bonds lying between all pairs of heavy atoms in the molecule. In graph-theoretical terms, it is the sum of the lengths of the minimal paths between all pairs of vertices representing heavy atoms. This is equal to half the sum of all D-matrix entries:

$$W = \frac{1}{2} \sum_{i} \sum_{j} D_{ij} \qquad (1.11)$$
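The Wiener index can be computed from a hydrogen-suppressed molecular graph with a short breadth-first search, as in the sketch below. The adjacency-list representation and the function names are assumptions made for this example, and the graph is assumed to be connected.

```python
from collections import deque

def distance_matrix(adjacency):
    """Topological distance matrix (bond counts) of a connected graph, by BFS."""
    n = len(adjacency)
    matrix = []
    for start in range(n):
        dist = [None] * n
        dist[start] = 0
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if dist[neighbour] is None:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        matrix.append(dist)
    return matrix

def wiener_index(adjacency):
    """Half the sum of all entries of the distance matrix (Equation 1.11)."""
    return sum(sum(row) for row in distance_matrix(adjacency)) // 2

if __name__ == "__main__":
    n_butane = [[1], [0, 2], [1, 3], [2]]      # C-C-C-C chain
    isobutane = [[1], [0, 2, 3], [1], [1]]     # central carbon bonded to three others
    print(wiener_index(n_butane), wiener_index(isobutane))
```

The branched isomer gives the smaller value, which illustrates how the index reflects molecular compactness.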

d) Generate a QSAR equation

Determine the functional relationship between activity and the selected descriptors; that is, search for a mathematical function f such that activity = f (descriptors) to a suitably high level of accuracy. After identifying the dependent and independent variables, a suitable statistical method is used to generate a QSAR equation [43]. The statistical methods can be broadly divided into two groups: linear and non-linear methods. In statistics, a correlation is established between the dependent variable(s) (biological activity) and the independent variable(s) (molecular descriptors).

The linear method fits a line between the selected descriptors and

activity as compared to non-linear method which fits a curve between the

selected descriptors and activity. The statistical method to build QSAR model

is decided based on the type of biological activity data.


The following are a few commonly used statistical methods:

· Categorical Dependent Variable - Discriminant analysis, Logistic

regression, k-Nearest Neighbour classification, Decision Trees.

· Continuous Dependent Variable - Multiple regression, Principal

Component Regression, Continuum Regression, Partial Least

Squares Regression, Canonical Correlation Analysis, k-Nearest

Neighbor method, Neural Networks.

Multiple regression is a widely used method for building QSAR models. A regression model is simple to interpret, because the contribution of each descriptor can be seen from the magnitude and sign of its regression coefficient. Multiple linear regression attempts to maximize the fit of the data to a regression equation for the biological activity by adjusting each of the parameters up or down. Successive regression equations are derived in which parameters are either added or removed until the r2 and S values are optimized. The magnitude of the coefficients derived in this manner indicates the relative contribution of the associated parameter to bioactivity.
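To make the idea of a regression equation concrete, the following sketch fits a multiple linear regression by solving the least-squares normal equations in plain Python. It is a minimal illustration rather than the procedure used by any particular QSAR package: the function names are hypothetical, the data are invented so that the exact answer is known, and no descriptor selection is attempted.

```python
def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + b2*x2 + ..."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    p = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

def predict(coeffs, x):
    """Predicted activity for one compound with descriptor values x."""
    return coeffs[0] + sum(c * xi for c, xi in zip(coeffs[1:], x))

if __name__ == "__main__":
    # Invented data generated from activity = 1 + 2*x1 - 3*x2.
    X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
    y = [1, 3, -2, 0, 2]
    coeffs = fit_mlr(X, y)
    print([round(c, 6) for c in coeffs])
    print(round(predict(coeffs, (3, 2)), 6))
```

In this toy case the sign of each coefficient shows directly whether the corresponding descriptor raises or lowers the predicted activity, which is the interpretability advantage described above.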

There are two important caveats in applying multiple regression analysis. The first is based on the fact that, given enough parameters, any data set can be fitted to a regression line. The consequence of this is that regression analysis generally requires significantly more compounds than parameters, by a factor of about 3-6. The second difficulty is that regression analysis is most effective for interpolation, whereas it is extrapolation that is most useful in a synthesis campaign.

There are various statistical measures available for evaluation of the

significance of the model; following are most commonly used [44]:

n - number of molecules


k - number of descriptors in a model

df - degree of freedom (n-k-1) (higher is better)

r2 - coefficient of determination (> 0.7)

Q2 - cross-validated r2 (>0.5)

pred_r2 - for external test set (>0.5)

SEE - standard error of estimate (smaller is better)

F-test - F-test for statistical significance of the model (higher is

better, for same set of descriptors and compounds)

Z score - Z score calculated by the randomization test (higher

is better)

SDEP - Standard deviation error of predictivity.

Correlation Coefficient (r) and Coefficient of Determination (r2): The quantity r, called the linear correlation coefficient, measures the strength and the direction of a linear relationship between two variables. The coefficient of determination, r2, is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable. It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph. It can be calculated as:

$$r^2 = 1 - \frac{\text{Sum of squares of the deviations from the regression line}}{\text{Sum of squares of the deviations from the mean}} = \frac{\text{Regression variance}}{\text{Original variance}} \qquad (1.12)$$

Regression variance is defined as the original variance minus the variance around the regression line. The original variance is the sum of the squared distances of the original data from the mean. If

0 < r2 < 1, it indicates some degree of correlation

r2 = 0, it shows that there is no linear correlation or only weak correlation

r2 = 1, it means perfect correlation.

The higher the r2 value, the less likely it is that the relationship is due to chance.
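The coefficient of determination can be computed from observed and predicted activities with a few lines of Python, as sketched below. The function name and the numbers are hypothetical. The sketch uses the relationship stated above, namely that the regression variance is the original variance minus the variance around the regression line.

```python
def r_squared(observed, predicted):
    """Coefficient of determination from observed and model-predicted values."""
    mean = sum(observed) / len(observed)
    original = sum((y - mean) ** 2 for y in observed)                   # about the mean
    residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # about the line
    return (original - residual) / original

if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]       # invented activities
    predicted = [1.5, 1.5, 3.5, 3.5]      # invented model output
    print(r_squared(observed, predicted))
```

A model that simply predicts the mean activity for every compound gives r2 = 0, and a model that reproduces every observation exactly gives r2 = 1.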

F or Variance Ratio: The F-statistic is the ratio between explained and unexplained variance for a given number of degrees of freedom. The larger the value of F, the greater the probability that the QSAR model is significant.

Z-Score: The Z score can be defined as the absolute difference between the values of the model and the activity field, divided by the square root of the mean square error of the data set. Any compound which shows a Z-score higher than 2.5 in a QSAR model is considered an outlier.

e) Validate the equation

Validation techniques are used to identify outliers (data that are not modeled well by the equation). Graphic analysis and cross validation are used to characterize the robustness of the QSAR. There is no single method that works best for predictiveness, interpretability and computational efficiency.

Cross Validation Technique: As opposed to traditional regression methods, cross validation [45] evaluates the validity of a model by how well it predicts data rather than how well it fits data. The analysis uses the Leave-One-Out (LOO) scheme: each compound in turn is left out of the model derivation and then predicted. An indication of the performance of the model is obtained from the cross-validated r2, which is defined as

$$r^2_{\mathrm{cv}} = \frac{SD - PRESS}{SD} \qquad (1.13)$$

where SD is the sum of squares of the deviations of each activity from the mean

PRESS is the predictive sum of squares, which is the sum of the squared differences between the actual and predicted values.

Once a model with the highest cross-validated r2 has been developed, it is used to derive the conventional QSAR equation and the conventional r2 and S values. The final model results are then visualized as contour maps of the coefficients.
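The Leave-One-Out scheme can be sketched in Python as follows. For brevity the example assumes a model with a single descriptor fitted by ordinary least squares; the function names and the data are hypothetical. Each compound is removed in turn, the line is refitted on the remaining compounds, and the squared prediction error for the omitted compound is accumulated into PRESS.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_q2(xs, ys):
    """Cross-validated r2 = (SD - PRESS) / SD using Leave-One-Out."""
    mean_y = sum(ys) / len(ys)
    sd = sum((y - mean_y) ** 2 for y in ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return (sd - press) / sd

if __name__ == "__main__":
    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]   # invented descriptor values
    activity = [1.0, 3.0, 2.0, 5.0, 4.0]     # invented activities
    print(loo_q2(descriptor, activity))
```

Because every prediction is made for a compound the model has not seen, the cross-validated value cannot exceed 1 and is typically lower than the conventional r2 obtained by fitting all compounds at once.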

f) Predict Activity

From the QSAR equation obtained, the biological activity of new compounds can be predicted.

QSAR methods are useful in elucidating the mechanisms of chemical-biological interactions in various biomolecules, particularly enzymes, membranes, organelles and cells. QSAR has also been utilized for the evaluation of absorption, distribution, metabolism and excretion phenomena in organisms and whole-animal studies. The potential use of QSAR models for screening chemical databases or virtual libraries before their synthesis appears equally attractive to chemical manufacturers and pharmaceutical companies.
