Transcript
Page 1: Use of quantitative structure-activity relationships (QSAR) in drug design (review)

SEARCH FOR NEW DRUGS

USE OF QUANTITATIVE STRUCTURE--ACTIVITY RELATIONSHIPS (QSAR) IN DRUG DESIGN (Review)*

C. Hansch UDC 615.015.11(048.8)

IMPORTANCE OF THE STRUCTURE--ACTIVITY PROBLEM

A knowledge of the relationship between the chemical structure of organic compounds and their biological activity serves as a basis for studies on the development of new drugs and pesticides, and also for estimating the toxicity of the chemical compounds around us. At the same time, elucidating structure--activity relationships (SAR) is important in the development of the fundamental sciences, i.e., biochemistry and molecular biology. In recent years, chemists have very often turned from the study of SAR in pure organic chemistry to the investigation of compounds of biological interest. This has led to the study of quantitative structure--activity relationships (QSAR). Several approaches are used to explain the relationship between chemical structure and biological response [1-8]. The present paper discusses one such approach, namely, the use of substituent constants and regression analysis.

If we know that a molecule causes a certain desired biological response, how can we intensify this property? Despite the very great complexity of this problem, intuitively understood by chemists working in medicine, it has still been insufficiently clarified in the literature.

To formulate the solutions, it is useful to consider the scope of the problem.

[Structure (I): naphthoic acid; carboxyl group at position 1, free ring positions 2-8]

Let us assume that naphthoic acid (I) exhibits a certain desired biological activity, which we hope to intensify by modifying its structure. The number of possibilities is enormous. For example, we have recently published [7] a list of 166 well-characterized substituents; selecting only 50 of them and substituting them into the free positions (2-8) of structure I would give $50^7$, or nearly 780,000,000,000, possible derivatives. Even a more limited investigation using 50 substituents taken only three at a time (in any of the seven free positions) would give 4,375,000 derivatives. The situation becomes still more complicated if we take into account the possibility of introducing heteroatoms into the ring, or of replacing the carboxyl group by other groups, such as CONHR, CONR1R2, COOR, etc. A drug modification program can cover only a few hundred, or in an extreme case a few thousand, derivatives. Which compounds out of these millions or hundreds of billions of possible derivatives are the most promising? How should we choose the initial set of compounds for testing?
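The two counts quoted for structure (I) can be checked with a few lines of Python. This is only an arithmetic sketch; the variable names are illustrative, and the model assumed is the one described in the text (50 candidate substituents, seven free positions).

```python
from math import comb

N_SUBSTITUENTS = 50   # substituents selected from the published list
FREE_POSITIONS = 7    # positions 2-8 of structure (I)

# Case 1: every free position carries one of the 50 substituents.
full_substitution = N_SUBSTITUENTS ** FREE_POSITIONS

# Case 2: exactly three of the seven positions are substituted;
# choose the positions, then a substituent for each chosen position.
three_at_a_time = comb(FREE_POSITIONS, 3) * N_SUBSTITUENTS ** 3

print(f"{full_substitution:,}")   # 781,250,000,000
print(f"{three_at_a_time:,}")     # 4,375,000
```

Even the restricted case exceeds by three orders of magnitude the few thousand derivatives that a modification program can realistically cover.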

During the last 20 years we have studied the problem of using substituent constants, employing computerized regression analysis to find a set of suitable constants and a mathematical model of QSAR. The main postulate of this approach is the relationship

$\Delta$ biological response ($\Delta \mathrm{BR}$) $= f(\Delta E_1 + \Delta E_2 + \dots + \Delta E_n)$. (1)

*Corwin Hansch, Thoughts on the Use of QSAR in Drug Design (review), Pomona College, Claremont, California, U.S.A. Translated from Khimiko-Farmatsevticheskii Zhurnal, Vol. 14, No. 10, pp. 15-30, October, 1980. Original article submitted December 26, 1979.

0091-150X/81/1410-0678$07.50 © 1981 Plenum Publishing Corporation


Equation (1) implies that when changes are introduced into the structure of a parent compound, the resulting change in the biological response (the set of perturbations $\Delta \mathrm{BR}$) can be related to changes in certain physicochemical parameters $E_1, E_2, \dots, E_n$. In other words, the introduction of a substituent, for example Cl, into I changes the electron distribution, hydrophobicity, and steric properties. An analysis of the perturbations ($\Delta \mathrm{BR}$) observed in a series of related compounds can show which of the properties $E_1, \dots, E_n$ play an important role in producing the observed BR.

The problem is to find the corresponding numerical values of $E_1, \dots, E_n$ from data obtained on model systems. Up to the present, the hydrophobic ($\pi$ or log P), electronic ($\sigma_I$ and $\sigma_R$, or F and R), and steric ($E_s$ and MR) parameters have been studied most. We still have no suitable general scale for hydrogen bonding, but workers in several laboratories have made considerable progress on this problem.

The necessity of taking into consideration numerous parameters for different positions greatly complicates the SAR problem. For structure I, assume that substitution has been carried out in positions 2-8. The substituent in each position can perturb the initial structure with respect to its hydrophobic properties (modeled by $\pi_2, \dots, \pi_8$), its intermolecular steric effects during interaction with a receptor (modeled by $MR_2, \dots, MR_8$), its electron distribution through inductive ($\sigma_{I2}, \dots, \sigma_{I8}$) and resonance ($\sigma_{R2}, \dots, \sigma_{R8}$) effects, and also other effects, such as hydrogen bonding. Since for a given biological system we do not know a priori whether all these effects will be appreciable, or only some of them, we must consider all 28 possible variables, and also terms of higher order, for example squared terms in $\pi$ or MR. In theory, mixed-product terms, such as $\pi_i \cdot MR_j$ or $\sigma_{Ii} \cdot \pi_j$, may also be important. If only pairwise products are considered (neglecting products of three or more variables, $V_1 \cdot V_2 \cdot V_3$, etc.), their number would be $n(n-1)/2$. For 28 variables this reaches the considerable number of 378; with 28 first-order terms, 28 second-order terms, and 378 mixed products, we would have to check $2^{434} - 1$ (that is, $2^n - 1$ with $n = 434$) equations. Fortunately, as we shall see, most of the changes occurring in a normal set of related compounds are probably determined by a relatively small number of variables; nevertheless, we must not forget the potential complexity of biological SAR.
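The term count in this argument can be reproduced as follows. The sketch assumes, as the text does, four parameters per position ($\pi$, MR, $\sigma_I$, $\sigma_R$) on seven positions, and counts every non-empty subset of terms as one candidate equation; the variable names are illustrative.

```python
from math import comb

positions = 7             # ring positions 2-8
params_per_position = 4   # pi, MR, sigma_I, sigma_R

first_order = positions * params_per_position            # 28 variables
second_order = first_order                               # one squared term per variable
cross_terms = comb(first_order, 2)                       # n(n - 1)/2 pairwise products
total_terms = first_order + second_order + cross_terms   # 434

# Each non-empty subset of the 434 terms defines one candidate equation.
candidate_equations = 2 ** total_terms - 1

print(first_order, cross_terms, total_terms)
print(len(str(candidate_equations)), "decimal digits")
```

The result is a number of more than a hundred digits, which is why the choice of variables has to be guided by chemical reasoning rather than by exhaustive search.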

Another serious difficulty in setting up the SAR problem is to determine whether all the related compounds in the set studied react qualitatively in the same way with the same receptors. Kutter [9] formulated this problem in the form of a harmonious theory. A proper classification of drugs into subgroups of similar compounds is by no means simple; to solve this problem, different forms of pattern recognition theory are now being used [3, 10-15].

NUMERICAL EXPRESSIONS OF STRUCTURE--ACTIVITY RELATIONSHIPS

Most of us are familiar with the well-known saying of Lord Kelvin: "If it is impossible to measure, then our knowledge is sparse and insufficient." Once a hypothesis has been expressed in mathematical form, the hypothesis is clear, even if it cannot be successfully confirmed experimentally. Even an imperfect correlation equation is of great help in thinking about a problem and in determining the nature of the experiment.

In 1935 Hammett [16] made a great advance when he formulated what is today called the Hammett equation:

$\log k = \rho\sigma + \mathrm{const}.$ (2)

In this expression, which stimulated the appearance of many thousands of papers, k is the rate constant or the equilibrium constant of any organic reaction, and $\sigma$ is the substituent constant defined as

$\sigma_X = \log K_X - \log K_H,$

where $K_H$ is the ionization constant of benzoic acid and $K_X$ is the ionization constant of its meta- or para-substituted derivative. On this scale $\sigma_H = 0$, electron-acceptor substituents have positive values ($\sigma_{4\text{-}NO_2} = 0.78$), and electron-donor groups have negative values ($\sigma_{4\text{-}OCH_3} = -0.37$). The Hammett postulate is a simple one, long used intuitively by organic chemists: it states that the electronic effect of the substituents on a given reaction of the side chain of an aromatic ring parallels the influence of the substituents on the ionization of benzoic acids. The coefficient $\rho$ is a measure of the sensitivity of the reaction to the effects of the substituents. Once numerical values for the electronic effect of substituents came into use, the Hammett equation was quickly extended to aliphatic reactions by employing the $\sigma^*$ and $E_s$ constants, and also to the special cases in which a strong, direct resonance interaction takes place between the substituents and the reaction center ($\sigma^-$, $\sigma^+$) [7]. The constant $\sigma$ was divided into inductive and resonance constants ($\sigma_I$, $\sigma_R$ or F and R) [7], so that these effects could be considered independently. So many investigations were carried out using the Hammett equation to clarify the mechanisms of organic reactions that the journal "Reaktsionnaya Sposobnost' Organicheskikh Soedinenii" [Reactivity of Organic Compounds] (Tartu, USSR), founded by V. A. Pal'm, is devoted mainly to correlation analysis, the field that grew out of these studies.

The use of a simple Hammett equation ($\log \mathrm{BR} = a\sigma + b$) in biological SAR did not live up to expectations, since biological processes rarely depend on electronic effects alone. One of the few successful examples is described in the paper by Seydel [17]:

$\log 1/C = 1.11\sigma^- - 4.69$; n = 17, r = 0.952, s = 0.144 (3)

where C is the minimal inhibiting molar concentration of sulfanilamide preparations (H2NC6H4SO2NHC6H4-X) acting on E. coli; n is the number of related compounds tested; r is the correlation coefficient; s is the standard deviation from the regression. Equation (3) can be used to predict the activity (in vitro) of thousands of sulfanilamide preparations; it is known, however, that if fairly large changes take place in the electronic effects, the linear relationship of equation (3) breaks down, and the results correlate better if a squared term is introduced:

$\log 1/C = 2.54\,\mathrm{p}K_a - 0.18\,(\mathrm{p}K_a)^2 - 3.07$; n = 40, r = 0.906, s = 0.477 (4)

Equation (4), taken from the paper of Bell and Roblin [18], is based on the dependence of BR on $\mathrm{p}K_a$, since for a large series of related compounds the constant $\sigma$ proved to be an unsuitable parameter. The correlation according to equation (4) gives poorer results than that according to equation (3) (compare the s values), and undoubtedly takes into account other effects besides the electronic ones. Nevertheless, equation (4) brought a large body of data into a certain order.
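The statistics n, r, and s that accompany each correlation equation in this review can be illustrated with a one-parameter least-squares fit. The sketch below is not the computation used in the cited papers; the data are invented for illustration, the function name `fit_line` is hypothetical, and s is assumed to be the residual standard deviation with n - 2 degrees of freedom.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares fit y = a*x + b; returns (a, b, n, r, s)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                 # slope
    b = mean_y - a * mean_x       # intercept (the "free term")
    r = sxy / sqrt(sxx * syy)     # correlation coefficient
    # Residual sum of squares; two fitted parameters leave n - 2 degrees of freedom.
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(ss_res / (n - 2))    # standard deviation from the regression
    return a, b, n, r, s

# Hypothetical substituent constants and activities, invented for illustration only.
sigma = [-0.2, 0.0, 0.2, 0.4, 0.6, 0.8]
log_inv_c = [4.1, 4.3, 4.6, 4.7, 5.0, 5.2]

a, b, n, r, s = fit_line(sigma, log_inv_c)
print(f"log 1/C = {a:.2f} sigma + {b:.2f}; n = {n}, r = {r:.3f}, s = {s:.3f}")
```

A smaller s means a tighter fit in the units of log 1/C, which is the sense in which the s values of equations (3) and (4) are compared above.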

The work of Hammett was continued by Taft, who worked out substituent constants for steric effects, $E_s$ [7, 19]. $E_s$ is defined as the difference between the logarithms of two rate constants for the acid hydrolysis of esters:

$E_s = \log k_X - \log k_H$ (5)

for the reaction

$\mathrm{X{-}CH_2COOEt + H_2O \xrightarrow{H_3O^+} X{-}CH_2COOH + EtOH}$. (6)

The constant $k_H$ refers to the case X = H. The greater the steric hindrance exerted by substituent X, the lower the rate of hydrolysis. It was shown that the electronic effect of group X on the rate of acid hydrolysis is inappreciable and can be neglected. Although the constant $E_s$ was defined as a measure of intramolecular effects of the substituents, it was shown that it can also be used to take intermolecular effects into account [19].

The third parameter required for biological QSAR describes the hydrophobic (lipophilic) character of the molecule or its substituents; it is defined, analogously to $\sigma$, by equation (7) [7, 20]:

$\pi_X = \log P_X - \log P_H,$ (7)

where $P_H$ is the distribution coefficient of the parent molecule in the octanol--water system and $P_X$ is the distribution coefficient of the derivative. Recently, Rekker [21] introduced the hydrophobic fragmental constant f ($\log P = \sum_{1}^{n} a_n f_n$).

The $\pi$ and f constants are related by the equation

$f_X = \pi_X + f_H.$ (8)

The value of $f_H$ is 0.23; hence log P, $\pi$, or f can be used in correlation analysis.

We were able to illustrate [52] the use of a linear combination of substituent constants by using the data of Taft on transesterification:

naphthyl-O-C(=O)R + C3H7OH -> [naphthyl-O-C(O-)(R)-OC3H7] -> naphthyl-OH + RCOOC3H7 (9)

Equation (10) describes well the transesterification reaction (9) of six esters with n-propyl alcohol:

$\log K = 1.67 E_s - 1.13$; n = 6, r = 0.989, s = 0.156 (10)

We can show that the effects of substituents in biochemical processes can be correlated with constants obtained from simple homogeneous organic reactions by using data on the acylation of chymotrypsin according to equations (11) and (12):

O2NC6H4OC(=O)R + HOCH2-chymotrypsin -> RC(=O)-OCH2-chymotrypsin + O2NC6H4OH; (11)

$\log k_2/K_m = 1.76 E_s + 0.79\pi + 2.28$; n = 8, r = 0.981, s = 0.201 (12)

Since the coefficients of $E_s$ in equations (10) and (12) are substantially the same, we can assume that the steric effects in these two processes are similar; however, for a better correlation of the heterogeneous reaction (11), an additional term is necessary. Equation (12) was obtained by using the parameter $\pi$, but the use of MR instead of $\pi$ is known to give a qualitatively similar correlation; this is because for the eight alkyl derivatives studied, $\pi$ and MR are almost collinear. In fact, another paper [23] shows that MR is an extremely suitable parameter. It is possible that two steric effects participate here: the intramolecular effect, correlated with $E_s$, and the intermolecular effect, correlated with MR or $\pi$. The latter facilitates the reaction, while the former hinders it, since the $E_s$ values become more negative as the size of X increases, so that this term lowers $\log k_2/K_m$. It is surprising that the negative steric effects in the enzymatic process are no greater than those in equation (10), which relates to a simple alcohol: the coefficient of $E_s$ in equation (12) is only a little higher than that in equation (10). By using correlation analysis we were able to explain the remarkable effectiveness of the hydrolase.

The importance of correlation analysis in comparing the effects of substituents in living and nonliving systems can be illustrated [24] by the example of the phosphoric esters (II), which act as cholinesterase inhibitors:

X-C6H4OPO(OEt)2 (II)

Correlation of cholinesterase inhibition:

$-\log I_{50} = 2.37\sigma^- + 4.38$; n = 6, r = 0.985, s = 0.297 (13)

Toxicity for houseflies:

$-\log LD_{50} = 2.65\sigma^- + 0.31 \log P + 2.44$; n = 8, r = 0.990, s = 0.206 (14)


To ensure a high correlation of cholinesterase inhibition (Eq. (13)), only a single parameter is necessary. In this case, the SAR is similar to that expressed by Eq. (3). Evidently group X of the phosphate esters is not in contact with the enzyme, so that a term in $\pi$ or MR is not needed to describe a nonspecific interaction of X with the enzyme, as it was in the case of reaction (11). When we deal with living flies, the drug has to penetrate to the site of action; equation (14) therefore includes distribution processes, and a term in log P is necessary. The coefficients of $\sigma^-$ in equations (13) and (14) are remarkably similar, considering the large difference between the systems studied.

Equations (8)-(13) show that the parameters $E_s$, $\sigma$, and $\pi$, obtained from the study of homogeneous processes, can be used to correlate heterogeneous reactions taking place on enzymes in vivo and in vitro.

The fourth parameter, molar refraction (MR), whose importance in the QSAR problem is probably growing, is defined by the equation

$MR = \dfrac{n^2 - 1}{n^2 + 2} \cdot \dfrac{M}{d},$ (15)

where n is the refractive index, M is the molecular weight, and d is the density of the material. MR, like $\pi$, is an additive-constitutive parameter of organic compounds. It can be considered [23] a kind of "corrected" molar volume, since n changes little among organic compounds (approximately from 1.3 to 1.6), so that the fraction $(n^2 - 1)/(n^2 + 2)$ does not appreciably influence MR. Pauling and Pressman [25] were apparently the first to introduce the parameter MR into biological SAR. They assumed that MR models the dispersion forces. In fact, it has been shown that the hapten--antibody interaction that they correlated with MR is better explained by steric parameters [26, 27]. In any case, MR is a composite parameter in which both dispersion forces and molar volume are combined.
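Equation (15) is easy to evaluate numerically. The sketch below applies it to benzene, using approximate handbook values (refractive index, molecular weight, density at about 20 °C) that are assumptions of this example and do not come from the text; the function names are illustrative.

```python
def refraction_factor(n):
    """The fraction (n^2 - 1)/(n^2 + 2) from equation (15)."""
    return (n ** 2 - 1) / (n ** 2 + 2)

def molar_refraction(n, molar_mass, density):
    """MR in cm^3/mol when molar_mass is in g/mol and density in g/cm^3."""
    return refraction_factor(n) * molar_mass / density

# Approximate handbook values for benzene (assumed, not taken from the text).
mr_benzene = molar_refraction(n=1.5011, molar_mass=78.11, density=0.8765)
print(f"MR(benzene) = {mr_benzene:.1f} cm^3/mol")

# The refractive-index factor over the range quoted for organic compounds.
for n in (1.3, 1.6):
    print(f"n = {n}: factor = {refraction_factor(n):.3f}")
```

Because M/d is the molar volume, the calculation makes explicit the reading of MR as a molar volume scaled by a refractive-index factor.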

As a parameter of a different type, the "indicator variable" [28-30] (the so-called "dummy parameter") is used in QSAR. The remaining parameters (connectivity of atoms in the molecule [31] and hydrogen bonding) are still under development.

All the above parameters may be important in the interaction of a drug with its receptor. There is also the problem of the transfer of the drug molecules from the place of introduction to the site of action, which to a certain extent is governed by the laws of chance. There remains the problem of accounting for the elimination of drugs by metabolic processes and their removal in urine and feces. Although these processes are not entirely clear, they can apparently be modeled well by means of log P or $\pi$ [32, 33].

USE OF QSAR

We often hear the saying: "Use of QSAR is a way of optimizing drug design." This means that the research worker obtains an appropriate equation, substitutes into it the parameters to find the most active drug, and then begins to synthesize it. The author believes that QSAR has a much more important and general aim: it is an effective means of transforming medicinal chemistry from an art into a science. By means of QSAR we can map the receptors in living systems, separating the parameters related to the transport of the drug from those related to the interaction of the drug with the receptor. Although there are already many examples showing the predictive value of QSAR [34], it is the information on the numerous forms of interaction of drugs with macromolecules or macromolecular systems, carefully collected from many small investigations and synthesized by means of QSAR, that should lead to deep changes in medicinal chemistry.

ROLE OF HYDROPHOBICITY

The relationship between the lipophilicity of organic compounds (determined by the distribution coefficients in the oil-water system) and their narcotic activity, discovered by Meyer and Overton, aroused interest at the turn of the century, but by 1920 their work was almost forgotten, except by anesthetists. Our quantitative determination [7, 20] of the hydrophobic parameters $\pi$ and log P made it possible for biochemists and medicinal chemists to outline the role of hydrophobicity in biological systems. That these forces play a role even in the simplest of systems can be seen in the paper of Murakami et al. [36] on the hydrolysis of p-nitrophenyl acetates, catalyzed by a para-cyclophane (III):


RCOOC6H4-NO2 -> O2NC6H4OH + RCOOH (catalyst: para-cyclophane (III))

[Structure (III): para-cyclophane bearing an imidazole residue]

$\log k_{rel} = 0.45\pi - 0.53$; n = 11, r = 0.968, s = 0.260 (16)

Equation (16) shows a correlation [35] of the relative rates of hydrolysis with the hydrophobicity of group R. In the course of this reaction, the hydrophobic group R is captured by the hydrophobic part of the para-cyclophane in such a way that the catalytically active imidazole residue is in the most favorable position to promote the hydrolysis of the ester. To avoid the formation of micelles, Murakami et al. [36] worked at low concentrations. Their research is one of the most interesting investigations on the production of synthetic and semisynthetic enzymes.

The pioneers in this field, Gitler and Ochoa-Solano, used cetyltrimethylammonium bromide and N-myristoyl-L-histidine as the catalytic component in the hydrolysis of p-nitrophenyl acetates. The long lipophilic myristoyl chain ensures the capture of the histidine by the micelle. The results of this investigation can be explained in the following way [37]. From Eq. (17) we can see that the greater the lipophilicity of radical R, the higher the rate of hydrolysis:

$\log k_{rel} = 0.62 \log P - 0.28$; n = 5, r = 0.995, s = 0.060 (17)

The hydrophobic micelle appears to concentrate the histidine catalyst and the ester, thus increasing their effective concentration and making their interaction more probable. Equation (17) can be compared with the tendency toward micelle formation of the O2NC6H4CH2N+(CH3)2R Cl- molecules:

$\log 1/C = 0.69 \log P + 1.74$; n = 6, r = 0.997, s = 0.120 (18)

where C is the critical micelle concentration. The coefficients of log P in Eqs. (17) and (18) are practically identical, which shows that QSAR can be used for comparing processes which at first glance may appear to be different.

Nonspecific toxicity is apparently mainly a phenomenon related to action on membranes. To understand the more complex living systems, we can use studies with simpler nonliving systems. Equations (19)-(25) illustrate this use of QSAR in the analysis of the disruptive effects of organic compounds on model and natural membranes.

Disaggregation of silanized glass beads under the action of ROH [38]:

$\log 1/C = 0.98 \log P - 0.80$; n = 4, r = 0.995, s = 0.077 (19)

Change in resistance of black lipid membranes under the action of ROH [38]:

$\log 1/C = 1.16 \log P - 0.51$; n = 7, r = 0.985, s = 0.262 (20)

Hemolysis of erythrocytes [38]:

$\log 1/C = 0.93 \log P - 0.09$ (21)

Change in the resting potential of the lobster axon under the action of ROH [38]:

$\log 1/C = 0.87 \log P - 0.24$; n = 5, r = 0.993, s = 0.100 (22)

Increase (20 mV) in the membrane potential of the buccal ganglion of a mollusk under the action of benzoic acids [39]:

$\log 1/C = 0.84 \log P_{ion} + 3.31$; n = 30, r = 0.979, s = 0.177 (23)

LD100 of alcohols for cats [38]:

$\log 1/C = 1.06 \log P + 1.37$; n = 8, r = 0.986, s = 0.134 (24)

50% inhibition of (Na+ + K+)-ATPase by alcohols [40]:

$\log 1/C = 0.78 \log P + 0.11$; n = 12, r = 0.994, s = 0.055 (25)

With the exception of equation (25), the coefficients of log P in these equations are close to 1; hence, the role of hydrophobicity in the different processes should be similar. In equation (19), C is the molar concentration of alcohol sufficient to disaggregate very small oil-covered glass beads. The more lipophilic the alcohol, the lower the concentration required for disaggregation. Black lipid membranes (Eq. (20)) are formed by dissolving lipids of sheep erythrocytes in a hydrocarbon and pressing the solution through small holes. As a result, a thin film with a characteristic lipid bilayer is formed. The resistance of such a membrane decreases under the action of alcohols through a process most similar to that described by equation (19).

Equation (21) is an average of several different equations correlating hemolysis under the action of simple neutral organic compounds. The rupture of a living erythrocyte membrane closely resembles the disruption of the model systems correlated by equations (19) and (20). The larger negative free term of equation (20) indicates that the concentration of an isolipophilic compound needed to change the resistance of the black lipid membrane from $10^8$ to $10^6$ ohm must be almost 2.5 times higher (antilogarithm of 0.42) than that necessary for hemolysis.

The excitation of the lobster axon (Eq. (22)) and of the mollusk ganglion (Eq. (23)) by lipophilic compounds depends on hydrophobicity in the same way as the destruction of erythrocytes. In equation (23), $P_{ion}$ represents the distribution coefficient of the ionized form of the benzoic acid. The large free term in equation (23) indicates that ions are much more effective than neutral compounds in disrupting membranes: the action of anions is several thousand times stronger than that of neutral molecules.
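The comparisons of free terms made above amount to taking antilogarithms of intercept differences. The sketch below does this for the intercepts as read here from equations (20), (21), and (23), namely -0.51, -0.09, and +3.31; these readings of the printed equations, and the function name, are assumptions of the example. The ratio is exact only at log P = 0 (P = 1) or for equal slopes, so it should be taken as an order-of-magnitude guide.

```python
def concentration_ratio(intercept_sensitive, intercept_insensitive):
    """Ratio of concentrations needed for the standard response at log P = 0.

    For log 1/C = slope * log P + intercept, the less sensitive system
    (smaller intercept) needs 10**(difference) times more compound.
    """
    return 10 ** (intercept_sensitive - intercept_insensitive)

HEMOLYSIS = -0.09        # free term of equation (21), as read here
BLACK_LIPID = -0.51      # free term of equation (20), as read here
ANION_GANGLION = 3.31    # free term of equation (23), as read here

membrane_vs_hemolysis = concentration_ratio(HEMOLYSIS, BLACK_LIPID)
neutral_vs_anion = concentration_ratio(ANION_GANGLION, HEMOLYSIS)

print(f"black lipid membrane vs. hemolysis: {membrane_vs_hemolysis:.2f}x")
print(f"neutral compound vs. anion:         {neutral_vs_anion:.0f}x")
```

The first ratio corresponds to the antilogarithm of 0.42, and the second is in the range of thousands, in line with the statement about anions.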

We found a similar dependence on lipophilicity for several alcohols causing the death of cats. The free term in equation (24) shows that the concentration of alcohol needed to kill a cat is only 1/30 of that required to change the resting potential of the lobster axon by 10 mV. If, instead of the 10 mV change, we used a 0.1 mV change, equation (22) might have a free term similar to that of equation (24). Such an experiment could give a better idea of the level of nerve excitation required to inactivate the nervous system of a cat, leading to the death of the animal.

Equation (25), describing the inhibition of a membrane enzyme, has a somewhat lower coefficient of log P, but its free term is of the same order of magnitude as the free terms in equations (19)-(22), which indicates that the processes of hydrophobic inhibition of the enzyme and destruction of the membrane are similar; in fact, these two processes may be related.

The value of the free term in equations such as (19)-(25) is determined by two factors: the effectiveness of the pharmacophore group (for example, OH-, COO-, NH2, etc.) and the sensitivity of the system. The latter can change, depending on the end point chosen by the research worker (for example LD10, LD50, LD100). If the pharmacophore group remains constant, we can compare different systems; if the system remains constant, we can compare different pharmacophore groups.

For example, equations (26)-(29) correlate the fibrinolytic activity (the ability to dissolve blood clots) of four types of organic acids [41]:

[Structure: X-substituted aromatic acid with COO- and OH groups]
$\log 1/C = 0.52 \log P + 1.98\,(\pm\,\ldots)$; n = 22, r = 0.928, s = 0.248 (26)

[Structure: X-substituted aromatic acid with a COO- group]
$\log 1/C = 0.47 \log P - 0.43 E_s + 1.90\,(\pm 0.22)$; n = 16, r = 0.883, s = 0.210 (27)

[Structure: X-substituted phenylanthranilic acid]
$\log 1/C = 0.47 \log P + 2.43\,(\pm 0.10)$; n = 17, r = 0.812, s = 0.165 (28)

[Structure: acid with a COO- group]
$\log 1/C = 0.54 \log P + 2.01\,(\pm 0.18)$; n = 90, r = 0.861, s = 0.112 (29)

The dependence of the fibrinolytic activity on hydrophobicity is the same for each type of acid (similar coefficients of log P). Except for equation (28), the free terms for all the series of acids are the same (i.e., isolipophilic related compounds cause equivalent responses). The free term in equation (28) differs clearly from those in the other equations (at better than the 95% confidence level), which shows the presence of a more strongly acting pharmacophore group in the phenylanthranilic residue of the molecule. The free terms of almost 2 show that when log P = 0, these related compounds are active at concentrations of $10^{-2}$ M.

The linearity of the relationships suggests that increasing the lipophilicity could increase the activity: if log P = 4, then log 1/C = 4. Such compounds can be obtained by introducing lipophilic substituents into, for example, benzoic acid (log P = -2.39). If five I atoms are introduced into the molecule of benzoic acid, log P will increase by somewhat less than $5\pi_I$ ($5 \times 1.27 = 6.35$) and will be nearly equal to 4; we can also use three butyl groups ($\pi_{Bu} = 2$). In fact, for equations (26)-(29) the most lipophilic of the related compounds has log $P_{ion}$ = 3.06 and log 1/C = 3.30; hence we cannot be certain how far above 3.06 the linear relationship will hold. Equations (26)-(29) show how to search most effectively for strongly acting fibrinolytic agents. For acids with completely different parent structures, we can synthesize 5 or 6 compounds with fairly different log P values. If the dependence of log 1/C on log P has neither a free term exceeding 2.4 nor a coefficient of log P exceeding 0.5, we must turn to a different series of compounds.
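The extrapolation discussed here can be sketched numerically. The slopes and free terms below are the values as read from equations (26), (28), and (29), with positive free terms assumed; equation (27) is omitted because it also needs an $E_s$ value. The substituent constant 1.27 for iodine and the starting log P of -2.39 are taken from the text. All names are illustrative, and predictions beyond log P = 3.06 lie outside the range of the data.

```python
# (slope, free term) of log 1/C = slope * log P + free term, as read from the equations.
EQUATIONS = {
    26: (0.52, 1.98),
    28: (0.47, 2.43),
    29: (0.54, 2.01),
}

def predicted_log_inv_c(slope, free_term, log_p):
    """Linear QSAR prediction; valid only where linearity actually holds."""
    return slope * log_p + free_term

TARGET_LOG_P = 4.0
predictions = {
    number: predicted_log_inv_c(slope, free_term, TARGET_LOG_P)
    for number, (slope, free_term) in EQUATIONS.items()
}
for number, value in predictions.items():
    print(f"Eq. ({number}): log 1/C = {value:.2f} at log P = {TARGET_LOG_P}")

# Upper estimate of log P for pentaiodobenzoic acid by simple additivity.
log_p_benzoate = -2.39
pi_iodine = 1.27
log_p_pentaiodo = log_p_benzoate + 5 * pi_iodine
print(f"estimated log P = {log_p_pentaiodo:.2f}")
```

The additive estimate is an upper bound, since the text notes that the actual increase is somewhat less than five times the iodine constant.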

The binding of drugs to blood serum proteins was long ago recognized as a problem in drug development. In 1964, it was found that such binding can be studied by the QSAR method [42]. The practical importance of this approach can be illustrated by equations (30) and (31).

The binding of penicillins to human blood serum [43]:

$\log (B/F) = 0.49\pi - 0.63$; n = 79, r = 0.924, s = 0.134 (30)

The ED50 of penicillins against Staph. aureus in mice [42]:

$\log 1/C = -0.45\pi + 5.87$; n = 20, r = 0.909, s = 0.191 (31)

Although these two equations were derived in different laboratories from experiments with different penicillins, a consistent pattern is obtained. The negative coefficient in equation (31) shows that the more hydrophilic penicillins are the more effective antibacterial agents. Equation (30) shows that the more hydrophobic a penicillin is, the more readily it binds to serum albumin, which explains well the considerable difference in the activity of penicillins in the presence and absence of serum [42]. Many other QSAR have been obtained for binding to proteins [44-47]. In general, hydrophobic binding to proteins has a lower coefficient of log P than binding to membranes [38].
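The opposing trends in equations (30) and (31) can be made concrete by asking what a one-unit increase in $\pi$ does to each quantity. The coefficients 0.49 and -0.45 are those read from the two equations; the function name is illustrative, and the calculation assumes nothing beyond the linear form of the equations.

```python
def fold_change(coefficient, delta_pi):
    """Factor by which a quantity Q changes when log Q = coefficient * pi + const."""
    return 10 ** (coefficient * delta_pi)

DELTA_PI = 1.0  # one log unit more hydrophobic

binding_factor = fold_change(0.49, DELTA_PI)    # bound/free ratio, equation (30)
potency_factor = fold_change(-0.45, DELTA_PI)   # 1/C in mice, equation (31)

print(f"bound/free ratio changes by a factor of {binding_factor:.2f}")
print(f"potency (1/C) changes by a factor of   {potency_factor:.2f}")
```

The near-equal magnitudes of the two coefficients are what make the pattern consistent: the gain in serum binding is matched by a comparable loss in antibacterial potency.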

Although the binding of drugs to serum proteins is considered to be an important reason for the loss of lipophilic preparations, their binding to hemoglobin [48] and erythrocytes [49, 50] is still more important from the point of view of drug losses.

NONLINEAR RELATIONSHIPS BETWEEN HYDROPHOBICITY AND BIOLOGICAL ACTIVITY

Linear relationships such as equations (19)-(31) imply that the biological activity can be increased indefinitely by increasing the lipophilicity of the initial structure. Experiments have long shown that this is not so; beyond a certain point, an increase in lipophilicity leads to a decrease in activity. Several hypotheses have been put forward and mathematical expressions have been formulated [32, 51-53] to explain this nonlinearity.

TABLE 1. Distribution Coefficients of CNS Depressants

Preparation                     log P    Preparation                      log P
Cyclopropane                    1.72     Ketamine (Ketalar)               2.18
Divinyl ether                   1.80     Methoxyflurane (CH3OCF2CHCl2)    2.21
Glutethimide                    1.90     Trichloroethylene                2.29
Chloroform                      1.97     Halothane (CF3CHBrCl)            2.30
Chloroethane                    2.03     Librium (Elenium)                2.44
Ethrane (ClFCHCF2OCF2H)         2.10     Valium (diazepam)                2.82

Although there are many reasons for nonlinearity [32], one of the most important, at least in the whole living organism, is the transfer of the drug from the point of introduction to the site of interaction with the bioreceptors, which to a certain extent is governed by the laws of chance. As we have seen, other factors being constant, lipophilic compounds bind to proteins and membranes in proportion to their log P values. Thus drugs with high log P values are retained by binding to lipophilic sites, while drugs that are too hydrophilic tend to remain in the aqueous compartments. It follows that for each series of related compounds there is an optimal value of log P for transport of the drug through biological tissue. Equations (32) and (33) present the two most thoroughly studied mathematical models of the role of lipophilicity in drug transport:

Parabolic model [32]:

$\log(\text{activity}) = a(\log P)^2 + b \log P + c.$ (32)

Bilinear model [51]:

$\log(\text{activity}) = a \log P - b \log(\beta P + 1) + d$ (33)

Equation (32) represents a simple symmetric parabola, while equation (33), which uses an additional parameter $\beta$, has the form of two straight lines connected by a curved segment. Although equation (33) is apparently the more flexible for explaining hydrophobic effects, there are many examples where equation (32) gives better agreement with the experimental data. The reasons are still unclear. The practical importance of equations (32) and (33) is that from them we can calculate the optimal value of log P (log $P_0$) or of $\pi$ ($\pi_0$).
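The optimum follows from setting the derivative of each model with respect to log P to zero. For the parabola this gives log $P_0 = -b/(2a)$ with $a < 0$; for the bilinear model, written here as $a \log P - b \log(\beta P + 1) + d$ (the sign convention is an assumption of this sketch), it gives $P_0 = a / (\beta (b - a))$ with $b > a > 0$. The coefficients used below are invented for illustration and the function names are hypothetical.

```python
from math import log10

def parabolic_optimum(a, b):
    """log P0 for log A = a*(log P)**2 + b*log P + c (needs a < 0)."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2 * a)

def bilinear_optimum(a, b, beta):
    """log P0 for log A = a*log P - b*log10(beta*P + 1) + d (needs b > a > 0)."""
    if not (b > a > 0 and beta > 0):
        raise ValueError("a maximum requires b > a > 0 and beta > 0")
    return log10(a / (beta * (b - a)))

def bilinear(log_p, a, b, beta, d=0.0):
    """Value of the bilinear model at a given log P."""
    return a * log_p - b * log10(beta * 10 ** log_p + 1) + d

# Invented coefficients, chosen so that both optima fall at log P0 = 2.
log_p0_parabola = parabolic_optimum(a=-0.25, b=1.0)
log_p0_bilinear = bilinear_optimum(a=1.0, b=2.0, beta=0.01)
print(log_p0_parabola, log_p0_bilinear)

# Numerical cross-check of the bilinear optimum on a grid from 0.00 to 4.00.
grid = [i / 100 for i in range(0, 401)]
best = max(grid, key=lambda x: bilinear(x, a=1.0, b=2.0, beta=0.01))
print(best)
```

In the bilinear model the slope is a on the ascending side and a - b on the descending side, which is why a maximum exists only when b exceeds a.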

We can illustrate the importance of this in designing drugs. In a study of a series of hypnotics [54] with different structures (IV-VIII, etc.), a mean value of log Po = 2 was found:

[Structural formulas IV-VIII]

The value of log Po is apparently independent of the species of animal used in the tests (mice, rats, rabbits, guinea pigs). It was assumed, of course, that these preparations act on the central nervous system (CNS); if a drug is required with an optimal distribution coefficient for penetration into the CNS, it would be reasonable to begin by selecting compounds with log P = 2. Some of the most effective CNS depressants are widely used clinically; their distribution coefficients in the octanol--water system are listed in Table 1.

Other series of CNS depressants with log Po close to 2 have also been reported [53-55]. Penetration into the CNS is only assumed for the compounds listed in Table 1, but chemical investigations [56] also gave a value of log Po = 2.3 for a series of phenylboric acids diffusing into the CNS.
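As a small illustration of screening candidates against the suggested optimum, the following Python sketch ranks the compounds of Table 1 by the distance of their log P from 2. The values are as read from Table 1; methoxyflurane is omitted because the last digit of its value is not legible, and the function names are hypothetical.

```python
# log P values as read from Table 1 (octanol--water system).
# Methoxyflurane is left out: its tabulated value is only partly legible.
TABLE_1_LOG_P = {
    "cyclopropane": 1.72,
    "divinyl ether": 1.80,
    "glutethimide": 1.90,
    "chloroform": 1.97,
    "chloroethane": 2.03,
    "Ethrane": 2.10,
    "ketamine": 2.18,
    "trichloroethylene": 2.29,
    "halothane": 2.30,
    "Librium (elenium)": 2.44,
    "Valium (diazepam)": 2.82,
}


def rank_by_distance(log_p_by_name, target=2.0):
    """Return compound names ordered by |log P - target|, closest first."""
    return sorted(log_p_by_name, key=lambda name: abs(log_p_by_name[name] - target))


def mean_log_p(log_p_by_name):
    """Arithmetic mean of the tabulated log P values."""
    return sum(log_p_by_name.values()) / len(log_p_by_name)


if __name__ == "__main__":
    for name in rank_by_distance(TABLE_1_LOG_P):
        print(f"{name:20s} {TABLE_1_LOG_P[name]:.2f}")
    print(f"mean log P = {mean_log_p(TABLE_1_LOG_P):.2f}")
```

A ranking of this kind is only a first filter; it says nothing about potency or selectivity at the receptor.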



Fig. 1. Cross section of carboxypeptidase, 5.1 thick, perpendicular to the Z axis (according to Quiocho and Lipscomb), showing the X, Y positions of all non-hydrogen atoms located between Z values of 8.00 and -12.00. Black circles: carbon atoms of side chains. Crosshatched regions have a large fraction of side-chain carbons. The diameters of the atoms are shown on a reduced scale.

Other classes of medicinal compounds also have characteristic values of log Po. The following triazenes, active against L-1210 leukemia, have the log Po values shown.

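[Structural formulas of the triazenes with their log Po values; the values are garbled in the source and apparently read 1.01 and 0.82]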

A similar value of log Po is required for heterocycles or benzene derivatives [57] considered as antileukemic agents.

THE LIGAND--RECEPTOR INTERACTION

Since very little is known about most of the receptors with which drugs react, the interactions of ligands with purified enzymes can be studied to gain a deeper understanding of the reactions of ligands with macromolecules [58]. Figure 1 (from the paper of Kuntz [59]) represents a cross section of an enzyme; the black circles represent hydrophobic amino acid residues, and the white circles hydrophilic residues. Although there is considerable heterogeneity, the hydrophobic residues tend to group together and to separate from the clusters of hydrophilic residues. Because we know little about the plasticity of proteins, it is difficult to state the size of ligand that can be accommodated in such pockets; however, some plasticity must exist, since substituents covering a wide range of sizes have been shown to fit into the hydrophobic pockets of an enzyme while the linear dependence on π or log P still holds [60].

The extent to which an individual ligand makes contact with an enzyme can be inferred, at least in part, from a study of a specially selected set of related compounds. For example, substituents in certain positions can come into contact with the enzyme, while those in other positions cannot. In equation (34) a term containing π is necessary for a good correlation, while introducing such a term into equation (35) does not improve the correlation. This phenomenon is easily understood for isolated enzymes, and researchers must be well aware of such effects when testing drugs on whole organisms.

Hydrolysis of X-phenylglycosides by emulsin [62]:

para-substituents:

$$\log(k_{1}/k_{?}) = 0.33\pi + 0.61\sigma^{-} + 1.80;\quad n = 8\ \text{(or 18)},\ r = 0.964,\ s = 0.221 \tag{34}$$

meta-substituents:

$$\log(k_{1}/k_{?}) = 0.95\sigma + 1.63;\quad n = 6,\ r = 0.949,\ s = 0.120 \tag{35}$$

[The subscripts on the left-hand side and the value of n in equation (34) are not fully legible in the source.]

Another interesting ligand--receptor interaction is observed during the inhibition of dihydrofolate-reductase by triazines of type IX:

[Structural formula IX]

$$\log(1/C) = 1.05\pi - 1.21\log(\beta\cdot 10^{\pi} + 1) + 6.64;\quad n = 28,\ r = 0.955,\ s = 0.210 \tag{36}$$

where C is the molar concentration causing 50% inhibition. The best fit to the data was obtained with the bilinear Kubinyi model. The ascending section of the curve has a slope of 1.05, and the slope of the descending section is -0.16 (1.05 - 1.21). The right-hand side of the bilinear curve is thus essentially parallel to the abscissa, and X is apparently bound hydrophobically only up to πo = 1.56. Groups with a higher value of π apparently extend beyond the enzyme into the surrounding aqueous phase.
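The two slopes quoted for equation (36) follow from the limiting behavior of the bilinear term. As a brief sketch, write the two slope coefficients as a = 1.05 and b = 1.21 and the constant term as c (these symbols are notation assumed here for convenience, not taken from the original):

$$
\begin{aligned}
\beta\cdot 10^{\pi} \ll 1:&\quad \log(\beta\cdot 10^{\pi}+1) \approx 0
  &&\Rightarrow\quad \log(1/C) \approx a\,\pi + c, &&\text{slope } a = 1.05,\\
\beta\cdot 10^{\pi} \gg 1:&\quad \log(\beta\cdot 10^{\pi}+1) \approx \pi + \log\beta
  &&\Rightarrow\quad \log(1/C) \approx (a-b)\,\pi - b\log\beta + c, &&\text{slope } a-b = -0.16.
\end{aligned}
$$

Because a - b is close to zero, substituents more hydrophobic than the optimum add almost nothing to the activity, which is what one would expect if the excess hydrophobic surface lies outside the enzyme.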

There are many cases in which π or log P correlates ligand--enzyme interactions poorly, but a good correlation is obtained when the molar refraction MR is used. An example is the hydrolysis of compounds of type X with the L-configuration by chymotrypsin [61]:

[Structural formula X]

$$\log(1/K_m) = 0.77\,MR_1 + 1.13\,MR_2 + 0.47\,MR_3 - 0.56\,I + 1.35\,\sigma^{*} - 0.055\,MR_1\cdot MR_2\cdot MR_3 - 1.64;\quad n = 84,\ r = 0.977,\ s = 0.333 \tag{37}$$

It is assumed that substituents R1, R2, and R3 fall into the P1, P2, and P3 spaces of the enzyme. In equation (37), Km is the Michaelis constant, MR1 refers to R1, etc., and I is an indicator variable equal to 1 when R2 is isopropyl and 0 for all other R2. The polar Taft parameter (σ*) refers only to R3, since the σ* values change little for R2, while the electronic effects of R1 were found to be negligible. The negative coefficient of the cross-product term MR1·MR2·MR3 shows that when ΣMR becomes too large, binding (and hence hydrolysis) becomes difficult. The coefficients of MR1, MR2, and MR3 show how the different regions of the enzyme surface interact with the substituents. The molar refraction of the substituents is apparently related to bulk effects and dispersion forces. When designing drugs from SAR data obtained on the whole organism, we must therefore determine accurately the role of π or log P in the transport of the drug and the role of π and (or) MR in the interaction of the drug with the receptor, bearing in mind that these two processes may differ appreciably [62].
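The effect of the cross-product term can be made concrete with a short Python sketch. It assumes the coefficients of equation (37) read 0.77, 1.13, and 0.47 for MR1, MR2, and MR3 (the assignment of the first two is an assumption, since the subscripts are poorly legible in the source), and the MR values used in the example are hypothetical rather than taken from the data set.

```python
def log_inv_km(mr1, mr2, mr3, sigma_star, r2_is_isopropyl=False):
    """Evaluate the right-hand side of equation (37) as read here (assumed coefficients)."""
    indicator = 1.0 if r2_is_isopropyl else 0.0
    return (0.77 * mr1 + 1.13 * mr2 + 0.47 * mr3
            - 0.56 * indicator
            + 1.35 * sigma_star
            - 0.055 * mr1 * mr2 * mr3
            - 1.64)


def mr2_sensitivity(mr1, mr3):
    """Partial derivative of equation (37) with respect to MR2."""
    return 1.13 - 0.055 * mr1 * mr3


if __name__ == "__main__":
    # Hypothetical substituent sizes: bulkier R1 and R3 reduce the benefit of a bulkier R2.
    for size in (1.0, 3.0, 5.0):
        print(f"MR1 = MR3 = {size:.1f}: d(log 1/Km)/d(MR2) = {mr2_sensitivity(size, size):+.3f}")
```

In this reading, enlarging R2 stops helping once the product MR1·MR3 exceeds 1.13/0.055 (about 20.5), which is one way to quantify the statement that binding becomes difficult when the substituents are collectively too large.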

QSAR FOR THE WHOLE ORGANISM

A good illustration of the use of QSAR to enhance the activity of a lead compound is provided by the study of phenanthrenecarbinols (XII) acting as antimalarial agents against P. berghei in mice:

[Structural formula XII: phenanthrene ring system with substituents X and Y and a CHOHCH2NR1R2 side chain]

$$\log(1/C) = 0.31\,\pi_{X+Y} + 0.78\,\sigma_{X+Y} + 0.13\,\Sigma\pi - 0.015\,(\Sigma\pi)^2 + 2.35;\quad n = 102,\ r = 0.908,\ s = 0.263 \tag{38}$$

$$\log(1/C) = 0.29\,\pi_{X+Y} + 0.90\,\sigma_{X+Y} + 0.11\,\Sigma\pi - 0.013\,(\Sigma\pi)^2 + 2.41;\quad n = 212,\ r = 0.860,\ s = 0.319 \tag{39}$$

The initial application of the QSAR method to the data for 102 derivatives [63] led to equation (38). After this equation was published, data appeared for 110 new related compounds, and with these data equation (39) was obtained [64]. In these equations, C is the molar concentration required to cure 50% of the mice. The term πX+Y refers to the hydrophobicity of the substituents in the two rings of structure XII, while Σπ includes the π values of X, Y, R1, and R2. The two terms containing Σπ give a small but significant reduction in the variance of log 1/C. It appears probable that the total hydrophobicity, accounted for by Σπ, is related to random transport processes, while the πX+Y term reflects hydrophobic interactions at the receptor. These equations show that including the second sampling of 110 compounds in the analysis yields very little additional information. The somewhat poorer correlation of equation (39) is partly due to the inclusion of certain thiaphenanthrene and anthracene analogs. The ranges of the parameters covered by the two samplings are given below:

              πX+Y              σX+Y              Σπ

Sampling 1    -0.77 to 3.36     -0.17 to 1.21     -0.94 to 8.59
Sampling 2     0.0 to 3.72      -0.14 to 1.63      2.21 to 9.30
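How close the two correlations are can be checked numerically. The Python sketch below encodes equations (38) and (39) with their intercepts taken as positive (an assumption, made because only a positive intercept reproduces calculated log 1/C values near 4, as in Table 2), evaluates both for one hypothetical set of descriptor values, and computes the optimal Σπ implied by the quadratic terms. The dictionary keys and function names are hypothetical.

```python
# Coefficients of equations (38) and (39); intercept signs are an assumption (see lead-in).
EQ38 = {"pi_xy": 0.31, "sigma_xy": 0.78, "sum_pi": 0.13, "sum_pi_sq": -0.015, "const": 2.35}
EQ39 = {"pi_xy": 0.29, "sigma_xy": 0.90, "sum_pi": 0.11, "sum_pi_sq": -0.013, "const": 2.41}


def predict(eq, pi_xy, sigma_xy, sum_pi):
    """Predicted log 1/C for given descriptor values."""
    return (eq["pi_xy"] * pi_xy
            + eq["sigma_xy"] * sigma_xy
            + eq["sum_pi"] * sum_pi
            + eq["sum_pi_sq"] * sum_pi ** 2
            + eq["const"])


def optimal_sum_pi(eq):
    """Sum-of-pi value at the vertex of the quadratic part of the equation."""
    return -eq["sum_pi"] / (2.0 * eq["sum_pi_sq"])


if __name__ == "__main__":
    # Hypothetical descriptor values lying inside the ranges of both samplings.
    descriptors = {"pi_xy": 2.0, "sigma_xy": 0.8, "sum_pi": 4.0}
    for label, eq in (("(38)", EQ38), ("(39)", EQ39)):
        print(f"Eq. {label}: log 1/C = {predict(eq, **descriptors):.3f}, "
              f"optimal sum of pi = {optimal_sum_pi(eq):.2f}")
```

For these inputs the two predictions differ by well under the standard deviation of either equation, and the implied optima of Σπ (about 4.3 and 4.2) nearly coincide, in line with the remark that the second sampling added very little information.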

It is possible that the researchers studying this problem intuitively chose, for the second hundred related compounds, somewhat more lipophilic substituents and somewhat more electron-withdrawing groups. Apart from these minor changes and the testing of a small number of new structures, for example thiaphenanthrenes, the more than 100 new derivatives provided little new information. This shows how much QSAR can contribute to the planning of drug research, and how objectively it eliminates redundant information in drug design. Equation (39) could have been formulated using no more than 40-50 carefully selected related compounds, and the use of QSAR would have made it possible to determine at an early stage which parameters are significant. In addition, it would have been possible to use substituents covering a broader range of parameter values.

What is the predictive power of equations (38) and (39)? Since the coefficients of the two equations are practically the same, the quality of the two correlations is nearly the same, and the ranges of variation of the parameters are similar, the data within the initial sampling of compounds are predicted well by equation (38). It is more interesting to see how well the activity is predicted for related compounds lying outside the sampling used to derive equation (38). This is illustrated by the data in Table 2.

Compounds from series I (see Table 2) are the most active of the entire set of related compounds used to derive equation (38), while series II contains the most active of the second 110 related compounds. All six compounds in series II are more active than the most active ones in series I, but all were predicted relatively well. Thus, the correlation equations can give useful predictions outside the sampling of compounds used to derive them. However, we must be careful when making predictions for compounds far outside the limits of the sampling. Here it is useful to recall Fig. 1 as a model for receptors. There are only limited regions that are relatively homogeneous; in modifying a drug, the changes can be so great that a simple linear combination of substituent parameters proves insufficient. Of course, establishing the limits of applicability in the course of mapping the receptor is in itself of great importance.

Recently, equation (39) was extended by the inclusion of indicator variables, so that it now correlates the activity of 646 antimalarial preparations [34]. In this large group of compounds there are many examples where two or more drugs have almost the same activities. Such compounds are often called bioisosteres; although various ideas have been advanced in the past to explain bioisosterism, none proved satisfactory. QSAR provides a satisfactory explanation of bioisosterism [65].


TABLE 2. Phenanthrenes Acting against Malaria (P. berghei) [34]

                                   log 1/C     log 1/C calculated
No.   Substituents in ring         observed    from Eq. (38)         |Δ log 1/C|

Series I
1     2,4-Cl2; 6-CF3               4.43        4.36                  0.07
2     2,3-Br2; 6-CF3               4.36        4.26                  0.10
3     3-CF3; 5,7-Cl2               4.35        4.21                  0.14
4     1,3,6-Br3                    4.35        4.08                  0.27
5     1,3-Cl2; 6-CF3               4.29        4.05                  0.24
6     2,5,7-Cl3                    4.29        4.29                  0.28

Series II
1     2,4-(CF3)2; 6,7-Cl2          5.18        4.83                  0.35
2     1,2,3,4-Cl4; 6-CF3           4.82        5.18                  0.36
3     2,4-(CF3)2; 7-Cl             4.82        4.34                  0.48
4     2,4-(CF3)2; 6-Cl             4.74        4.47                  0.27
5     2,4-(CF3)2; 6,7-Cl2          4.47        4.88                  0.41
6     2,4,7-Cl3                    4.46        4.26                  0.20

NR1R2 column, in the order given in the source (only 11 entries are legible for the 12 compounds, so they cannot be assigned to rows with certainty): 2-piperidyl, N(C4H9)2, 2-piperidyl, N(C4H9)2, 2-piperidyl, N(C3H7)2, 2-piperidyl, 2-piperidyl, 2-piperidyl, 2-piperidyl, 2-piperidyl.
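The residuals in Table 2 can be rechecked directly from the observed and calculated columns. The Python sketch below uses the values as read from the table (the variable and function names are hypothetical), recomputes |observed - calculated| for each compound, flags any row where the result disagrees with the tabulated difference, and compares the mean residual of series II with the standard deviation s = 0.263 of equation (38).

```python
# (observed, calculated, tabulated difference) as read from Table 2.
TABLE_2 = {
    "I": [(4.43, 4.36, 0.07), (4.36, 4.26, 0.10), (4.35, 4.21, 0.14),
          (4.35, 4.08, 0.27), (4.29, 4.05, 0.24), (4.29, 4.29, 0.28)],
    "II": [(5.18, 4.83, 0.35), (4.82, 5.18, 0.36), (4.82, 4.34, 0.48),
           (4.74, 4.47, 0.27), (4.47, 4.88, 0.41), (4.46, 4.26, 0.20)],
}


def inconsistent_rows(table, tolerance=0.005):
    """Return (series, compound number) pairs whose tabulated difference
    does not match |observed - calculated| within the tolerance."""
    flagged = []
    for series, rows in table.items():
        for number, (observed, calculated, tabulated) in enumerate(rows, start=1):
            if abs(abs(observed - calculated) - tabulated) > tolerance:
                flagged.append((series, number))
    return flagged


def mean_tabulated_residual(rows):
    """Mean of the tabulated absolute differences for one series."""
    return sum(tabulated for _, _, tabulated in rows) / len(rows)


if __name__ == "__main__":
    print("rows needing a check against the source:", inconsistent_rows(TABLE_2))
    print(f"mean |residual|, series II: {mean_tabulated_residual(TABLE_2['II']):.3f}")
```

With the values as read here, only compound 6 of series I is flagged (the observed and calculated entries are both 4.29 while the tabulated difference is 0.28), which may reflect a misprint or a transcription error. The mean residual for series II is somewhat larger than s for equation (38), as might be expected for compounds outside the original sampling.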

CONCLUSION

The problems of improving drugs are so complex that a researcher will never know whether he has finally found the "best" compound in a series. Good fortune will always play a role in drug design [66]. However, the use of QSAR is changing the scope of medicinal chemistry research [65]. Our understanding of the distribution and transport of drugs in living organisms has broadened considerably [51]. Methods of regression analysis have been developed to the point where the analysis of each proper set of biological response data, with rare exceptions, contributes something new to the understanding of an SAR problem. We are beginning to recognize the molecular parameters that control the reactions of ligands with macromolecules. Unfortunately, studies on the QSAR problem presently lack coordination: different laboratories carry out research individually on unrelated topics. Large-scale systematic research is badly needed, in which carefully selected sets of related compounds are studied on isolated receptors or enzymes, at the cellular level, with blood serum proteins, and finally in animals. The results of such investigations would greatly broaden our understanding of the action of drugs and of the problems in their design.

LITERATURE CITED

1. S. V. Nizhnii and N. A. Epshtein, Usp. Khim., 47, 739-772 (1978).
2. Y. C. Martin, Quantitative Drug Design, New York (1978).
3. A. J. Stuper, W. E. Brügger, and P. C. Jurs, Computer Assisted Studies of Chemical Structure and Biological Function, New York (1979).
4. C. Hansch, in: Correlation Analysis in Chemistry (N. B. Chapman and J. Shorter, eds.), New York (1978), Ch. 9.
5. V. E. Golender and A. B. Rozenblum, Computing Methods for Designing Drugs [in Russian], Riga (1978).
6. Biological Activity and Chemical Structure (J. K. Buisman, ed.), Amsterdam (1977).
7. C. Hansch and A. Leo, Substituent Constants for Correlation Analysis in Chemistry and Biology, New York (1979).
8. A. J. Hopfinger, Intermolecular Interactions and Biomolecular Organization, New York (1977).
9. Arzneimittelentwicklung [Drug Development -- in German] (E. Kutter, ed.), Stuttgart (1978).
10. W. J. Dunn and S. Wold, J. Med. Chem., 21, 922-930 (1978).
11. W. J. Dunn, III, and S. Wold, J. Med. Chem., 21, 1001-1007 (1978).
12. S. S. Schiffman, D. A. Reilly, and T. K. Clark, III, Physiol. Behav., 21, 1-10 (1979).
13. J. T. Chou and P. C. Jurs, J. Med. Chem., 22, 792-797 (1979).
14. S. Dove, R. Franke, O. L. Mndshojan, et al., J. Med. Chem., 22, 90-95 (1979).
15. C. C. Smith, C. S. Genther, and E. A. Coats, Eur. J. Med. Chem., 14, 271-276 (1979).
16. L. P. Hammett, Physical Organic Chemistry, New York (1970).
17. J. K. Seydel, Mol. Pharmacol., [?], 259-265 (1966).
18. P. H. Bell and R. O. Roblin, J. Am. Chem. Soc., 64, 2905-2917 (1942).
19. S. H. Unger and C. Hansch, Progr. Phys. Org. Chem., 12, 91-118 (1976).
20. T. Fujita, J. Iwasa, and C. Hansch, J. Am. Chem. Soc., 86, 5175-5180 (1964).
21. R. F. Rekker, The Hydrophobic Fragmental Constant, New York (1977).
22. C. Hansch and E. Coats, J. Pharm. Sci., 59, 731-743 (1970).
23. C. Hansch, C. Grieco, C. Silipo, et al., J. Med. Chem., 20, 1420-1435 (1977).
24. T. Fujita, in: Biological Correlations -- the Hansch Approach, Washington (1972), pp. 7-8.
25. L. Pauling and D. Pressman, J. Am. Chem. Soc., 67, 1003-1012 (1945).
26. E. Kutter and C. Hansch, Arch. Biochem., 135, 126-135 (1969).
27. A. Verloop, W. Hoogenstraaten, and J. Tipker, in: Drug Design (E. J. Ariëns, ed.), New York, Vol. 7 (1976), pp. 165-206.
28. C. Daniel and F. S. Wood, Fitting Equations to Data, New York (1971), pp. 55, 169, 203.
29. M. Yoshimoto and C. Hansch, J. Med. Chem., 19, 71-98 (1976).
30. H. Kubinyi, J. Med. Chem., 20, 625-629 (1977).
31. L. B. Kier and L. H. Hall, Molecular Connectivity in Chemistry and Drug Research, New York (1976).
32. C. Hansch and J. M. Clayton, J. Pharm. Sci., 62, 1-21 (1973).
33. H. Kubinyi, Arzneimittel-Forsch., 27, 750-758 (1977).
34. K. H. Kim, C. Hansch, J. Y. Fukunaga, et al., J. Med. Chem., 22, 366-391 (1979).
35. C. Hansch, J. Org. Chem., 43, 4889-4890 (1978).
36. Y. Murakami, Y. Aoyama, M. Kida, et al., Bull. Chem. Soc. Jpn., 50, 3365-3371 (1977).
37. W. J. Dunn, III, and C. Hansch, Chem. Biol. Interact., [?], 75-95 (1974).
38. C. Hansch and W. J. Dunn, III, J. Pharm. Sci., 61, 1-19 (1972).
39. C. Hansch, Intra-Sci. Chem. Rep., [?], 17-35 (1974).
40. Hegyvary, Biochim. Biophys. Acta, 311, 272-291 (1973).
41. C. Hansch, in: Biological Activity and Chemical Structure (J. A. K. Buisman, ed.), Amsterdam (1977), pp. 47-61.
42. C. Hansch and A. R. Steward, J. Med. Chem., [?], 691-694 (1964).
43. A. E. Bird and A. C. Marshall, Biochem. Pharmacol., 16, 2275-2290 (1967).
44. W. Scholtan, Arzneimittel-Forsch., 18, 505-517 (1968).
45. W. Scholtan, Arzneimittel-Forsch., 28, 1037-1047 (1978).
46. J. M. Vanderbelt, C. Hansch, and C. Church, J. Med. Chem., 15, 787-789 (1972).
47. S. W. M. Koh and G. E. Means, Arch. Biochem., 192, 73-79 (1979).
48. K. Kiehs, C. Hansch, and L. Moore, Biochemistry, [?], 2858-2863 (1968).
49. C. Hansch and W. R. Glave, Mol. Pharmacol., [?], 337-354 (1971).
50. P. Seeman and S. Roth, Biochim. Biophys. Acta, 255, 171-177 (1972).
51. H. Kubinyi, Arzneimittel-Forsch., 29, 1067-1080 (1979).
52. R. Hyde and E. Lord, Eur. J. Med. Chem., 14, 199-202 (1979).
53. J. C. Dearden and K. D. Patel, J. Pharm. Pharmacol., 30, 51P (1978).
54. C. Hansch, A. R. Steward, S. M. Anderson, et al., J. Med. Chem., 11, 1-11 (1968).
55. E. D. Druckery, H. Schwartz, and H. Leditschke, Chim. Ther., [?], 188-191 (1972).
56. C. Hansch, in: Drug Design (E. J. Ariëns, ed.), Vol. 1 (1971), p. 300.
57. G. J. Hatheway, C. Hansch, K. H. Kim, et al., J. Med. Chem., 21, 563-574 (1978).
58. C. Hansch, Adv. Pharmacol. Chemotherap., 13, 45-79 (1975).
59. I. D. Kuntz, J. Am. Chem. Soc., 94, 8568-8572 (1972).
60. C. Hansch, R. N. Smith, A. Rockoff, et al., Arch. Biochem., 183, 383-392 (1977).
61. C. Grieco, C. Hansch, C. Silipo, et al., Arch. Biochem., 194, 542-551 (1979).
62. K. J. Shah and E. A. Coats, J. Med. Chem., 20, 1001-1006 (1977).
63. P. N. Craig and C. Hansch, J. Med. Chem., 16, 661-667 (1973).
64. C. Hansch and J. Y. Fukunaga, Chem. Techn., [?], 120-128 (1977).
65. C. Hansch, J. Med. Chem., 19, 1-6 (1976).
66. C. Hansch, J. Chem. Ed., 51, 360-365 (1974).
