Drug Discovery Today: Technologies Vol. 1, No. 3 2004
Editors-in-Chief
Kelvin Lam – Pfizer, Inc., USA
Henk Timmerman – Vrije Universiteit, The Netherlands
Lead optimization
Pharmacophore definition and 3D searches
T. Langer1,*, G. Wolber2
1Computer Aided Molecular Design Group, Institute of Pharmacy, University of Innsbruck, Innrain 52, A-6020 Innsbruck, Austria
2Inte:Ligand GmbH, Clemens Maria Hofbauer-G. 6, A-2344 Maria Enzersdorf, Austria
The most common pharmacophore building concepts, based on either the 3D structure of the target or ligand information, are discussed together with the application of such models as queries for 3D database search. An overview of the key techniques available on the market is given, and differences with respect to the algorithms used and the performance obtained are highlighted. Pharmacophore modelling and 3D database search are shown to be successful tools for enriching screening experiments aimed at the discovery of novel bioactive compounds.
*Corresponding author: T. Langer. URL: http://pharmazie.uibk.ac.at/CAMD
1740-6749/$ © 2004 Elsevier Ltd. All rights reserved. DOI: 10.1016/j.ddtec.2004.11.015
Section Editor: Hugo Kubinyi – University of Heidelberg, Germany
Pharmacophore models are hypotheses on the 3D arrangement of structural properties, such as hydrogen bond donor and acceptor properties, hydrophobic groups and aromatic rings, of compounds that bind to a biological target. When the 3D structure of this target is available, or by comparison with inactive analogs, further geometric and/or steric constraints can be defined. The article describes and evaluates strategies and commercial software for pharmacophore definition, starting from the 3D structures of ligand–protein complexes or from ligands alone. Once a pharmacophore model is established, 3D searches in large databases can be performed, leading to a significant enrichment of active analogs.
Introduction
The key goal of computer-aided molecular design methods in modern medicinal chemistry is to reduce the overall cost associated with the discovery and development of a new drug by identifying the most promising candidates on which to focus experimental efforts. Often, drug discovery projects have already reached a well-advanced stage before detailed structural data on the target become available. Experimental screening for lead structure determination is limited by the number of compounds that can be submitted to a high-throughput bioassay and by the low hit rate obtained, which is in the range of 0.1% [1]. Within this context, the pharmacophore approach has proven to be successful, allowing (i) the perception and understanding of key interactions between a target and a ligand and (ii) the enrichment of hit rates in experimental screening of subsets that have been selected by in silico screening experiments (Fig. 1) [2].
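The enrichment mentioned above is commonly quantified as an enrichment factor: the hit rate in the subset selected by the in silico screen divided by the hit rate expected from screening the whole library at random. The short Python sketch below computes it. The function name and the campaign numbers are hypothetical, chosen only so that the baseline hit rate matches the 0.1% quoted above.

```python
def enrichment_factor(actives_selected: int, n_selected: int,
                      actives_total: int, n_total: int) -> float:
    """Hit rate in the selected subset divided by the hit rate of the full library."""
    if min(n_selected, n_total, actives_total) <= 0:
        raise ValueError("subset size, library size and number of actives must be positive")
    hit_rate_subset = actives_selected / n_selected   # e.g. 20 / 1,000 = 2 %
    hit_rate_random = actives_total / n_total         # e.g. 100 / 100,000 = 0.1 %
    return hit_rate_subset / hit_rate_random


# Hypothetical campaign: 100,000 compounds containing 100 actives (0.1 % hit rate);
# the pharmacophore search keeps 1,000 compounds, 20 of which turn out to be active.
ef = enrichment_factor(actives_selected=20, n_selected=1_000,
                       actives_total=100, n_total=100_000)
print(f"enrichment factor = {ef:.1f}")  # prints: enrichment factor = 20.0
```

An enrichment factor of 1 corresponds to random selection; values well above 1 indicate that the pharmacophore filter concentrates actives in the subset sent to the bioassay.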
Key technologies: structure-based pharmacophores
A pharmacophore (pharmacophore model, pharmacophoric pattern) can be considered as the ensemble of steric and electrostatic features of different compounds which are necessary to ensure optimal supramolecular interactions with a specific biological target structure and to trigger or to block its biological response [3]. Feature-based pharmacophores have turned out to be the most effective type of pharmacophore model, and the utility of such models as queries for 3D database search has been reviewed recently [4,5]. The strength of this type of model lies in the general definition of the pharmacophoric points: because each point is defined by a chemical function rather than by a specific atom or group, searches can retrieve very diverse structural scaffolds, since multiple structural elements can express the same chemical function. Pharmacophore key elements might be a group of atoms, a part of the volume of the molecule, or 'classical' pharmacophoric features such as H-bond acceptors (HBA) and donors (HBD), charged or ionizable groups, hydrophobic groups (HY) and/or aromatic rings (RA), together with geometrical constraints such as distances, angles, and dihedral angles. The set of these features is termed a pharmacophoric 'model' or 'hypothesis'.

Figure 1. Typical pharmacophore-based virtual screening workflow.

There are different possibilities to derive pharmacophore models, and the choice mainly depends on whether the three-dimensional structure of the binding site of the target is available. When the 3D structure of the target has been characterized, and when a certain number of ligands (with or without associated binding affinity) are available, pharmacophore models can be generated directly from the structure of the ligand–target complex. The LigandScout program [6], available from Inte:Ligand GmbH (http://www.inteligand.com/), is one way to derive a feature-based pharmacophore model automatically from a ligand–target complex structure. In this program, the first step is to assign to the ligand the hybridization states and bond characteristics that are missing from the input data files of the Protein Data Bank [7], using an extended heuristic approach together with template-based numeric analysis. Feature-based pharmacophores are then generated by determining interactions between ligand and target atoms on the basis of H-bond formation, charge, and hydrophobic contact. These models can then be refined according to binding data, or several models can be combined into one common feature pharmacophore. The capability of searching 3D databases will be implemented shortly.

If only 3D information on the binding site is available, without an interacting ligand, another approach to deriving a pharmacophore model can be taken: the structure-based focusing (SBF) technique within the Cerius2 software package [8], available from Accelrys Inc (http://www.accelrys.com/), allows the construction of binding-site pharmacophore hypotheses. The procedure is mainly based on (i) calculation of interaction sites using the algorithms defined in the LUDI program [9], (ii) clustering of the vectors for H-bond donating and accepting groups and of the hydrophobic regions, and (iii) transformation of the obtained clusters into a feature-based pharmacophore hypothesis representing the HBA, HBD, and HY functions.

The Unity program [10], available from Tripos Inc (http://www.tripos.com/), also allows the construction of structural pharmacophore queries based on molecules, molecular fragments, or receptor sites. In addition to atoms and bonds, 3D queries can include features such as lines, planes, centroids, extension points, hydrogen bond sites, and hydrophobic sites. Distance, angle, excluded volume, surface volume, and spatial constraints define the geometric relationships between features.

In the Molecular Operating Environment MOE (Chemical Computing Group, http://www.chemcomp.com/) [11], 3D pharmacophore queries can contain locations of features or chemical groups as well as restrictions on shape. Restrictions on shape can be imposed by specifying included and/or excluded volume areas. In MOE, the position and the shape of each volume are defined by a single sphere or by the union of several spheres.
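The union-of-spheres representation makes volume checks cheap, since testing a point only requires distance comparisons. The following Python sketch illustrates this under stated assumptions and is not MOE's implementation: atoms are reduced to their centres (atomic radii are ignored), an excluded volume is violated if any atom centre enters any of its spheres, and an included volume requires every atom centre to lie inside at least one of its spheres. The names `in_union` and `passes_volume_constraints` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Sphere = Tuple[Point, float]  # (centre, radius), coordinates in Å


def in_union(point: Point, spheres: Sequence[Sphere]) -> bool:
    """True if the point lies inside at least one sphere of the union."""
    return any(math.dist(point, centre) <= radius for centre, radius in spheres)


def passes_volume_constraints(atoms: Sequence[Point],
                              excluded: Sequence[Sphere] = (),
                              included: Sequence[Sphere] = ()) -> bool:
    """Assumed semantics: no atom centre may enter the excluded volume and,
    if an included volume is given, every atom centre must lie inside it."""
    if any(in_union(atom, excluded) for atom in atoms):
        return False
    if included and not all(in_union(atom, included) for atom in atoms):
        return False
    return True


# Hypothetical pocket wall modelled as two overlapping excluded spheres.
wall = [((5.0, 0.0, 0.0), 1.5), ((5.0, 2.0, 0.0), 1.5)]
print(passes_volume_constraints([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], excluded=wall))  # prints True
print(passes_volume_constraints([(0.0, 0.0, 0.0), (4.0, 1.0, 0.0)], excluded=wall))  # prints False
```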
In MOE, a consensus query derived not from one but from a set of aligned molecules can also be used for the 3D pharmacophore database search; this provides a high degree of control, offering both partial and systematic matching as well as flexible matching rules.
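Partial matching can be illustrated with a small, alignment-free Python sketch. It assumes a hypothetical query format (a list of feature type and position pairs, in Å) and accepts a candidate conformer if its features can be mapped one-to-one onto query features of the same type such that all pairwise distances agree within a tolerance; `min_features` lets some query features remain unmatched. The exhaustive enumeration only shows the idea, and the function name, format and tolerance are assumptions rather than any vendor's matching rules.

```python
import itertools
import math
from typing import Dict, List, Optional, Tuple

Feature = Tuple[str, Tuple[float, float, float]]  # (type, position in Å)


def match_query(query: List[Feature], ligand: List[Feature],
                tol: float = 1.0, min_features: Optional[int] = None
                ) -> Optional[Dict[int, int]]:
    """Map query features onto ligand features of the same type so that all
    pairwise distances agree within `tol`. With min_features < len(query),
    partial matches are accepted; larger matches are tried first."""
    n = len(query)
    k = n if min_features is None else min_features
    for size in range(n, k - 1, -1):
        for q_sel in itertools.combinations(range(n), size):
            for l_sel in itertools.permutations(range(len(ligand)), size):
                pairs = list(zip(q_sel, l_sel))
                if any(query[q][0] != ligand[l][0] for q, l in pairs):
                    continue  # feature types must agree
                if all(abs(math.dist(query[qa][1], query[qb][1])
                           - math.dist(ligand[la][1], ligand[lb][1])) <= tol
                       for (qa, la), (qb, lb) in itertools.combinations(pairs, 2)):
                    return dict(pairs)  # query index -> ligand index
    return None


query = [("HBA", (0.0, 0.0, 0.0)), ("HBD", (3.0, 0.0, 0.0)), ("HY", (0.0, 4.0, 0.0))]
ligand = [("HY", (10.0, 14.0, 10.0)), ("HBA", (10.0, 10.0, 10.0)), ("HBD", (13.0, 10.0, 10.0))]
print(match_query(query, ligand))  # prints {0: 1, 1: 2, 2: 0}
```

Because only inter-feature distances are compared, this check cannot distinguish a query from its mirror image; a complete fit would additionally superimpose the matched features. Production tools also replace the brute-force loops with fast screening steps.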
Key technologies: ligand-based pharmacophores
If only ligand information is available, the identification of a pharmacophore in principle involves two steps: (i) the analysis of the training set molecules themselves to identify pharmacophoric features, and (ii) the alignment of the assumed bioactive conformations of the molecules to determine the best overlay of corresponding features. Conformational flexibility represents one of the main difficulties in pharmacophore generation, because the bioactive conformations of the molecules are usually not known. Several programs are available for building pharmacophores from ligand information: Catalyst [12], available from Accelrys Inc, is by far the most widely used, because it offers great flexibility during pharmacophore generation together with integrated high-speed 3D database searching. Other successful programs are DiscoTech [13] and Gasp [14], both from Tripos Inc. The programs differ mainly in the algorithms used for the alignment, in the way conformational flexibility is handled, and in how 3D database search is performed.

In Catalyst, conformational flexibility is handled by computing a series of low-energy conformers for each molecule using a randomized search algorithm together with a poling function that allows extensive coverage of the conformational space. Two major automatic modes for pharmacophore model generation are implemented: HypoGen, the algorithm for quantitative models, and HipHop, the builder for purely qualitative (that is, common feature) models. In the first step, Catalyst checks which parts of the molecules are surface-accessible and thus available for receptor interaction, and then defines the positions of the different features by comparing the absolute coordinates of all conformations stored for the training set molecules, rather than by inter-feature distances. Model building starts with the examination of the two most active molecules in the training set, and all possible pharmacophore hypotheses based on the features available in both of these molecules are enumerated. Subsequent steps reduce the number of hypotheses to be considered by omitting those models that cannot explain the actual bioactivity data when the molecular structures are geometrically fitted to the chemical features. In quantitative models, each chemical function carries a weight that reflects its relative importance in conferring activity. Catalyst constructs multiple hypotheses that can explain and validate the structure/activity data in a chemically reasonable fashion.
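How a per-feature weight can enter a quantitative model is sketched below in Python. The score (each mapped feature lying within its tolerance sphere contributes w(1 − (d/t)²), where d is its displacement from the feature centre and t the tolerance radius) is an assumed functional form chosen for illustration; this article does not give the formula used by HypoGen, and the `Feature` class and `weighted_fit` function are hypothetical. Feature types are assumed to have been matched already, and the ligand is assumed to be superimposed on the model.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Feature:
    kind: str            # e.g. "HBA", "HBD", "HY"
    centre: Point        # position in the model frame (Å)
    tolerance: float     # radius of the tolerance sphere (Å)
    weight: float = 1.0  # hypothetical importance of the feature for activity


def weighted_fit(model: Sequence[Feature], mapped: Dict[int, Point]) -> float:
    """`mapped` maps a model-feature index to the position of the ligand group
    placed on it; unmapped or out-of-tolerance features contribute nothing."""
    score = 0.0
    for index, feature in enumerate(model):
        position = mapped.get(index)
        if position is None:
            continue
        d = math.dist(position, feature.centre)
        if d <= feature.tolerance:
            score += feature.weight * (1.0 - (d / feature.tolerance) ** 2)
    return score


model = [Feature("HBA", (0.0, 0.0, 0.0), 1.5, weight=2.0),
         Feature("HY", (4.0, 0.0, 0.0), 1.5, weight=1.0)]
print(weighted_fit(model, {0: (0.75, 0.0, 0.0), 1: (4.0, 0.0, 0.0)}))  # prints 2.5
```

With this form, a well-placed group on a heavily weighted feature raises the score more than the same placement on a lightly weighted one, which is the intuition behind weighting chemical functions by their importance for activity.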
Catalyst also provides the ability to cluster and merge hypotheses to develop more comprehensive models, and it can process up to 255 conformations per compound.

In Disco [15], which is the basis for the commercial product DiscoTech [13], each molecule is characterized by ligand points and site points. The ligand points include atoms with hydrogen bond donor, hydrogen bond acceptor, hydrophobic, negative charge, or positive charge character. Site points represent the hypothetical positions of complementary atoms in the binding site and are determined from the positions of heavy atoms in the ligand structure.
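A simplified version of the site-point idea is sketched below in Python: a hypothetical complementary site point is placed at a fixed distance from a polar heavy atom, pointing away from its bonded heavy-atom neighbours. The 2.9 Å default (a typical donor–acceptor heavy-atom distance) and the direction rule are assumptions made for illustration; the article does not specify how Disco places site points for each atom type, and `project_site_point` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def project_site_point(atom: Vec, neighbours: Sequence[Vec],
                       distance: float = 2.9) -> Vec:
    """Place a hypothetical site point `distance` Å from a polar heavy atom,
    along the sum of unit vectors pointing from each neighbour to the atom."""
    direction = [0.0, 0.0, 0.0]
    for nb in neighbours:
        bond = [a - b for a, b in zip(atom, nb)]     # neighbour -> atom
        length = math.sqrt(sum(c * c for c in bond))
        for i in range(3):
            direction[i] += bond[i] / length          # accumulate unit vectors
    norm = math.sqrt(sum(c * c for c in direction))
    if norm < 1e-8:
        raise ValueError("site direction undefined for this neighbour geometry")
    return tuple(a + distance * c / norm for a, c in zip(atom, direction))


# A carbonyl oxygen 1.2 Å from its carbon along x.
site = project_site_point((1.2, 0.0, 0.0), [(0.0, 0.0, 0.0)])
print(tuple(round(c, 3) for c in site))  # prints (4.1, 0.0, 0.0)
```

Real placement rules depend on hybridization (for example, lone-pair directions of sp2 versus sp3 acceptors), which this single-direction rule ignores.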
In Disco, conformational flexibility is handled by precomputing a series of low-energy conformers for each molecule, each conformer being treated as a rigid body during the alignment step. A conformer is represented by the interpoint distances calculated for the ligand and site points, and a clique detection algorithm is used to align structures based on these distances. Following the active analogue approach paradigm [16], the molecule with the fewest conformations is used as the reference molecule. The output from a Disco run is a ranked list of all possible pharmacophore mappings, in which each feature of a pharmacophore must be present in all the molecules. This requirement might result in good pharmacophores being missed; hence, Disco has the option of finding solutions in which some molecules are excluded from the model.

Gasp [14] is based on a genetic algorithm (GA) and differs from both Catalyst and Disco in its handling of the conformational problem: each molecule is input as a single conformation, and conformational analysis together with random rotations and a random translation is applied on the fly before any superimposition is made. The pharmacophoric features (hydrogen bond donor protons, acceptor lone pairs, and ring centers including projected site points, but no charges) are determined in all compounds, and the molecule with the fewest features is chosen as the base molecule to which the other molecules are fitted. Within the GA, the chromosomes encode the torsion angles of the rotatable bonds in all of the molecules and the mapping of the pharmacophoric features in the base molecule to corresponding features in each of the other molecules. The fitness function first generates conformations for each molecule and then uses a least-squares procedure to overlay each molecule onto the base molecule using the mappings.
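The least-squares overlay step can be illustrated with the Kabsch algorithm, a standard way of finding the rotation and translation that minimize the RMSD between paired points; whether Gasp uses this exact formulation is not stated here. In the Python sketch below (using NumPy), the rows of `mobile` and `target` are assumed to be the already-mapped coordinates of a molecule and of the base molecule, and the function name `kabsch_superpose` is hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Rigidly superimpose `mobile` onto `target` (both N x 3, rows paired) by
    least squares; returns fitted coordinates, rotation matrix and RMSD."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    p = mobile - mobile_centroid               # centre both point sets
    q = target - target_centroid
    h = p.T @ q                                # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # -1 would indicate a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = p @ rotation.T + target_centroid
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - target) ** 2, axis=1))))
    return fitted, rotation, rmsd


# Usage: a rotated and shifted copy of a point set is mapped back onto it.
angle = np.deg2rad(30.0)
rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
base = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 2.0, 0.0],
                 [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
moved = base @ rz.T + np.array([3.0, -1.0, 2.0])
fitted, rotation, rmsd = kabsch_superpose(moved, base)
print(f"RMSD after fit: {rmsd:.6f}")  # prints RMSD after fit: 0.000000
```

The determinant correction keeps the result a proper rotation; without it, the fit could superimpose a molecule onto its mirror image.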
In Gasp, fitness is calculated as a combination of the similarity and the number of the overlaid features, together with the volume integral of the overlay. Genetic operators attempt to generate solutions that maximize the fitness function and thus correspond to the best possible structural overlay. Gasp's biggest strength over Disco and Catalyst is that it considers the steric overlap of the ligands during pharmacophore model generation, whereas the latter two only attempt to match pharmacophore features without taking shape into account.
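A volume-overlap term of the kind mentioned above can be approximated smoothly with atom-centred Gaussians, a common technique in shape comparison. The Python sketch below computes the first-order Gaussian overlap volume between two sets of atoms. The amplitude 2√2 and the way the exponent is derived from the atomic radius (so that a single Gaussian integrates to the volume of its sphere) are assumptions of this illustration; the article does not state how Gasp evaluates its volume integral, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Atom = Tuple[Tuple[float, float, float], float]  # (centre in Å, radius in Å)

AMPLITUDE = 2.0 * math.sqrt(2.0)  # assumed Gaussian height p


def gaussian_exponent(radius: float) -> float:
    """Exponent alpha such that p * exp(-alpha * r^2) integrates to 4/3 pi R^3."""
    return math.pi * (3.0 * AMPLITUDE / (4.0 * math.pi * radius ** 3)) ** (2.0 / 3.0)


def overlap_volume(atoms_a: Sequence[Atom], atoms_b: Sequence[Atom]) -> float:
    """First-order Gaussian overlap volume: sum of analytic pairwise overlaps."""
    total = 0.0
    for centre_a, radius_a in atoms_a:
        alpha_a = gaussian_exponent(radius_a)
        for centre_b, radius_b in atoms_b:
            alpha_b = gaussian_exponent(radius_b)
            s = alpha_a + alpha_b
            d2 = sum((x - y) ** 2 for x, y in zip(centre_a, centre_b))
            # Closed-form integral of the product of two Gaussians.
            total += AMPLITUDE ** 2 * (math.pi / s) ** 1.5 * math.exp(-alpha_a * alpha_b * d2 / s)
    return total


carbon = ((0.0, 0.0, 0.0), 1.7)
print(overlap_volume([carbon], [carbon]), 4.0 / 3.0 * math.pi * 1.7 ** 3)
```

Summing only pairwise terms over-counts regions where three or more atoms overlap, so the value is best read as a shape-similarity score rather than an exact volume.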
Figure 2. 3D Database search strategies.
In a recent paper, results obtained with Catalyst/HipHop, Gasp, and Disco have been compared and discussed in detail [17], indicating that Catalyst and Gasp clearly outperform Disco at reproducing the five target pharmacophores described in that study. Catalyst and Gasp were found to provide almost equivalent performance, even though the results were not consistent across all data sets. A notable result is that, for both programs, the target pharmacophores were found within the first 10 solutions in four out of five data sets. Gasp was found to be inherently simpler than Catalyst, whereas the latter provides much more flexibility in setting and tuning parameters. The biggest advantage of Catalyst over Gasp is that the pharmacophoric features can be customized according to the requirements of the training set under investigation.
3D database searching
After a pharmacophore model has been generated, there are two ways to identify new molecules that share its features and may thus exhibit the desired biological response. The first is de novo design, which seeks to link the parts of the pharmacophore together with fragments to generate molecular structures that are chemically reasonable and novel. The second is 3D pharmacophore database searching, whose main advantage over de novo design is that it identifies molecules that can be obtained from corporate compound libraries or synthesized using a well-established protocol. In the ideal case, a 3D database search identifies compounds with properties outside those of the set of compounds used for building the pharmacophore, allowing the identification of novel chemical structures and molecular features (termed scaffold hopping and lead hopping, respectively). Technically, there are two possibilities to search 3D molecular databases with pharmacophore models (Fig. 2): first, using a database file format containing a set of precomputed conformations, which speeds up the search procedure; second, calculating conformers on the fly and subsequently performing the fitting analysis. The latter approach has the advantage that mass storage capacity is not relevant, which was a concern for a long time when using multiconformer databases. By contrast, pharmacophore fitting with precomputed conformations has been demonstrated to outperform the on-the-fly calculation approach. In Catalyst, both methods are possible: Catalyst databases are normally stored in a multiconformational format, but additional on-the-fly conformational tuning is permitted while fitting molecules to a pharmacophore model. This allows databases containing up to several million compounds to be searched within a few minutes.
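The precomputed strategy can be sketched as a loop over stored conformers that stops at the first conformer fitting the query, preceded by a cheap screen that skips molecules lacking the required feature types. The Python sketch below assumes a hypothetical in-memory database (molecule name mapped to a list of conformers, each a list of feature type and position pairs, with all conformers of a molecule carrying the same features) and a single distance tolerance for all feature pairs; it is not how Catalyst stores or searches its databases, and all names are hypothetical.

```python
import itertools
import math
from collections import Counter
from typing import Dict, List, Tuple

Feature = Tuple[str, Tuple[float, float, float]]  # (type, position in Å)
Conformer = List[Feature]


def conformer_fits(query: List[Feature], conformer: Conformer, tol: float) -> bool:
    """True if every query feature maps onto a distinct conformer feature of the
    same type, with all pairwise distances agreeing within `tol` Å."""
    n = len(query)
    for perm in itertools.permutations(range(len(conformer)), n):
        if any(query[i][0] != conformer[j][0] for i, j in enumerate(perm)):
            continue
        if all(abs(math.dist(query[a][1], query[b][1])
                   - math.dist(conformer[perm[a]][1], conformer[perm[b]][1])) <= tol
               for a, b in itertools.combinations(range(n), 2)):
            return True
    return False


def search_database(query: List[Feature], database: Dict[str, List[Conformer]],
                    tol: float = 1.0) -> List[str]:
    needed = Counter(kind for kind, _ in query)
    hits = []
    for name, conformers in database.items():
        if not conformers:
            continue
        # Cheap screen: all conformers are assumed to share one feature list,
        # so the first conformer shows which feature types are present.
        available = Counter(kind for kind, _ in conformers[0])
        if any(available[kind] < count for kind, count in needed.items()):
            continue
        # Try the stored conformers in turn; stop at the first one that fits.
        if any(conformer_fits(query, conf, tol) for conf in conformers):
            hits.append(name)
    return hits


query = [("HBA", (0.0, 0.0, 0.0)), ("HBD", (3.0, 0.0, 0.0)), ("HY", (0.0, 4.0, 0.0))]
database = {
    "mol_a": [[("HBA", (0.0, 0.0, 0.0)), ("HBD", (6.0, 0.0, 0.0)), ("HY", (0.0, 4.0, 0.0))],
              [("HBA", (0.0, 0.0, 0.0)), ("HBD", (3.0, 0.0, 0.0)), ("HY", (0.0, 4.0, 0.0))]],
    "mol_b": [[("HBA", (0.0, 0.0, 0.0)), ("HBD", (3.0, 0.0, 0.0))]],
    "mol_c": [[("HBA", (0.0, 0.0, 0.0)), ("HBD", (3.0, 0.0, 0.0)), ("HY", (0.0, 9.0, 0.0))]],
}
print(search_database(query, database))  # prints ['mol_a']
```

Here `mol_a` is found only through its second stored conformer, `mol_b` is rejected by the feature-count screen without any geometric fitting, and `mol_c` has the right features in the wrong arrangement.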
Strategy comparison
The key players on the market for pharmacophore-based 3D database search are Accelrys, Tripos, and the Chemical Computing Group, and their software solutions have been discussed in the previous sections. Additional programs are available, including C@rol [18] from Molecular Networks GmbH (http://www.molnet.de/), Feature Trees [19] from BioSolveIT GmbH (http://www.biosolveit.de/), and several academic prototypes described in a recent literature review [20]. All commercial packages allow more or less efficient pharmacophore construction and 3D database search. The MOE system (Chemical Computing Group) is a highly integrated yet easily customizable molecular modelling environment in which the pharmacophore approach is well embedded. The corresponding Tripos product, Sybyl, contains the modules Gasp and Disco for pharmacophore building; for 3D database search, the integrated system Unity is used. Both environments are well integrated, and the Sybyl Programming Language (SPL) enables users to automate many procedures, including the analysis of pharmacophores, hit lists, etc. The highest performance in terms of 3D database search speed and pharmacophore model customization is offered by Accelrys' products Catalyst and Cerius2; however, the cumbersome graphical interface of the former has often been criticized [17]. Also, the integration between the Accelrys products is much weaker than that offered by the products of their competitors. Which software is chosen for a pharmacophore generation and 3D database search job might depend more on the preference of the user than on hard facts about the possibilities offered by the different packages. In certain well-defined areas, the programs of small software companies will offer better solutions than those of the key players. The success of such spin-off programs will probably depend strongly on their capability of being integrated into existing workflows within the drug discovery and development process.
Conclusions
The pharmacophore concept has proven to be extremely successful, not only in rationalizing structure–activity relationships but also through its large impact on the development of appropriate 3D tools for efficient virtual screening. Profiling of combinatorial libraries and compound classification are other frequently used applications of this concept.

Applying pharmacophore models before the biological screening of compounds is an efficient procedure, because it quickly eliminates molecules that do not possess the required features, leading to a dramatic increase in enrichment compared with a purely random screening experiment. One should not forget, however, that additional molecular characteristics not reflected by pharmacophore models (physico-chemical, ADME, and toxicological properties) must be taken into account when deciding which compounds should be developed further.
References
1 Oprea, T.I. (2002) Current trends in lead discovery: are we looking for the appropriate properties? J. Comput. Aided Mol. Des. 16, 325–334
2 Hoffmann, R.D. et al. (2004) Use of 3D pharmacophore searching. In Computational Medicinal Chemistry and Drug Discovery (Tollenaere, J., De Winter, H., Langenaeker, W., Bultinck, P., eds), pp. 461–482, Dekker Inc.
3 Wermuth, C.-G. and Langer, T. (1993) Pharmacophore identification. In 3D-QSAR in Drug Design. Theory, Methods, and Applications (Kubinyi, H., ed.), pp. 117–136, ESCOM Science Publishers
4 Kurogi, Y. and Guner, O.F. (2001) Pharmacophore modeling and three-dimensional database searching for drug design using catalyst. Curr. Med. Chem. 8, 1035–1055
5 Langer, T. and Krovat, E.-M. (2003) Chemical feature-based pharmacophores and virtual library screening for discovery of new leads. Curr. Opin. Drug Discov. Dev. 6, 370–376
6 Wolber, G. and Langer, T. (2004) LigandScout: 3D pharmacophores derived from protein-bound ligands and their use as virtual screening filters. J. Chem. Inf. Comput. Sci. Web release 24 Nov. 2004, doi:10.1021/ci049885e
7 Berman, H. et al. (2000) The protein data bank. Nucleic Acids Res. 28, 235–242
8 Cerius2 available from Accelrys Inc., San Diego, CA, USA
9 Böhm, H.-J. (1992) The computer program LUDI: a new method for the de novo design of enzyme inhibitors. J. Comput. Aided Mol. Des. 6, 61–78
10 Unity/Sybyl available from Tripos Inc., St. Louis, MO, USA
11 MOE available from Chemical Computing Group Inc., Quebec, Canada
12 Catalyst available from Accelrys Inc., San Diego, CA, USA
13 DiscoTech available from Tripos Inc., St. Louis, MO, USA
14 Gasp available from Tripos Inc., St. Louis, MO, USA
15 Martin, Y.C. et al. (1993) A fast new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists. J. Comput. Aided Mol. Des. 7, 83–102
16 Marshall, G.R. et al. (1979) The conformational parameter in drug design: the active analogue approach. In Computer-Assisted Drug Design (Vol. 112) (Olson, E.C. and Christoffersen, R.E., eds), pp. 205–226, American Chemical Society
17 Patel, Y. et al. (2002) A comparison of the pharmacophore identification programs: Catalyst, DISCO and GASP. J. Comput. Aided Mol. Des. 16, 653–681
18 C@rol available from Molecular Networks GmbH, Erlangen, Germany
19 Feature Trees available from BioSolveIT GmbH, Sankt Augustin, Germany
20 Van Drie, J.H. (2003) Pharmacophore discovery – lessons learned. Curr. Pharm. Des. 9, 1649–1664