1
Structural Biology and Structural Genomics: A Federal Agency Perspective
John C. Norvell
National Institutes of Health, Bethesda, Maryland, U.S.A.
Marvin Cassman
University of California, San Francisco, San Francisco, California,
U.S.A.
The first protein structure took about three decades to complete, and all protein structures solved in the early years required Herculean efforts. Most aspects of the process were difficult, time consuming, expensive, labor intensive, and problematic. But during the past decade, technological breakthroughs in protein production, crystallization (still the most trying step), data collection, structure solution, and refinement have dramatically altered this picture. Although it is difficult to pick the most significant advance, the development of user-friendly synchrotron beamlines for protein crystallography is high on the list. Of course, many classes of proteins, notably large protein complexes and membrane proteins, often still require years of intense effort and imagination to solve. On the other hand, many soluble globular proteins can now be solved almost routinely. The power of structural studies to advance biological understanding was obvious from the start. Three-dimensional structures have already provided unique insight into macromolecular function and mechanism. Structure has also become an important aid for targeted drug design. Additionally, a complete set of structures can provide insights into the architecture of proteins and its relationship to function, as well as into protein folding and evolution.
© 2003 by Taylor & Francis Group, LLC
An inspection of National Institute of General Medical Sciences
(NIGMS) research grant programs reveals the growth of structural biology.
The NIGMS and all the other institutes of the National Institutes of Health
(NIH) provide research support through several mechanisms, especially
investigator-initiated, hypothesis-driven individual research grants (the
R01s). The success and maturation of structural biology over the past decade have resulted in major changes in the focus of these crystallographic grants. Initially, almost all of the crystallographic awards were made to card-carrying crystallographers, i.e., the experts in the field. Now the awards focus more on biological significance and less on crystallographic technique. The number of crystallography-related grants (i.e., those that contain at least one major structure project) awarded to principal investigators who are not experienced crystallographers is now twice the number awarded to expert crystallographers.
Sources of funding in structural biology have also changed over the
years. In the early 1980s, NIGMS provided most of the research support
for structural biology in the United States and about two-thirds of the
total NIH support. Today, NIGMS contributes only about half of the
NIH support for structural biology. As protein structure studies became
more integral to the research mission of other institutes, the relative
percentage of NIGMS funding decreased. Even so, about 15% of the
institute's current research budget is awarded to projects that involve
high-resolution protein structure determination by crystallography or
nuclear magnetic resonance (NMR) spectroscopy. Funding for protein
crystallography by the Department of Energy (DOE), National Science
Foundation (NSF), and other agencies and foundations has also grown
significantly. The NSF and DOE support of user-based synchrotrons and
numerous protein crystallographic beamlines has been essential to the
growth of the field. In addition, the Howard Hughes Medical Institute
has provided substantial support for many investigators in structural
biology.
In about 1998, motivated by the successes and recent technical advances of structural biology and by the results and demonstrated value of genome-sequencing projects, scientists began to consider national and international efforts in structural genomics. The field of structural genomics can be defined in many ways, and all of them are justified. In the broadest
sense, it can be defined as high-throughput structure determination guided
by genomic information to identify targets. Currently, there are federally
funded structural genomics efforts under way in a number of countries,
including the United States, Japan, Germany, Canada, France, the United
Kingdom, and Italy. The U.S. effort, called the Protein Structure Initiative (PSI), is spearheaded by the NIGMS. In addition, numerous industrial
efforts focus on high-throughput structure determination for targeted drug
design.
The goals and approach of the Protein Structure Initiative vary sig-
nificantly from many of the other structural genomics programs. The main
goal of the PSI is to arrive at a complete description of protein structures. In
contrast, the goal of many of the international programs and most of the
private efforts is to obtain structures of select proteins based on medical
interest or other biologically important issues. These programs do not have
any explicit interest in completeness, nor do they address this goal in their
target selection strategies, although both approaches rely on genomic data.
This chapter will focus on the basic research goals and approaches of the
NIGMS program. The PSI is a large-scale, high-throughput effort to
increase the number of structures of unique, nonredundant proteins, permitting the study of a broad range of protein structures. The PSI is expected to
provide a minimum of 10,000 selected structures in 10 years.
Many scientists had initially agreed on the value of a complete set of
all protein structures found in nature, but such an undertaking seemed
impossible. Since the number of proteins is (as we now know) much larger
than the number of genes in an organism (perhaps by an order of magni-
tude), it is neither feasible nor affordable to consider one-by-one structure
determination of the universe of protein structures. However, as many
experts in the field have discussed, computational analyses of sequence
data permit the classification of proteins into structural families and thus
provide a shortcut method to reach for this completeness: experimentally
determining the structure of a representative of each family, followed by
modeling of the homologous proteins in the family. This approach should
make the problem more manageable.
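To make the shortcut concrete, the following toy sketch shows how the work divides once family assignments are already known: one experimentally determined structure per family, with every other member left to homology modeling. It is an illustration only, not the PSI's actual procedure; the protein identifiers, family labels, and the rule of taking the first-listed member as the representative are all hypothetical, and real target selection weighs many other criteria.

```python
# Toy sketch with hypothetical data and a hypothetical selection rule:
# given a precomputed assignment of proteins to structural families, pick
# one representative per family for experimental structure determination
# and leave the remaining members to homology modeling.
from __future__ import annotations


def split_targets(family_of: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (experimental, modeled) lists of protein identifiers.

    The representative of each family is simply the first member
    encountered (an illustrative rule only); dicts preserve insertion
    order in Python 3.7+, so the result is deterministic.
    """
    representative: dict[str, str] = {}
    for protein, family in family_of.items():
        # Keep the first protein seen for each family.
        representative.setdefault(family, protein)
    experimental = list(representative.values())
    chosen = set(experimental)
    # Every non-representative is assumed to be modeled from its family's structure.
    modeled = [p for p in family_of if p not in chosen]
    return experimental, modeled


# Hypothetical family assignments, as if produced by sequence analysis.
family_of = {
    "P1": "kinase",
    "P2": "kinase",
    "P3": "globin",
    "P4": "kinase",
    "P5": "globin",
    "P6": "TIM barrel",
}

experimental, modeled = split_targets(family_of)
print("experimental:", experimental)
print("modeled:", modeled)
print(f"structures needed: {len(experimental)} of {len(family_of)}")
```

In this made-up example, three experimental structures stand in for six proteins. The real savings depend on how many families sequence analysis identifies and on how far homology models can reliably reach, which is the question of completeness taken up later in the chapter.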
Although the production of protein structures is increasing at a dizzy-
ing rate (with over 15,000 structures now deposited in the Protein Data
Bank), most of these structures are not unique; instead, they are many
variants of the same structures and sequences. Such variants, while they
are important for studying the details of biological mechanisms at the atomic level, do not significantly expand our knowledge of protein structure
space. The goal and major rationale for an organized structural genomics
project, specifically the NIGMS Protein Structure Initiative, is to focus on
structures chosen as family representatives and on methodology develop-
ment, leading to a comprehensive and efficient coverage of protein structure
space. In other words, this effort would form an inventory of all the protein
structures in nature. This inventory would be a public resource freely avail-
able to the scientific community.
However, unlike in the Human Genome Project, defining completeness in a structural genomics project is not at all obvious. Completeness might be
defined in terms of the number of structures that could be both experimen-
tally determined and modeled by homology. This still leaves plenty of room
for interpretation. A recent paper concludes that the goal is obtaining a set
(of protein structures) such that accurate atomic models can be built for
almost all functional domains (1). Other goals are possible, and the meaning of completeness is likely to become clearer as the project advances and as we learn more about what the global array of structures looks like.
Experimental details and strategies of structural genomics have been
discussed in numerous meetings and scientific articles over the past few
years. An excellent collection of summary articles can be found in a recent
review (2). The first major meeting to discuss large-scale structure determination, organized by the DOE, was held at Argonne National Laboratory in January 1998. This meeting was initiated because of a general feeling among a number of investigators and federal science administrators
that the time was ripe to consider developing the same global understanding
of protein structure that was being accomplished for gene sequence. Some
small pilot programs had already been established at the DOE and the
NIGMS. Although the discussants by no means uniformly approved of
an organized national program, enough enthusiasm was generated to
prompt further consideration. The enthusiasm arose from the importance
of protein structures and the perceived benefits of a program of global
structure discovery to biologists of all kinds.
Following the Argonne meeting, the NIGMS spent over a year exam-
ining the need for a national program in structural genomics. Three work-
shops and several advisory meetings were held that included many experts in
the various fields involved, with representation from a wide range of back-
grounds and opinions. These were designed to assess whether a large-scale
effort of the kind proposed was timely and appropriate. The three work-
shops were held between April 1998 and February 1999. Participants con-
cluded that the technology was available, the goals were feasible, and the
benefits justified the effort. Attendees at these workshops included representatives not only from the U.S. research community but also from Europe, Israel, and Japan. It became clear that interest extended beyond
the United States, and that the scale of the program required an interna-
tional effort.
Several international meetings have addressed scientific and policy
issues for this field. The First International Structural Genomics Meeting
was held in the United Kingdom in April 2000, followed by a number of
workshops and meetings, including an Organisation for Economic Co-operation and Development (OECD) conference in Florence, Italy, in June 2000;
the International Conference on Structural Genomics in Yokohama, Japan, in November 2000; and the Second International Structural
Genomics Meeting at the Airlie Center, Virginia, in April 2001. This last
meeting focused on international cooperation and policies such as data
release, publication, coordinate deposition, and intellectual property.
Information on this and other meetings can be found at the NIGMS
PSI website (3).
The first stage of the NIGMS PSI is the creation of several research
centers that serve as pilots for a future production stage. Each research
center must include all components of structural genomics so that it can
test strategies for large-scale high-throughput structure determination by
X-ray crystallography and/or NMR as well as new computational, experi-
mental, and management approaches. Target selection is left to the indi-
vidual groups, but it must be genome driven. The strategies must focus on
obtaining the maximum number of novel structures as protein family representatives, but can also include other selection criteria: known function, unknown function, eukaryotic proteins, pathogenicity, phylogenetic
relationships, minimal genomes, etc. Some classes of proteins, such as
membrane proteins, are not yet suited to high-throughput methods and are thus seldom chosen as targets. This could change in the
future, with technical improvements under way, including special projects
in several PSI research centers.
Since these PSI grants are intended to prepare the way for a public
resource, grant-related requirements are more stringent than with individual
research grants. Data release and coordinate deposition cannot be delayed
until publication but instead must be completed within four to six weeks of
structure completion. In addition, the identity and status of target proteins
must be made available on each center's publicly available webpage. The employment of graduate students and postdoctoral fellows must be justified.
The centers do retain intellectual property rights, but only those consistent
with the data release policy.
Seven PSI research center awards were announced in September 2000.
These centers spent the first year organizing themselves into cohesive units, hiring staff, and acquiring robotic equipment for protein production and sample preparation. Two additional research center awards were made in
September 2001, bringing the institute's support of these centers to $40
million annually. The NIGMS is planning further efforts in support of the
PSI, including workshops on technical bottlenecks, a centralized target
registration website at the Protein Data Bank, electronic publication of
structures, a facility for storage of resulting physical materials, and an
experimental results database.
The NIGMS expects these research centers to provide guidance for
the future of the project. The structures produced should provide a more realistic idea of what will be required to achieve complete coverage of
protein structures in nature. One outcome is already apparent and should
benefit all structural biologists: the development of new high-throughput methods and automated equipment for protein production and crystallization. The institute hopes that this inventory of structures of protein family
representatives will serve as a public resource for research scientists from
both the public and private sectors and will be a crucial body of knowledge for studies of protein structure, folding, and evolution. The modeled structures should also serve as a basis for subsequent studies of the relationship of
structure to function and as the starting place for studies of targeted drug
design.
Because of its emphasis on large data sets and completeness, the PSI
can be considered a branch of proteomics, which has been defined as the
analysis of complete complements of proteins (4). For example, the incentive is not simply to increase the number of enzymes known, but rather to
achieve in an organized manner a complete assessment of some biological
systems, usually by itemizing their molecular components and defining their
interactions. Why this interest in large-scale data collection, which had previously been denigrated as "fishing expeditions" or "stamp collecting"? The
Human Genome Project clearly demonstrates the value of completeness in
the understanding of biological systems. It is not merely the identification of
new genes but the ability to view the architecture of the genome that has
provided a novel understanding of the organization and evolutionary his-
tory of biological systems. Increasingly, it is the ability to contrast and
compare entire genomes from different organisms, rather than just to examine the differences between a few individual genes, that underlies the project's great new insights.
It is incumbent on us, however, to view the new enthusiasms for
large-scale data collection with a grain of skepticism. From antipathy to
such data-collection efforts in the early 1990s, we have now swung over to
the view that any global data collection is worth doing. Although it is hard
to argue that data are not, or may not be, useful, these undertakings are
expensive in manpower and dollars, and need to be subjected to cost-benefit analyses. The primary issue that should govern any such effort is simple: who benefits? Large-scale programs should have large-scale benefits, both
in the breadth of the scientific community that is affected and in the
potential applicability to many biological questions of interest. The struc-
tural genomics programs are no exception. It is our belief that the com-
pendium of complete protein structures that is planned by the NIGMS
Protein Structure Initiative will be of value not only to structural biolo-
gists, but also to the increasing number of scientists in all branches of
biology who find structural information essential in the course of their research.
REFERENCES
1. D Vitkup, E Melamud, J Moult, C Sander. Completeness in structural genomics. Nat Struct Biol 8:559–566, 2001.
2. Nature Structural Biology, Supplement 7S, November 2001.
3. http://www.nigms.nih.gov/funding/psi.html.
4. S Fields. Proteomics in genomeland. Science 291:1221–1223, 2001.