
    1

    Structural Biology and Structural Genomics: A Federal Agency

    Perspective

    John C. Norvell

    National Institutes of Health, Bethesda, Maryland, U.S.A.

    Marvin Cassman

    University of California, San Francisco, San Francisco, California,

    U.S.A.

    The first protein structure took about three decades to complete, and all

    protein structures solved in the early years required Herculean efforts. Most

    aspects of the process were difficult, time consuming, expensive, labor inten-

    sive, and problematic. But during the past decade, technological break-

    throughs in protein production, crystallization (still the most trying step),

    data collection, structure solution, and refinement have dramatically altered

    this picture. Although it is difficult to pick the most significant advance,

    development of user-friendly synchrotron beamlines for protein

    crystallography is high on the list. Of course, many classes of proteins

    (notably, large protein complexes and membrane proteins) often still require years of

    intense effort and imagination to solve. On the other hand, many soluble

    globular proteins can now be solved almost routinely. The power of struc-

    tural studies to advance biological understanding was obvious from the

    start. Three-dimensional structures have already provided unique insight

    into macromolecular function and mechanism. Structure has also become

    an important aid for targeted drug design. Additionally, a complete set of

    structures can provide insights into the architecture of proteins and its

    relationship to function, as well as protein folding and evolution.


    © 2003 by Taylor & Francis Group, LLC


    An inspection of National Institute of General Medical Sciences

    (NIGMS) research grant programs reveals the growth of structural biology.

    The NIGMS and all the other institutes of the National Institutes of Health

    (NIH) provide research support through several mechanisms, especially

    investigator-initiated, hypothesis-driven individual research grants (the

    R01s). The success and maturation of structural biology over the past dec-

    ade has resulted in major changes in the focus of these crystallographic

    grants. Initially, almost all of the crystallographic awards were made to

    card-carrying crystallographers, i.e., the experts in the field. Now the

    awards focus more on biological significance and less on crystallographic

    technique. The number of crystallographic-related grants (i.e., those that

    contain at least one major structure project) awarded to principal

    investigators who are not experienced crystallographers is now twice the number

    awarded to the expert crystallographers.

    Sources of funding in structural biology have also changed over the

    years. In the early 1980s, NIGMS provided most of the research support

    for structural biology in the United States and about two-thirds of the

    total NIH support. Today, NIGMS contributes only about half of the

    NIH support for structural biology. As protein structure studies became

    more integral to the research mission of other institutes, the relative

    percentage of NIGMS funding decreased. Even so, about 15% of the

    institute's current research budget is awarded to projects that involve

    high-resolution protein structure determination by crystallography or

    nuclear magnetic resonance (NMR) spectroscopy. Funding for protein

    crystallography by the Department of Energy (DOE), National Science

    Foundation (NSF), and other agencies and foundations has also grown

    significantly. The NSF and DOE support of user-based synchrotrons and

    numerous protein crystallographic beamlines has been essential to the

    growth of the field. In addition, the Howard Hughes Medical Institute

    has provided substantial support for many investigators in structural

    biology.

    In about 1998, motivated by the successes and recent technical

    advances of structural biology and the results and demonstrated value of

    genome-sequencing projects, scientists began to consider national and

    international efforts in structural genomics. The field of structural genomics

    can be defined many ways, and all of them are justified. In the broadest

    sense, it can be defined as high-throughput structure determination guided

    by genomic information to identify targets. Currently, there are federally

    funded structural genomics efforts under way in a number of countries,

    including the United States, Japan, Germany, Canada, France, the United

    Kingdom, and Italy. The U.S. effort, called the Protein Structure Initiative

    (PSI), is spearheaded by the NIGMS. In addition, numerous industrial

    efforts focus on high-throughput structure determination for targeted drug

    design.

    The goals and approach of the Protein Structure Initiative vary sig-

    nificantly from many of the other structural genomics programs. The main

    goal of the PSI is to arrive at a complete description of protein structures. In

    contrast, the goal of many of the international programs and most of the

    private efforts is to obtain structures of select proteins based on medical

    interest or other biologically important issues. These programs do not have

    any explicit interest in completeness, nor do they address this goal in their

    target selection strategies, although both approaches rely on genomic data.

    This chapter will focus on the basic research goals and approaches of the

    NIGMS program. The PSI is a large-scale, high-throughput effort to

    increase the number of structures of unique, nonredundant proteins,

    permitting the study of a broad range of protein structures. The PSI is expected to

    provide a minimum of 10,000 selected structures in 10 years.

    Many scientists had initially agreed on the value of a complete set of

    all protein structures found in nature, but such an undertaking seemed

    impossible. Because the number of proteins is (as we now know) much larger

    than the number of genes in an organism (perhaps by an order of magnitude),

    it is neither feasible nor affordable to consider one-by-one structure

    determination of the universe of protein structures. However, as many

    experts in the field have discussed, computational analyses of sequence

    data permit the classification of proteins into structural families and thus

    provide a shortcut method to reach for this completeness: experimentally

    determining the structure of a representative of each family, followed by

    modeling of the homologous proteins in the family. This approach should

    make the problem more manageable.
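
The family-representative strategy described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the PSI's actual target-selection procedure: the toy sequences, the names `identity` and `pick_representatives`, the naive position-by-position identity measure on pre-aligned sequences, and the 30% threshold. Real pipelines rely on proper sequence alignment and more careful family definitions.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity expects pre-aligned, equal-length sequences")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_representatives(seqs: Dict[str, str], threshold: float = 0.3) -> Dict[str, List[str]]:
    """Greedy grouping: each sequence joins the first representative it matches
    at or above `threshold`; otherwise it founds a new family of its own."""
    families: Dict[str, List[str]] = {}
    for name, seq in seqs.items():
        for rep in families:
            if identity(seq, seqs[rep]) >= threshold:
                families[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new family.
            families[name] = [name]
    return families


# Hypothetical toy sequences, invented purely for this illustration.
toy = {
    "protA": "MKTAYIAKQR",
    "protB": "MKTAYLAKQR",  # differs from protA at one position
    "protC": "GSHWWPLDEN",  # shares no positions with protA
    "protD": "GSHWYPLDEN",  # differs from protC at one position
}

families = pick_representatives(toy)
for rep, members in families.items():
    modeled = [m for m in members if m != rep]
    print(f"solve {rep} experimentally; model by homology: {modeled}")
```

In this toy case four proteins collapse into two families, so only two experimental structures would be needed and the other two would be modeled. The result of a greedy pass depends on the order in which sequences are visited, which is one reason a real effort would use a more robust clustering method.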

    Although the production of protein structures is increasing at a dizzy-

    ing rate (with over 15,000 structures now deposited in the Protein Data

    Bank), most of these structures are not unique; instead, they are many

    variants of the same structures and sequences. Such variants, while they

    are important for studying the details of biological mechanisms at the

    atomic level, do not significantly expand our knowledge of protein structure

    space. The goal and major rationale for an organized structural genomics

    project, specifically the NIGMS Protein Structure Initiative, is to focus on

    structures chosen as family representatives and on methodology develop-

    ment, leading to a comprehensive and efficient coverage of protein structure

    space. In other words, this effort would form an inventory of all the protein

    structures in nature. This inventory would be a public resource freely avail-

    able to the scientific community.

    However, unlike in the Human Genome Project, defining completeness in

    a structural genomics project is not at all obvious. Completeness might be

    defined in terms of the number of structures that could be both experimen-

    tally determined and modeled by homology. This still leaves plenty of room

    for interpretation. A recent paper concludes that the goal is obtaining a set

    (of protein structures) such that accurate atomic models can be built for

    almost all functional domains (1). Other goals are possible, and completeness

    is likely to be better understood as the project advances and our understanding

    of what the global array of structures looks like increases.

    Experimental details and strategies of structural genomics have been

    discussed in numerous meetings and scientific articles over the past few

    years. An excellent collection of summary articles can be found in a recent

    review (2). The first major meeting to discuss large-scale structure

    determination was held at the Argonne National Laboratory in January 1998,

    organized by the DOE. This meeting was initiated because of a general

    feeling among a number of investigators and federal science administrators

    that the time was ripe to consider developing the same global understanding

    of protein structure that was being accomplished for gene sequence. Some

    small pilot programs had already been established at the DOE and the

    NIGMS. Although the discussants by no means uniformly approved of

    an organized national program, enough enthusiasm was generated to

    prompt further consideration. The enthusiasm arose from the importance

    of protein structures and the perceived benefits of a program of global

    structure discovery to biologists of all kinds.

    Following the Argonne meeting, the NIGMS spent over a year exam-

    ining the need for a national program in structural genomics. Three work-

    shops and several advisory meetings were held that included many experts in

    the various fields involved, with representation from a wide range of back-

    grounds and opinions. These were designed to assess whether a large-scale

    effort of the kind proposed was timely and appropriate. The three work-

    shops were held between April 1998 and February 1999. Participants con-

    cluded that the technology was available, the goals were feasible, and the

    benefits justified the effort. Attendees at these workshops included

    representatives not only from the U.S. research community but also from

    Europe, Israel, and Japan. It became clear that interest extended beyond

    the United States, and that the scale of the program required an interna-

    tional effort.

    Several international meetings have addressed scientific and policy

    issues for this field. The First International Structural Genomics Meeting

    was held in the United Kingdom in April 2000, followed by a number of

    workshops and meetings, including an Organisation for Economic

    Co-operation and Development conference in Florence, Italy, in June 2000;

    the International Conference on Structural Genomics in Yokohama,

    Japan, in November 2000; and the Second International Structural

    Genomics Meeting at the Airlie Center, Virginia, in April 2001. This last

    meeting focused on international cooperation and policies such as data

    release, publication, coordinate deposition, and intellectual property.

    Information on this and other meetings can be found at the NIGMS

    PSI website (3).

    The first stage of the NIGMS PSI is the creation of several research

    centers that serve as pilots for a future production stage. Each research

    center must include all components of structural genomics so that it can

    test strategies for large-scale high-throughput structure determination by

    X-ray crystallography and/or NMR as well as new computational, experi-

    mental, and management approaches. Target selection is left to the indi-

    vidual groups, but it must be genome driven. The strategies must focus on

    obtaining the maximum number of novel structures as protein family

    representatives, but can also include other selection criteria: known

    function, unknown function, eukaryotic proteins, pathogenicity, phylogenetic

    relationships, minimal genomes, etc. Some classes of proteins, such as

    membrane proteins, are not suitable for high throughput at this time

    and are thus seldom considered as targets. This could change in the

    future, with technical improvements under way, including special projects

    in several PSI research centers.

    Since these PSI grants are intended to prepare the way for a public

    resource, grant-related requirements are more stringent than with individual

    research grants. Data release and coordinate deposition cannot be delayed

    until publication but instead must be completed within four to six weeks of

    structure completion. In addition, the identity and status of target proteins

    must be made available on each center's publicly available webpage.

    Employment of graduate students and postdoctorals must be justified.

    The centers do retain intellectual property rights, but only those consistent

    with the data release policy.

    Seven PSI research center awards were announced in September 2000.

    These centers spent the first year organizing themselves into cohesive units,

    hiring staff, and acquiring robotic equipment for protein production and

    sample preparation. Two additional research center awards were made in

    September 2001, bringing the institute's support of these centers to $40

    million annually. The NIGMS is planning further efforts in support of the

    PSI, including workshops on technical bottlenecks, a centralized target

    registration website at the Protein Data Bank, electronic publication of

    structures, a facility for storage of resulting physical materials, and an

    experimental results database.

    The NIGMS expects these research centers to provide guidance for

    the future of the project. The structures produced should provide a more

    realistic idea of what will be required to achieve complete coverage of

    protein structures in nature. One outcome is already apparent and should

    benefit all structural biologists: the development of new high-throughput

    methods and automated equipment for protein production and crystalliza-

    tion. The institute hopes that this inventory of structures of protein family

    representatives will serve as a public resource for research scientists from

    both the public and private sectors and will be a crucial body of knowl-

    edge for studies of protein structure, folding, and evolution. The modeled

    structures should also serve for subsequent studies of the relationship of

    structure to function and as the starting place for studies of targeted drug

    design.

    Because of its emphasis on large data sets and completeness, the PSI

    can be considered a branch of proteomics, which has been defined as the

    analysis of complete complements of proteins (4). For example, the

    incentive is not simply to increase the number of enzymes known, but rather to

    achieve in an organized manner a complete assessment of some biological

    systems, usually by itemizing their molecular components and defining their

    interactions. Why this interest in large-scale data collection, which had

    previously been denigrated as "fishing expeditions" or "stamp collecting"? The

    Human Genome Project clearly demonstrates the value of completeness in

    the understanding of biological systems. It is not merely the identification of

    new genes but the ability to view the architecture of the genome that has

    provided a novel understanding of the organization and evolutionary his-

    tory of biological systems. Increasingly, it is the ability to contrast and

    compare entire genomes from different organisms, rather than just to examine

    the differences between a few individual genes, that underlies the

    project's great new insights.

    It is incumbent on us, however, to view the new enthusiasms for

    large-scale data collection with a grain of skepticism. From antipathy to

    such data-collection efforts in the early 1990s, we have now swung over to

    the view that any global data collection is worth doing. Although it is hard

    to argue that data are not, or may not be, useful, these undertakings are

    expensive in manpower and dollars, and need to submit to cost-benefit

    analyses. The primary issue that should govern any such effort is simple:

    who benefits? Large-scale programs should have large-scale benefits, both

    in the breadth of the scientific community that is affected and in the

    potential applicability to many biological questions of interest. The struc-

    tural genomics programs are no exception. It is our belief that the com-

    pendium of complete protein structures that is planned by the NIGMS

    Protein Structure Initiative will be of value not only to structural biolo-

    gists, but also to the increasing number of scientists in all branches of

    biology who find structural information essential in the course of their

    research.

    REFERENCES

    1. D Vitkup, E Melamud, J Moult, C Sander. Completeness in structural

    genomics. Nat Struct Biol 8:559–566, 2001.

    2. Nature Structural Biology, Supplement 7S, November 2001.

    3. http://www.nigms.nih.gov/funding/psi.html.

    4. S Fields. Proteomics in genomeland. Science 291:1221–1223, 2001.
