contentscollaboratory for structural bioinformatics (rcsb): rutgers, ... the rcsb pdb server is...
TRANSCRIPT
CONTENTS
Message from the RCSB PDB 1
What is the PDB? 2
Data Deposition 5
Data Distribution and Access 6
Other RCSB PDB Services 8
Recent Publications 10
Important Addresses 11
Selected References 12
Mirrors & Partners Inside Back Cover
ABOUT THE COVER COVER REFERENCES1Bolton, W. and Perutz, M.F. (1970) Three dimensional fourier synthe-sis of horse deoxyhaemoglobin at 2.8 Ångstrom units resolution. Nature,228, 551-552. 2Perutz, M.F., Muirhead, H., Cox, J.M. and Goaman, L.C. (1968)Three-dimensional Fourier synthesis of horse oxyhaemoglobin at 2.8 Åresolution: the atomic model. Nature, 219, 131-139. 3Hendrickson, W.A., Love, W.E. and Karle, J. (1973) Crystal structureanalysis of sea lamprey hemoglobin at 2 Å resolution. J Mol Biol, 74,331-361. 4Quiocho, F.A. and Lipscomb, W.N. (1971) Carboxypeptidase A: a pro-tein and an enzyme. Adv Protein Chem, 25, 1-78. 5Watson, H.C. (1969) The stereochemistry of the protein myoglobin.Prog. Stereochem., 4, 299. 6Alden, R.A., Birktoft, J.J., Kraut, J., Robertus, J.D. and Wright, C.S.(1971) Atomic coordinates for subtilisin BPN' (or Novo). BiochemBiophys Res Commun, 45, 337-344. 7Birktoft, J.J. and Blow, D.M. (1972) Structure of crystalline alpha-chymotrypsin. V. The atomic structure of tosyl-alpha-chymotrypsin at 2Å resolution. J Mol Biol, 68, 187-240. 8Huber, R., Kukla, D., Ruhlmann, A., Epp, O. and Formanek, H.(1970) The basic trypsin inhibitor of bovine pancreas. I. Structureanalysis and conformation of the polypeptide chain.Naturwissenschaften, 57, 389-392. 9Watenpaugh, K.D., Sieker, L.C., Herriott, J.R. and Jensen, L.H.(1973) Refinement of the model of a protein: rubredoxin at 1.5 Å reso-lution. Acta Crystallogr B, 29, 943-956. 10White, J.L., Hackert, M.L., Buehner, M., Adams, M.J., Ford, G.C.,Lentz Jr., P.J., Smilely, I.E., Steindel, S.J. and Rossman, M.G. (1976) Acomparison of the structures of apo dogfish M4 lactate dehydrogenaseand its ternary complexes. J Mol Biol, 102, 759-779. 11Mathews, F.S., Argos, P. and Levine, M. (1972) The structure ofcytochrome b5 at 2.0 Ångstrom resolution. Cold Spring Harb SympQuant Biol, 36, 387-395. 12Drenth, J., Jansonius, J.N., Koekoek, R., Swen, H.M. and Wolthers,B.G. (1968) Structure of papain. Nature, 218, 929-932.
2004 marks the 30th anniversary of the ProteinData Bank’s first newsletter, which was publishedin September 1974. It contained forms fordepositing data, provided detailed directions forretrieving data from the PDB, and listed the origi-nal 12 coordinate sets available at that time. Thiscover shows the beginning of the PDB’s journeythrough structural biology, as it started with thenewsletter and an archive of a few structures—hemoglobin,1-3 carboxypeptidase A,4 myoglobin,5
subtilisin,6 alpha-chymotrypsin,7 pancreatic trypsininhibitor,8 rubredoxin,9 lactate dehydrogenase,10
cytochrome b5,11 and papain.12
MESSAGE FROM THE RCSB PDB The RCSB Protein Data Bank is a publicly accessible informa-
tion portal for researchers and students interested in struc-tural biology. At its center is the PDB archive—the sole
international repository for the 3-dimensional structure data ofbiological macromolecules. These structures hold significantpromise for the pharmaceutical and biotechnology industries inthe search for new drugs and in efforts to understand the mys-teries of human disease.
The primary mission of the RCSB PDB is to provide accurate,well-annotated data in the most timely and efficient way possi-ble to facilitate new discoveries and scientific advances. TheRCSB processes, stores, and disseminates these important data,and develops the software tools needed to assist users in deposit-ing and accessing structural information.
Staff are located at three member institutions of the ResearchCollaboratory for Structural Bioinformatics (RCSB): Rutgers,The State University of New Jersey; the San DiegoSupercomputer Center (SDSC) at the University of California,San Diego (UCSD); and the Center for Advanced Research inBiotechnology (CARB/UMBI/NIST).
This report details the RCSB PDB’s accomplishments from July1, 2003, through June 30, 2004, and looks at the history, con-tents, and use of the RCSB PDB resource. As a popular andpublicly available biological database, the RCSB PDB server isextremely active: in an average month during this report period,approximately 429 structures were deposited to the PDB archiveand processed, 411 structures were released, and 5 million files ofindividual structures were downloaded.
The RCSB is dedicated to providing resources that fully utilizethe data contained in the archive. To this end, the RCSB PDBwebsite and database have been reengineered using feedbackderived from correspondence with our help desks, conferenceattendance, focus groups and other personal interactions betweenthe users of the PDB and RCSB staff. This effort has improvedthe usability and navigation of the website, and has introducednew capabilities for browsing and searching the PDB archive thatprovide more reliable query results. This resource, currentlyundergoing public beta testing at pdbbeta.rcsb.org, is described ingreater detail on pages 6-7.
Other highlights during this period include:
• Enhanced distribution features, including a new QuickSearchfeature, and the availability of bulk non-redundant data(page 6)
• Growth and promotion of educational resources for all vari-eties of PDB users (page 8)
• Formation of the Worldwide Protein Data Bank (wwPDB), acollaboration with the RCSB, PDBj, and the MSD-EBIwhich ensures a single, uniform and enduring archive ofexperimentally determined 3-dimensional biological struc-tures (page 4)
• Availability of Ligand Depot—a small molecule informationweb resource (page 5)
• Expansion of resources for structural genomics, including thedevelopment of the Protein Expression Purification andCrystallization Database (PepcDB; pages 8-9)
• Release of PDB_EXTRACT, a program used to captureinformation from various steps of X-ray crystallographicapplications for use in preparing PDB depositions (page 5)
The continued growth of our varied services in the areas of datadeposition, annotation, distribution, community outreach, andeducation are detailed in this report.
We welcome and appreciate your feedback, which helps us tofurther improve and refine this important resource.
Helen M. Berman (Director), Philip E. Bourne, Gary L.Gilliland, John Westbrook (Co-Directors), Judith L. Flippen-Anderson (Production and Outreach Leader)
for the RCSB Protein Data Bank
The RCSB PDB Leadership Team: (left to right) Philip E. Bourne,Judith L. Flippen-Anderson, Helen M. Berman, John Westbrook, GaryL. Gilliland
The RCSB PDB Team
A N N U A L R E P O R T 2 0 0 4 1
WHAT IS THE PDB?
The PDB1, 2 was founded in 1971 atBrookhaven National Laboratory as thesole international repository for three-
dimensional structure data of biologicalmacromolecules. Since July 1, 1999,the PDB archives have been managedby three member institutions of theRCSB.
The growing PDB archive contains thethree-dimensional coordinates and relatedinformation about biological macromole-cules. These structures, including proteins,nucleic acids, and large macromolecular com-plexes, have been determined using X-ray crystal-lography, nuclear magnetic resonance (NMR), andcryo-electron microscopy. PDB data are used by pharmaceutical
and biotechnology industries and aca-demic researchers in their search for newdrugs and in efforts to understand theunderlying causes of human disease.Understanding the shape of these struc-tures and how they bind to various mol-ecules may well provide the key to howthey function.
Since the information about these struc-tures is contained in a database, theRCSB PDB provides several tools forsearching the database and creating tab-ular and graphical reports of the searchresults. Other resources for related
aspects of structural biology,including structural genomics,data representation formats,downloadable software, and edu-cational materials, are also madeavailable.
In 2003, the RCSB formed a col-laboration called the Worldwide
PDB (wwPDB)3 with theMacromolecular Structure Database at
the EMBL’s European BioinformaticsInstitute (MSD-EBI), and Protein Data Bank Japan (PDBj). AllwwPDB sites share responsibilities for data deposition, data pro-cessing, and distribution, and agree to support a single, standard-ized archive of structural data. (see also page 4)
BIOLOGICAL MACROMOLECULES IN THE PDBEach molecule has a role in a biological process, and many haveseveral biological functions. The structures range from smallpieces of protein or DNA to complex machines, such as the ribo-some (pictured).
Among the most intricate and remarkable of nature’s machines,the ribosome’s structure includes more than 100,000 atoms,reflecting its complex and crucial function as the cell’s protein-assembly factory.
Also represented in the PDB’s archive are enzymes, viruses andmolecular assemblies (such as thenucleosome). Their structures pro-vide insight into these molecules’roles in fundamental biologicalprocesses and, in some cases, howthey function in disease or druginteractions.
RCSB PDB USERS As the sole repository for three-dimensional structure data of bio-logical macromolecules, the PDB isa critical resource for academic,pharmaceutical, and biotechnologyresearch. Depositors from thesegroups provide these data, whichare used by these groups to per-form further research. Studentsand educators depend upon theRCSB PDB’s collection ofresources to help elucidate various
PDBHOLDINGS
as of July 1, 2004
26144 TotalStructures
Molecule Type23676
proteins, peptides, and viruses
1338 nucleic acids
1112protein/nucleic acid complexes
18carbohydrates
Experimental Technique
22306X-ray diffraction
and other
3838NMR
13216structure factor files
1964NMR restraint files
The ribosome, as illustrated by the Molecule of theMonth’s David S. Goodsell
2 R C S B P R O T E I N D A T A B A N K
Educators and studentsRCSB PDB member Kyle Burkhardt discusses protein structure with a highschool class (from The Pingry School in Martinsville, NJ) at the RCSB PDBsite at Rutgers.
A N N U A L R E P O R T 2 0 0 4 3
subjects related to biological structure. Someof these diverse user groups are highlightedhere.
Educators and students use the resources madeavailable from the website for their ownresearch in structural biology. The RCSB pre-pares materials for use by all audiences, includ-ing a Molecule of the Month feature that high-lights a different molecule in the PDB (see alsopage 8).
Structural biologists in industry and academiawho focus on pharmaceutical and biotechnolo-gy research contribute PDB data as part oftheir work.
Computational biologists use data from thePDB to discover how macromolecules fold andfunction.
Scientists working in structural genomics usetools produced by the RCSB PDB. Structuralgenomics is a worldwide initiative aimed atdetermining a large number of protein struc-
tures in a high throughput mode. The RCSBPDB has developed tools to facilitate therapid deposition of data produced by thestructural genomics initiatives and has created
databases to track the progress oftheir work.
RCSB PDB MANAGEMENTThis resource is managed by threemember institutions of the RCSB—Rutgers, The State University ofNew Jersey; the San DiegoSupercomputer Center (SDSC), an
Computational Biologists
J. Andrew McCammon, an HHMI Investigator atUCSD, studies biochemical reactions.
ADVISORY COMMITTEEThe RCSB PDB solicits theadvice of several internationalcommittees. The members ofthe Advisory Committee areexperts in a variety of areas,including X-ray crystallogra-phy, NMR, cryo-electronmicroscopy, bioinformatics,and education.
Stephen K. Burley (Chair)Chief Scientific Officer &
Senior Vice President,Research, StructuralGenomiX
Frank Allen Executive Director, Cambridge
Crystallographic Data Centre
Edward N. BakerProfessor, University of
Auckland
Wah ChiuProfessor, Baylor College of
Medicine
Juli Feigon Professor, University of
California, Los Angeles
Nobuhiro GoProfessor, Japan Atomic Energy
Research Institute
Barry HonigProfessor & HHMI Investigator,
Columbia University
Robert KapteinProfessor, Utrecht University
Sung-Hou KimProfessor, University of
California, Berkeley
Seth PinskySenior Vice President, Global
Research and Development &Chief Technical Officer, MDL
David SearlsSenior Vice-President of
Bioinformatics,GlaxoSmithKline
Judith VoetProfessor, Swarthmore College
Structural Biologists using NMR
Ed S. Mooberry and Gregory Zornetzer stand in front of abare Fleckvieh Varian Unity Inova 900 MHz spectrometer atthe National Magnetic Resonance Facility at the University ofWisconsin-Madison. NMRFAM provides state of the art NMRspectrometer facilities used in structure determination to userslocally and nationwide.
Structural Biologists Using X-ray CrystallographyRobert Sweet and others review a diffraction pattern at theNational Synchrotron Light Source at Brookhaven NationalLaboratory. Collecting X-ray diffraction data at a synchrotronbeamline is part of the process of determining a protein structure.
4 R C S B P R O T E I N D A T A B A N K
organized research unit of the Universityof California, San Diego (UCSD); andthe Center for Advanced Research inBiotechnology (CARB), part of a jointventure between the University ofMaryland Biotechnology Institute(UMBI) and the National Institute ofStandards and Technology (NIST). TheRCSB PDB is supported by funds fromthe National Science Foundation (NSF),the National Institute of General MedicalSciences (NIGMS), the Office of Science,Department of Energy (DOE), theNational Library of Medicine (NLM), theNational Cancer Institute (NCI), theNational Center for Research Resources(NCRR), the National Institute ofBiomedical Imaging and Bioengineering(NIBIB), and the National Institute ofNeurological Disorders and Stroke(NINDS).
Five project leaders manage the operationsof the RCSB PDB.
• Helen M. Berman, a Board ofGovernors Professor of Chemistry andChemical Biology at Rutgers, is theoverall Director of the RCSB PDB.She was part of the original team thatdeveloped the PDB at BrookhavenNational Laboratory, and is a co-founder of the Nucleic Acid Database.
Three Co-Directors oversee activities attheir respective sites:
• John Westbrook,Research AssociateProfessor ofChemistry at RutgersUniversity
• Philip E. Bourne,Professor of Pharmacology at UCSDand Adjunct Professor at the BurnhamInstitute
• Gary Gilliland, CARB Fellow, AdjunctProfessor at UMBI, and NISTResearch Chemist.
Judith L. Flippen-Anderson, formerlywith the Naval Research Laboratory, is theProduction and Outreach Leader acrossthe three sites.
THE MISSION OF THE RCSB PDB TEAMThe RCSB seeks to enable science world-wide by offering a variety of resources toimprove the understanding of structure-function relationships in biological sys-tems. The RCSB believes that the avail-ability of consistent, well-annotated three-dimensional data will facilitate new scien-tific advances. For the data to be trulyuseful, we must deliver it in a timely andefficient manner. To fulfill this mission,the capabilities of the RCSB PDB arecontinually being upgraded and signifi-cantly extended.
RCSB COLLABORATIONS: WWPDBThe RCSB, the Macromolecular StructureDatabase at the European BioinformaticsInstitute (MSD-EBI), and the ProteinData Bank Japan (PDBj; at the Institutefor Protein Research at Osaka University)have formed the Worldwide PDB(wwPDB).3 The mission of the wwPDB isto maintain a single PDB archive ofmacromolecular structural data that isfreely and publicly available to the global
community.wwPDB for-malizes theinternationalcharacter ofthe PDB andensures thatthe archive
will remain single and uniform. All threeorganizations serve as deposition, dataprocessing and distribution sites for thePDB Archive. Each wwPDB siteprovides its own view of theprimary data, thus providing avariety of tools and resources.Additionally, wwPDB mem-bers collaborate on key projectsessential to the maintenance ofthe PDB archive.
MSD-EBI (www.ebi.ac.uk/msd) continues toprovide weekly updates of the structuresdeposited and processed at their AutoDepsite. A key accomplishment of this collab-oration is the agreement on an exchangedictionary that is used by wwPDB mem-bers for data exchange.
Close collaborations have been main-tained with PDBj (www.pdbj.org), whereADIT is used for structure deposition andannotation.
The RCSB, MSD-EBI, and PDBj are alsocollaborating on an PDBML/XML repre-sentation of PDB data based on the PDBexchange dictionary (page 9).
RCSB COLLABORATIONS: BMRB The collaboration on NMR data deposi-tion continues with the BioMagResBank
(BMRB;www.bmrb.wisc.edu)4,stressing the develop-ment of a data dic-tionary and an inte-grated depositionsystem based on
ADIT (page 5).
RNA Polymerase, from the April 2003 Molecule of the Month feature.
PDB ID: 1i6h Gnatt, A.L., Cramer, P., Fu, J., Bushnell, D.A., Kornberg, R.D.: StructuralBasis of Transcription: An RNA Polymerase II Elongation Complex at 3.3 ÅResolution (2001) Science, 292, 1876.
DATA INPUT: DEPOSITION, VALIDATION, ANDANNOTATIONA key component of RCSB PDB operations isthe efficient capture (deposition) and curation(validation and processing/annotation) ofexperimental structural data. Data from exper-iments using X-ray crystallography, NMR,cryo-electron microscopy and other methodsare deposited in the PDB. Scientists can con-tribute their data using tools available atRCSB-Rutgers, PDBj, and MSD-EBI. Dataare also accepted via FTP and e-mail. Dataannotated at PDBj and MSD-EBI are sent tothe RCSB for release.
The deposition tool ADIT (AutoDep InputTool) is available online from the RCSB andPDBj sites and as a software download forstandalone desktop use. ADIT provides a userinterface to a collection of programs for datainput, validation, annotation, and formatexchange. The ADIT system uses the PDBexchange format that is based on the macro-molecular Crystallographic Information File(mmCIF) dictionary. mmCIF is an ontologyof more than 2,500 terms defining macromol-ecular structure and related experiments.5
After the annotation staff performs checks onthe data, validation reports and a completedPDB file are returned to the depositor forreview. Depositors also have the option ofindependently performing many of thesechecks using validation software available fromthe RCSB. When finalized, the completeentry, including its status information andPDB ID, is loaded into the core relationaldatabase. This entire process is completedwith an average turnaround of less than two
weeks. Depending upon the hold status select-ed by the depositor, data release occurs when adepositor gives approval (REL), the hold datehas expired (HOLD), or the journal articlehas been published (HPUB). As of May 2003,there is a one-year limit on the length of ahold period, including HPUBs. If the citationfor a structure is not published within theone-year period, depositors are given theoption to either release or withdraw the depo-sition.
SOFTWARE FOR DEPOSITION AND VALIDATIONThe RCSB PDB has developed a variety oftools to facilitate the validation and depositionof structural data by depositors. These toolsinclude PDB_EXTRACT,6 the PDBValidation Suite,1 Ligand Depot,7 and ADIT.1
These programs can be accessed from theRCSB PDB website. Some are also availablefor download and desktop installation fromdeposit.pdb.org/software. These programs areupdated regularly and are available for a vari-ety of platforms.
The RCSB and BMRB have collaborated on aversion of ADIT that collects NMR experi-mental data. ADIT-NMR is available from theBMRB site, www.bmrb.wisc.edu.
STATISTICSDuring the period covered by this report,5153 files were deposited to the PDB archivethrough an international effort. Of the struc-tures deposited during the period of thisreport, approximately 74% were depositedwith experimental data. Around 56% of thedepositions released their sequence data priorto the structure’s release.
RCSB PDB SERVICES: DATA DEPOSITION
PDB_EXTRACTtakes infor-mation aboutdata collec-
tion, phasing, density modifi-cation, and the final structurerefinement from the files pro-duced by many crystal struc-ture determination applica-tions. The collected informa-tion is organized into mmCIFfiles that are ready for valida-tion and deposition. In addi-tion to the web-based andstandalone versions availablefrom the RCSB PDB,PDB_EXTRACT has beenintegrated with the CCP4iinterface of the CCP4 programsuite8 for protein crystallogra-phy (version 5.0).
The PDB Validation Suite can beused to check the format of thecoordinates and validate theoverall structure before deposi-tion. Sequence/coordinatealignment, missing and extraatoms or residues, and datainconsistencies are reported.The validation report also con-tains links to geometrical andexperimental checks from theprograms NUCHECK,9
PROCHECK,10 andSFCHECK.11
Ligand Depot inte-grates databases,
services, tools and methodsrelated to small moleculesbound to macromolecules intoan online data warehouse. Itcan be used to find codes forexisting ligands, to link toPDB entries with a particularligand, and to search for ligandsubstructures.
ADIT can beused to check
file formats, run the PDBValidation Suite, and add oredit information for a deposi-tion. An mmCIF file can bewritten from the desktop ver-sion of ADIT and uploaded tothe web version or sent to thePDB for deposition.
A N N U A L R E P O R T 2 0 0 4 5
DATA DISTRIBUTION AND ACCESSPDB data and RCSB services are availablewithout cost through the Internet. Themain RCSB PDB website at SDSC-UCSD receives an average of more than220,000 hits per day from all over theworld. On average, more than one file isdownloaded every second, 24 hours a day,seven days a week. Additionally, there aresix RCSB PDB mirror sites around theworld at RCSB-Rutgers (US), RCSB-CARB (US), Osaka University (Japan),the National University of Singapore(Singapore), the CambridgeCrystallographic Data Centre (UnitedKingdom), and the Max Delbrück Centerfor Molecular Medicine (Germany). AllRCSB sites are available 24 hours a day,seven days a week. New structures areadded to the PDB holdings by 1:00 A.M.Pacific Daylight Time each Wednesday,52 weeks per year.
There are a number of different ways toquery the database. Entering the PDB IDof the target macromolecule in the searchbox on the home page performs the sim-plest search; the ID is assigned at the time
of deposition, and is referenced in thejournal article describing its structure.This search returns a Structure Explorerpage that provides summary informationabout the entry as well as links to theatomic coordinates, derived geometricdata, and experimental data (X-ray struc-ture factors and NMR constraint data,where available). Links to structurally sim-ilar “neighbor” entries are also provided,along with options to study other aspectsof the molecule, such as the secondarystructure or primary amino acid sequence.Dynamic links to the structure’s entry inother databases are provided by theMolecular Information Agent (MIA), andare accessible under the Other Sourcessection of the Structure Explorer page.Views of the structure’s asymmetric andbiologically active units are provided asstatic images generated by MolScript12 andRaster3D,13 and interactive views are pre-sented in VRML, RasMol,14,15 MICE,16,17
Chime,18 Swiss-Pdb Viewer,19, 20 STING,21
and QuickPDB (Java).22
Multiple structures can be retrieved usingthe keyword search functionality on the
RCSB PDB home page. SearchLite andthe customizable SearchFields interfacesare provided to perform queries usingparameters selected by the user.SearchFields can be used to limit theresults to a subset of structures basedupon the percentage of sequence identity.Files that list these clusters and their rank-ings at 50%, 70% and 90% sequenceidentity are available from the FTP server.QuickSearch is a new keyword search fea-ture that searches across the archiveand/or the static web pages.
The Query Result Browser lists all mole-cules that meet the user’s query specifica-tions, and allows exploration of one ormore of the structures returned. Optionsto refine the query or create tabularreports from the search results are alsoavailable. A PDB or mmCIF format filefor any structure can be downloaded asplain text or in one of several compressedformats. Files may also be downloadedfrom the FTP server.
REENGINEERED DATABASE AND WEBSITE The RCSB PDB is dedicated to providingresources that fully utilize the data con-tained in the PDB archive.
To this end, we invite our users to partici-pate in the beta test of the reengineeredRCSB PDB site and database. The sitewas developed using feedback derivedfrom the RCSB PDB help desk, confer-ence attendees, focus groups and otherpersonal interactions between our usersand RCSB staff. The RCSB PDB hasworked with both users and usabilityexperts to improve navigation and ease ofaccess. Experts with backgrounds inmolecular and structural biology are ana-lyzing the system for errors at each newphase of development.
This new system utilizes data improve-ments resulting from our ongoing effortsin uniformity and standardization(page 9). It offers improved accessibilityfor our diverse user community and willallow for rapid addition of new featuresbased on input from users and in
RCSB PDB SERVICES: DATA DISTRIBUTION AND ACCESS
6 R C S B P R O T E I N D A T A B A N K
response to the underlying evolution inscience and technology.
The new system implements an industry-standard three-tier system composed ofvery distinct database, application logic,and presentation layers. The partitioningof functionality between these layers andthe use of Java Enterprise Edition (J2EE)creates a modular architecture that greatlysimplifies maintenance and extensibilityof this new system. It also enables us toincorporate readily available tools of highquality to improve existing components ofthe site, such as the keyword searchengine and the help documentation,which has been converted to one dynam-ic, content-specific system supported byRoboHelp.23
We have loaded all of the data from ouruniformity processing efforts and the cal-culated derived geometric structural fea-tures into a relational database that pro-vides a single source of data for the
reengineered site. This is a markedimprovement over the current query sys-tem, which stores data in four separatedatabases and flat files that are integratedthrough a CGI layer. The new relationaldatabase implements the PDB exchangedata dictionary schema (pdbschema.sdsc.edu).
The reengineering development processincluded an internal requirements analysisand active testing. The strengths, weak-nesses, and major usage patterns of ourcurrent system were examined, document-ed, and used in planning the new site.
The beta test site is available atpdbbeta.rcsb.org alongside the current pro-duction site. Both sites are updated once aweek.
The RCSB is very excited about thisresource, and encourages the PDB com-munity to test the new site. We look for-ward to your feedback [email protected].
WHAT CAN YOU DO WITHTHE BETA RCSB PDB
WEBSITE? • Browse for structures by exploring
various features associated withsequence, structure, and function.
• Search for ligands and review pro-tein-ligand interactions.
• Query and browse data incorporat-ed from external sources for exten-sive cross-referencing within thedatabase. Examples include brows-ing by genome location, taxonomy,Enzyme Commission number,SCOP,25 CATH,26 MEDLINE,24
and Gene Ontology27 terms andlocating structures by their relation-ship to known diseases.
• Utilize high quality visualizationfeatures based on the molecularbiology toolkit (mbt.sdsc.edu), KiNG(Kinemage Next Generation;kinemage.biochem.duke.edu/), Jmol(open source Java Viewer), andWebMol.28
• Search the database using improvedtools and produce reports organizedlike a scientific paper, with sectionsmaterials and methods, structuredetails, biology and chemistry, ref-erences.
• Navigate the site easily using a newhelp system, site search browser,alphabetical site index, improvednavigation tools, and recall of pre-vious queries.
• Review more comprehensive struc-ture summaries on screen and inhand using the printer-friendlyoption.
• Create more detailed and sortablereports for search results containingmultiple structures.
• Search database using queries basedupon sequence, 3D structure of lig-ands, and proteins.The RCSB PDB beta test home page at pdbbeta.rcsb.org
A N N U A L R E P O R T 2 0 0 4 7
OUTREACH The RCSB PDB interacts with its diverseuser community to provide informationabout the resource, gain feedback fordevelopment, and provide materialsthat promote a broader understand-ing of structural biology. The RCSBstaff responds to general and specif-ic inquiries sent to help desks, andoffers a listserv forum for exchangeamong members of the PDB com-munity.
Recent developments and activitiesare announced weekly on the RCSBPDB website. The RCSB PDB dis-tributes a quarterly newsletter and avariety of online and print materialsabout the resource. Our efforts are alsodescribed in journal papers and articles.
The RCSB PDB exhibited at a variety of annualmeetings, including the American CrystallographicAssociation (ACA), the Biophysical Society, the ExperimentalNuclear Magnetic Resonance Conference, and the InternationalConference on Intelligent Systems for Molecular Biology. Staffmembers presented talks, demonstrations, and posters at morethan thirty meetings around the world, including the NationalScience Teachers Association’s annual meeting.
The 2004 RCSB PDB Poster Prize recognized student posterpresentations involving macromolecular crystallography at eachof the meetings of the International Union of Crystallography’sRegional Associates–the ACA, the Asian CrystallographicAssociation, and the European Crystallographic Association. Theprize was also awarded at the Annual International Conferenceon Research in Computational Molecular Biology (RECOMB).
The RCSB PDB’s “Art of Science” art exhibit highlights vari-ous representations of proteins found in the PDB, includinglarge-scale depictions of the images available from StructureExplorer pages and pictures from the Molecule of the Monthseries. It traveled to the University of Wisconsin-Madison andCalifornia State University, Fullerton during the period of thisreport.
MOLECULE OF THE MONTH ANDEDUCATIONAL RESOURCESEmphasis is placed on educating newusers, students, educators, and the gen-eral public. The RCSB maintains an
extensive portal to educational resourcesfor different levels of expertise. The
Molecule of the Month column highlightedon the RCSB PDB home page focuses on a
different biological molecule each month for ageneral audience. Produced by David S. Goodsell
(The Scripps Research Institute), each installmentincludes an introduction to the structure and function of
the molecule, a discussion of the relevance of the molecule tohuman health and welfare, and suggestions for how visitorsmight view these structures and access further details.
Curricular materials for educators are solicited and posted on theRCSB PDB website and highlighted in our newsletters. Tutorialsand help documentation are also available on the website, withenhanced user help features available on the RCSB PDB beta testsite.
STRUCTURAL GENOMICSStructural genomics is a worldwide initiative aimed at determin-ing a large number of protein structures in a high throughputmode. The RCSB PDB continues to be actively involved indeveloping the informatics of structural genomics projects. This
Crystal Structure Of Human DJ-1.
PDB ID: 1p5f
Wilson, M. A., Collins, J. L., Hod, Y., Ringe, D., Petsko, G. A.: The 1.1 ÅResolution Crystal Structure of DJ-1, the Protein Mutated in Autosomal RecessiveEarly Onset Parkinson's Disease. (2003) Proc.Nat.Acad.Sci.USA 100, 9256-9261.
OTHER RCSB PDB SERVICES
Pyruvate kinase, from the February 2004 Molecule of theMonth feature on glycolytic enzymes.
PDB ID: 1a3w Jurica, M.S. Mesecar, A. Heath, P.J. Shi, W. Nowak, T.Stoddard, B.L. The allosteric regulation of pyruvate kinase by
fructose-1,6-bisphosphate. (1998) Structure, 6, 195-210.
8 R C S B P R O T E I N D A T A B A N K
A N N U A L R E P O R T 2 0 0 4 9
effort includes participation in taskforces, meetings, and individual interac-tions with each of the structuralgenomics centers. The RCSB PDB isresponsible for maintaining the proposeddata deposition specifications and thedata dictionary supporting these specifi-cations (deposit.pdb.org/mmcif). In addition,the RCSB PDB has created software tohelp integrate data from standard struc-ture determination packages, and hascreated the resources Target RegistrationDatabase (TargetDB)29 and PepcDB.
Efficient structure solution on a genomicscale requires a centralized coordinationeffort, to which the timely availability ofstatus information on the progress ofprotein production and structure solu-tion is key. TargetDB collects targetsequences from the nine P50 NIH struc-tural genomics centers and from othergenomics centers worldwide. As of thiswriting, more than 65,400 sequenceshave been entered in TargetDB. PepcDB,the Protein Expression Purification andCrystallization Database, extends thecontent of TargetDB with status history,stop conditions, reusable text protocolsand contact information collected fromthe NIH Protein Structure Initiative(PSI) Centers. A demonstration site hasbeen created (pepcdb.rutgers.edu).
DATA UNIFORMITY: RELEASE OF ASTANDARDIZED ARCHIVEThe RCSB PDB’s focus on making thearchive as consistent and error-free aspossible is ongoing.30-31 This workinvolves continuous archive-wide stan-dardization of nomenclature and usagewithin existing data items. Standardization is required to supportreliable comparisons of data across the archive as well as for inte-gration with other database resources.
All PDB entries have been standardized and released in mmCIFformat. These remediated files can be accessed from theDownload/Display File section of the Structure Explorer page forany entry, or for a set of query results. The remediated mmCIFfiles are also available from the FTP site. The files follow the lat-est version of the PDB exchange data dictionary(deposit.pdb.org/mmcif) that was developed by the RCSB PDB and
the MSD-EBI. Software for translatingthese data into PDB-formatted files canbe downloaded from deposit.pdb.org/software.
All remediated data files are also availablein PDBML/XML format from the betaFTP site and through queries on the pro-duction site. The PDBML/XML data fileshave been created by software-facilitatedtranslation of the remediated mmCIFdata files. As a result, the element andattribute names in the PDBML/XMLdata files directly correspond to the itemnames defined in the PDB ExchangeDictionary. An PDBML/XML schemadocumenting the translated format isavailable at deposit.pdb.org/mmcif. The deliv-ery of PDB data in PDBML/XML formatis the product of a collaboration amongPDBj, MSD-EBI, and the RCSB.
DATA CD DISTRIBUTIONThe RCSB PDB currently distributes aquarterly CD-ROM of PDB holdingsupon request. All PDB entries and relatedexperimental data are distributed in theJanuary release followed by incrementalupdates in the remaining periods.Provided at no cost to users, the CD-ROMs are designed to help researcherswho have limited Internet access or needsubsets or a complete set of the structurefiles for their research.
PHYSICAL ARCHIVES The PDB Physical Archives contain thepaper documents, magnetic tapes, andother materials generated by BNL and theRCSB PDB over the course of its history.The archives continue to be called onperiodically to resolve issues involvingolder structures. Older data have been
restored and incorporated into a file structure that was designedfor easy maintenance and expansion. The overall goal of thePDB Physical Archives is to preserve not only the data submittedby the depositors, but also the records associated with the trans-actions and activities that are part of the evolution of theresource. Access and availability of this information to the RCSBPDB staff provides a resource for resolving issues concerning spe-cific entries, aides in uniformity and value-added annotation,facilitates disaster recovery, and makes the information availablefor research.
Solution Structure Of A Dimeric LactoseDNA-Binding Domain Complexed To ANonspecific DNA Sequence.
PDB ID: 1osl
Kalodimos, C.G., Biris, N., Bonvin, A.M.J.J.,Levandoski, M.M., Guennuegues, M., Boelens, R.,Kaptein, R. Structure and Flexibility Adaptation inNonspecific and Specific Protein-DNA Complexes.(2004) Science 305, 386-389.
10 R C S B P R O T E I N D A T A B A N K
Berman, H.M. (2003) Structural databases ofbiological macromolecules. In Encyclopedia ofthe Human Genome. Nature PublishingGroup Macmillian Publishers Ltd., Londonpp. 406-412.
Berman, H.M., Bourne, P.E., and Westbrook, J.(2004) The Protein Data Bank: A case studyin management of community data. CurrentProteomics 1, 49-57.
Berman, H.M., Bourne, P.E., Westbrook, J., andZardecki, C. (2003) The Protein Data Bank.In Protein Structure. Determination, analysis,and applications for drug discovery. Chasman,D. (ed.), Marcel Dekker, Inc., New York, NYpp. 389-405.
Berman, H.M., Henrick, K., and Nakamura, H.(2003) Announcing the worldwide ProteinData Bank. Nat Struct Biol 10, 980.
Berman, H.M., and Westbrook, J. (2003) Theneed for dictionaries, ontologies, and con-trolled vocabularies. OMICS. A Journal ofIntegrative Biology 7, 9-10.
Berman, H.M., and Westbrook, J. (2004) Theimpact of structural genomics on the ProteinData Bank. Am J Pharmacogenomics 4, 257-252.
Bourne, P.E., Addess, K.J., Bluhm, W.F., Chen,L., Deshpande, N., Feng, Z., Fleri, W.,Green, R., Merino-Ott, J.C., Townsend-Merino, W., Weissig, H., Westbrook, J., andBerman, H.M. (2004) The distribution andquery systems of the RCSB Protein DataBank. Nucleic Acids Res 32, D223-D225.
Bourne, P.E., Westbrook, J., and Berman, H.M.(2004) The Protein Data Bank and lessons indata management. Brief Bioinform 5, 23-30.
Chen, L., Oughtred, R., Berman, H.M., andWestbrook, J. (2004) TargetDB: A target reg-istration database for structural genomicsprojects. Bioinformatics bioinformatics.oupjour-nals.org/cgi/content/abstract/bth300v1.
Feng, Z., Chen, L., Maddula, H., Akcan, O.,Oughtred, R., Berman, H.M., andWestbrook, J. (2004) Ligand Depot: A DataWarehouse for Ligands Bound toMacromolecules. Bioinformatics 20, 2153-2155.
Westbrook, J., Feng, Z., Burkhardt, K., andBerman, H.M. (2003) Validation of proteinstructures for the Protein Data Bank. MethEnz. 374, 370-385.
Westbrook, J., Feng, Z., Chen, L., Yang, H., andBerman, H.M. (2003) The Protein DataBank and structural genomics. Nucleic AcidsRes 31, 489-491.
RECENT PUBLICATIONS
Crystal structure of Bacillus anthracis protective antigen complexedwith human anthrax toxin receptor.
PDB ID: 1t6b
Santelli, E., Bankston, L. A., Leppla, S. H., Liddington, R. C. (2004)Crystal Structure of a Complex Between Anthrax Toxin and its HostCell Receptor. Nature, 430, 905-908.
IMPORTANT WEB ADDRESSES
BETA TEST SITEpdbbeta.rcsb.org
DATA UNIFORMITYwww.rcsb.org/pdb/uniformity
DEPOSITION SERVICES INFORMATIONdeposit.pdb.org
EDUCATION RESOURCESwww.rcsb.org/pdb/education.html
FTPftp://ftp.rcsb.org
MMCIF RESOURCESdeposit.pdb.org/mmcif
MOLECULE OF THE MONTH www.rcsb.org/pdb/molecules/molecule_list.html
PEPCDBpepcdb.rutgers.edu
RCSB PROTEIN DATA BANKwww.pdb.org
RCSB PDB NEWSLETTERwww.rcsb.org/pdb/newsletter.html
STRUCTURAL GENOMICSwww.rcsb.org/pdb/strucgen.html
SOFTWARE RESOURCES www.rcsb.org/pdb/software-list.html
TARGETDBtargetdb.pdb.org
PDBML/XML BETA DATA FILESftp://beta.rcsb.org/pub/pdb/uniformity/data/XML
WWPDBwww.wwpdb.org
HELP DESKS
BETA SITE [email protected]
GENERAL PDB [email protected]
GENERAL DEPOSITION QUESTIONS [email protected]
ADIT [email protected]
TSX Structure Complexedwith Thymidine.
PDB ID: 1tlw
Ye, J., van den Berg, B.: Crystal Structure ofthe Bacterial Nucleoside Transporter TSX.(2004) EMBO J. 23, 3187-3195.
A N N U A L R E P O R T 2 0 0 4 11
12 R C S B P R O T E I N D A T A B A N K
SELECTEDREFERENCES1. Berman, H.M., Westbrook, J., Feng, Z.,
Gilliland, G., Bhat, T.N., Weissig, H.,Shindyalov, I.N. and Bourne, P.E. (2000) TheProtein Data Bank. Nucleic Acids Res., 28,235-242.
2. Bernstein, F.C., Koetzle, T.F., Williams,G.J.B., Meyer Jr., E.F., Brice, M.D., Rodgers,J.R., Kennard, O., Shimanouchi, T. andTasumi, M. (1977) Protein Data Bank: acomputer-based archival file for macromolec-ular structures. J. Mol. Biol., 112, 535-542.
3. Berman, H.M., Henrick, K. and Nakamura,H. (2003) Announcing the worldwideProtein Data Bank. Nat. Struct. Biol., 10,980.
4. Doreleijers, J.F., Mading, S., Maziuk, D.,Sojourner, K., Yin, L., Zhu, J., Markley, J.L.and Ulrich, E.L. (2003) BioMagResBankdatabase with sets of experimental NMR con-straints corresponding to the structures ofover 1400 biomolecules deposited in theProtein Data Bank. J. Biomol. NMR, 26, 139-146.
5. Bourne, P.E., Berman, H.M., Watenpaugh,K., Westbrook, J.D. and Fitzgerald, P.M.D.(1997) The macromolecular CrystallographicInformation File (mmCIF). Meth. Enzymol.,277, 571-590.
6. Yang, H., Guranovic, V., Dutta, S., Feng, Z.,Berman, H.M. and Westbrook, J. (2004)Automated and accurate deposition of struc-tures solved by X-ray diffraction to theProtein Data Bank. Acta Crystallogr. D Biol.Crystallogr., 60, 1833-1839.
7. Feng, Z., Chen, L., Maddula, H., Akcan, O.,Oughtred, R., Berman, H.M. andWestbrook, J. (2004) Ligand Depot: A DataWarehouse for Ligands Bound toMacromolecules. Bioinformatics, 20, 2153-2155.
8. CCP4. (1994) The CollaborativeComputational Project Number 4 suite: pro-grams for protein crystallography. ActaCrystallogr. D Biol. Crystallogr., 50, 760-763.
9. Feng, Z., Westbrook, J. and Berman, H.M.(1998) NUCHECK. Rutgers University, NewBrunswick, NJ.
10. Laskowski, R.A., Rullmann, J.A., MacArthur,M.W., Kaptein, R. and Thornton, J.M.(1996) AQUA and PROCHECK-NMR: pro-grams for checking the quality of proteinstructures solved by NMR. J. Biomol. NMR,8, 477-486.
11. Vaguine, A.A., Richelle, J. and Wodak, S.J.(1999) SFCHECK: a unified set of proce-dures for evaluating the quality of macromol-ecular structure-factor data and their agree-
ment with the atomic model. Acta CrystallogrD Biol Crystallogr., 55, 191-205.
12. Kraulis, P. (1991) MOLSCRIPT: a programto produce both detailed and schematic plotsof protein structures. J. Appl. Cryst., 24, 946-950.
13. Merrit, E.A. and Bacon, D.J. (1997)Raster3D: Photorealistic molecular graphics.Meth. Enzymol., 277, 505-524.
14. Sayle, R. and Milner-White, E.J. (1995)RasMol: biomolecular graphics for all. TrendsBiochem. Sci., 20, 374.
15. Bernstein, H.J. (2000) Recent changes toRasMol, recombining the variants. TrendsBiochem. Sci., 25, 453-455.
16. Bourne, P.E., Gribskov, M., Johnson, G.,Moreland, J. and Weissig, H. (1998) A proto-type Molecular Interactive CollaborativeEnvironment (MICE). In Altman, R.,Dunker, K., Hunter, L. and Klein, T. (eds.),Pacific Symposium on Biocomputing, pp. 118-129.
17. Tate, J.G., Moreland, J. and Bourne, P.E.(1999) MSG (Molecular Scene Generator): aWeb-based application for the visualization ofmacromolecular structures. Journal of AppliedCrystallography, 32, 1027-1028.
18. MDL Information Systems Inc. Chime.United States.
19. Kaplan, W. and Littlejohn, T. (2001) Swiss-PDB Viewer (Deep View). Brief Bioinform, 2,195-197.
20. Guex, N. and Peitsch, M.C. (1997) SWISS-MODEL and the Swiss-PdbViewer: An envi-ronment for comparative protein modeling.Electrophoresis, 18, 2714-2723.
21. Neshich, G., Togawa, R., Vilella, W. andHonig, B. (1998) STING (Sequence To andwithIN Graphics) PDBViewer. Protein DataBank Quarterly Newsletter, 85, 6-7.
22. Shindyalov, I. and Bourne, P.E. AppletQuickPDB. ©1996-1998 San DiegoSupercomputer Center.
23. eHelp Corporation. Robohelp. ©1992-2003.All Rights Reserved.
24. National Library of Medicine. (1989-) MED-LINE [database online]. Bethesda (MD).Updated weekly. Available from: NationalLibrary of Medicine; OVID, Murray, UT;The Dialog Corporation, Palo Alto, CA.x.
25. Conte, L., Bart, A., Hubbard, T., Brenner, S.,Murzin, A. and Chothia, C. (2000) SCOP: astructural classification of proteins database.Nucleic Acids Res., 28, 257-259.
26. Orengo, C.A., Michie, A.D., Jones, S., Jones,
D.T., Swindells, M.B. and Thornton, J.M.(1997) CATH–a hierarchic classification ofprotein domain structures. Structure, 5, 1093-1108.
27. The Gene Ontology Consortium. (2000)Gene Ontology: tool for the unification ofbiology. Nature Genetics, 25, 25-29.
28. Walther, D. (1997) WebMol--a Java-basedPDB viewer. Trends Biochem Sci, 22, 274-275.
29. Chen, L., Oughtred, R., Berman, H.M. andWestbrook, J. (2004) TargetDB: A target reg-istration database for structural genomicsprojects. Bioinformatics, bioinformatics.oupjour-nals.org/cgi/content/abstract/bth300v1.
30. Westbrook, J., Feng, Z., Jain, S., Bhat, T.N.,Thanki, N., Ravichandran, V., Gilliland,G.L., Bluhm, W., Weissig, H., Greer, D.S. etal. (2002) The Protein Data Bank: Unifyingthe archive. Nucleic Acids Res., 30, 245-248.
31. Bhat, T.N., Bourne, P., Feng, Z., Gilliland,G., Jain, S., Ravichandran, V., Schneider, B.,Schneider, K., Thanki, N., Weissig, H. et al.(2001) The PDB data uniformity project.Nucleic Acids Res., 29, 214-218.
A N N U A L R E P O R T 2 0 0 4 13
RCSB PDB LEADERSHIP TEAM
Dr. Helen M. Berman, DirectorRutgers, The State University of New Jersey
Dr. Philip E. Bourne, Co-DirectorSan Diego Supercomputer Center
University of California, San Diego
Judith L. Flippen-AndersonRutgers, The State University of New Jersey
Dr. Gary L. Gilliland, Co-DirectorCenter for Advanced Research in Biotechnology/UMBI/NIST
Dr. John Westbrook, Co-DirectorRutgers, The State University of New Jersey
A list of current RCSB PDB Team Members is available at www.rcsb.org/pdb/rcsb-group.html
MIRRORSSAN DIEGO SUPERCOMPUTER CENTER, UCSD
www.pdb.orgftp://ftp.rcsb.org
RUTGERS, THE STATE UNIVERSITY OF NEW JERSEY
rutgers.rcsb.org
CENTER FOR ADVANCED RESEARCH IN BIOTECHNOLOGY/UMBI/NISTnist.rcsb.org
CAMBRIDGE CRYSTALLOGRAPHIC DATA CENTRE, UKpdb.ccdc.cam.ac.ukftp://pdb.ccdc.cam.ac.uk/rcsb
NATIONAL UNIVERSITY OF SINGAPORE
pdb.bic.nus.edu.sgftp://pdb.bic.nus.edu.sg/pub/pdb
OSAKA UNIVERSITY, JAPAN
pdb.protein.osaka-u.ac.jpftp://ftp.protein.osaka-u.ac.jp/pub/pdb
MAX DELBRÜCK CENTER FOR MOLECULAR MEDICINE, GERMANY
www.pdb.mdc-berlin.deftp://ftp.pub.mdc-berlin.de/pub/pdb
RCSB PDB PARTNERSRUTGERS, THE STATE UNIVERSITY OF NEW JERSEY
Department of Chemistry and Chemical Biology610 Taylor RoadPiscataway, NJ 08854-8087
SAN DIEGO SUPERCOMPUTER CENTERAT THE UNIVERSITY OF CALIFORNIA, SAN DIEGO
9500 Gilman DriveLa Jolla, CA 92093-0537
CENTER FOR ADVANCED RESEARCH IN BIOTECHNOLOGY, PART OF A JOINT VEN-TURE BETWEEN THE UNIVERSITY OF MARYLAND BIOTECHNOLOGY INSTITUTE ANDTHE NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY
9600 Gudelsky DriveRockville, MD 20850-3479
www.pdb.orgSend questions or comments to:
Crystal Structure Of Putative OxalateDecarboxylase (TM1287) from Thermotogamaritima at 1.95 Å Resolution.
PDB ID: 1o4t.
Schwarzenbacher, R., Von Delft, F., Jaroszewski,L., Abdubek, P., Ambing, E., Biorac, T.,
Brinen, L. S., Canaves, J. M., Cambell, J.,Chiu, H. J., Dai, X., Deacon, A. M.,
Didonato, M., Elsliger, M. A., Eshagi,S., Floyd, R., Godzik, A., Grittini, C.,Grzechnik, S. K., Hampton, E.,Karlak, C., Klock, H. E., Koesema, E.,K, J. S.: Crystal Structure of a PutativeOxalate Decarboxylase (TM1287)from Thermotoga maritima at 1.95 ÅResolution. (2004) Proteins 56, 392.