
BioMagResBank (BMRB) Data Deposition and Entry Annotation Requirements

Eldon L. Ulrich


Presentation topics

New ADIT-NMR deposition system – Steve Mading and Dimitri Maziuk in collaboration with the Rutgers RCSB team (Monica Sundd, Monica Sekharan, Zukang Feng, John Westbrook, Helen Berman, and Jasmine Young)

Restraints processing – Jurgen Doreleijers and Jundong Lin in collaboration with the EBI/CCPN (Wim Vranken), Utrecht University (Aart Nederveen, Alexandre Bonvin, and Robert Kaptein), and Radboud University (Gert Vriend and Chris Spronk)

Assigned chemical shift validation systems – David Tolmie, Kent Wenger, and Dimitri Maziuk in collaboration with NESG (Hunter Moseley) and NMRFAM (Gabriel Cornilescu, Hamid Eghbalnia, and Liya Wang)

Future issues


BMRB mission and goals

Mission: Gather and distribute in the public domain as much biological NMR data as possible to further research and education

Goals:

1) Create efficient data deposition systems
• Requires minimal user effort
• Complete – follows IUPAC recommendations
• Free of obvious errors
• Promotes uniformity

2) Through annotation improve the usefulness of the data
• Identify anomalous data and communicate issues with depositors
• Migrate data into standard formats
• Carry out value added processes

3) Develop data query systems
• Entry retrieval
• Longitudinal database searches


BMRB NMR-STAR Content

Entry information

Contact persons

Molecular system

Molecules

Natural source

Experimental source

Spectrometer description

Samples

NMR experiments

Sample conditions

Software

Applied experiments

Citations

Chemical components

Experimentally derived data

Molecular description

Experimental details


Experimentally derived data

NMR spectral data
• Chemical shift assignments
• Chemical shift reference
• Theoretical chemical shifts
• Chemical shift isotope effects
• Chemical shift anisotropy
• Coupling constants
• Residual dipolar couplings
• T1 relaxation
• T1rho relaxation
• T2 relaxation
• Heteronuclear NOE
• Homonuclear NOE/ROE
• Dipole-dipole relaxation
• Cross correlation
• Spectral density values
• Spectral peak lists
• Time-domain data sets

Three-dimensional structures
• NMR constraints
• Constraint statistics
• Coordinates for structure models
• Representative model coordinates
• Structure quality parameters

Kinetic parameters
• H-exchange rates
• H-exchange protection factors
• Order parameters (isotropic and anisotropic)

Thermodynamic parameters
• pKa values
• D/H-fractionation factors

Secondary structure features
• Helix/sheet/turn/loop
• Deduced H-bonds
• Author interpretation


One-stop RCSB BMRB/PDB ADIT-NMR deposition system (URL: deposit.bmrb.wisc.edu/bmrb-adit/)

• BMRB and RCSB-PDB depositions can be generated from a joint interface
• BMRB interface has been streamlined
• RCSB-PDB interface has been extended with optional fields for conformer and constraint statistics
• Files in PDB format, mmCIF, and NMR-STAR can be uploaded to pre-populate a deposition
• Many fields (e.g., experiment name, software name, software author) have pull-down lists to choose from for convenience and to improve uniformity (controlled vocabulary)
• Fields common to multiple forms are linked to eliminate the need to retype information (e.g., uploaded data file names, author names, molecule names, and others)
• Help and examples have been improved
• You can start with either BMRB or PDB and switch between the two as you go along
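The controlled-vocabulary idea behind the pull-down lists can be illustrated with a minimal Python sketch. It is not ADIT-NMR code: the vocabulary below is a hypothetical list built from program names that appear elsewhere in these slides, and the function names `normalize` and `suggest` are invented for illustration.

```python
from difflib import get_close_matches

# Hypothetical vocabulary; the real ADIT-NMR pull-down lists are not given in the slides.
SOFTWARE_NAMES = ["NMRPipe", "TALOS", "CYANA", "CNS", "Amber"]


def normalize(value, vocabulary):
    """Return the vocabulary entry that matches value case-insensitively, else None."""
    lookup = {term.lower(): term for term in vocabulary}
    return lookup.get(value.strip().lower())


def suggest(value, vocabulary, n=3):
    """Suggest vocabulary entries that are spelled similarly to a free-text value."""
    lookup = {term.lower(): term for term in vocabulary}
    hits = get_close_matches(value.strip().lower(), list(lookup), n=n, cutoff=0.6)
    return [lookup[hit] for hit in hits]
```

A deposition form could store `normalize(user_text, SOFTWARE_NAMES)` when it is not `None` and otherwise offer `suggest(user_text, SOFTWARE_NAMES)`, so that every stored value comes from a single agreed spelling.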


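ADIT-NMR architecture

[Figure: data-flow diagram. Labels shown: input files in PDB, PDBx, and mmCIF formats with MAXIT and PDBx; input files in NMR-STAR v2.1 and NMR-STAR v3 with s2nmr and NMR-STAR v3; the converters nmrstr2nmrif, nmrif2nmrstar, nmrif2pdbx, and pdbx2nmrif around the NMR-IF format and ADIT-NMR; outputs labeled PDB deposition and BMRB deposition (NMR-STAR v3 and v2.1), carrying coordinates + restraints + experimental data.]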


Precheck and validation of coordinate files

Precheck/Validate: Performs the same checks as are available via PDB's ADIT tool. Uses the coordinate file given above.


ADIT-NMR validation report


ADIT-NMR constraint statistics


[Figure: bar chart. x-axis: Year (1996–2007); y-axis: Depositions (0–800).]

BMRB depositions by year (~260 PDB depositions received)


ADIT-NMR dual session latency

For sessions that are deposited to both PDB and BMRB, how close together do the two depositions occur in time?

[Figure: two histograms, "BMRB first, PDB second" and "PDB first, BMRB second". y-axis: Number of depositions (0–40). x-axis bins: [0..20min), [20min..1hour), [1..2) hours, [2..4) hours, [4..8) hours, [8..16) hours, [16..32) hours, [32..64) hours, >= 64 hours.]

After depositing to either of the two databanks, much of the data for depositing to the other databank is also complete, hence the short times seen here. In fact, the 2 hour bar can be broken down further, and it turns out that most are less than 15 minutes.
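The half-open bins on the histogram axis can be reproduced with a short Python sketch. Only the bin boundaries come from the slide; the function names and any latency values passed in are illustrative assumptions, and latencies are assumed to be given in minutes.

```python
from bisect import bisect_right

# Upper edges of the first eight bins, in minutes, read off the histogram axis.
EDGES_MINUTES = [20, 60, 120, 240, 480, 960, 1920, 3840]
LABELS = [
    "[0..20min)", "[20min..1hour)", "[1..2) hours", "[2..4) hours",
    "[4..8) hours", "[8..16) hours", "[16..32) hours", "[32..64) hours",
    ">= 64 hours",
]


def bin_label(latency_minutes):
    """Return the histogram bin for the time between the two depositions."""
    if latency_minutes < 0:
        raise ValueError("latency must be non-negative")
    # bisect_right puts a value equal to an edge into the next bin,
    # which matches the half-open [low..high) intervals on the slide.
    return LABELS[bisect_right(EDGES_MINUTES, latency_minutes)]


def histogram(latencies_minutes):
    """Count how many latencies fall into each bin."""
    counts = {label: 0 for label in LABELS}
    for value in latencies_minutes:
        counts[bin_label(value)] += 1
    return counts
```

Working in whole minutes avoids floating-point edge cases at the 20-minute boundary.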


ADIT-NMR development

Phase I – completed/retired
• BMRB only depositions

Phase II – completed/being refined
• BMRB-PDB combined depositions

Phase III – being designed
• Access to NMR atomic coordinate and restraint validation tools
• Access to assigned chemical shift validation tools
• Improved data import functions (PDB Extract, CCPN data harvesting tools, others)


NMR Restraints processing

[Figure: flow chart of the restraints processing pipeline, with a legend for the development site of each step (BMRB, EBI, Other).]

Databases and resources shown: NMR Restraints Grid, DOCR, FRED, PDBE-MSD, RECOORD

Processing steps shown:
• Parse restraints (Wattos)
• Synchronize models, correct atom nomenclature, add missing hydrogen atoms (Wattos)
• Link parsed restraints and coordinates
• NMR-STAR data (FormatConverter, E-MSD)
• Analyze coordinates and all NMR constraint types
• Structure recalculation (Amber/CNS/CYANA/Gromacs/Yasara etc.)
• Convert to structure calculation programs (FormatConverter)
• Remove surplus, calculate violations, completeness, information content (Wattos/Queen)
• Refine molecular system (FormatConverter)
• Nomenclature correction (WHAT IF)


‘Surplus’ distance restraints categories

Exceptional – atoms not present in the PDB entry

Double – duplicate restraints

Impossible – do not match topology provided

Fixed – atoms have fixed distances

Redundant – upper bounds greater than what is allowed by topology
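The five categories can be sketched as a simple classifier. This is an illustrative Python sketch, not the Wattos implementation: the `Restraint` class, the `classify` function, the input structures, and the order of the checks are all assumptions. In particular, "impossible" is interpreted here as a lower bound that exceeds the largest distance the topology allows, which is only one possible reading of the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restraint:
    atom_a: str
    atom_b: str
    upper: float        # upper distance bound (angstroms)
    lower: float = 0.0  # lower distance bound (angstroms)


def classify(restraints, atoms_present, fixed_pairs, max_by_topology):
    """Label each restraint with a surplus category, or 'kept' if none applies.

    atoms_present   -- set of atom names found in the coordinate entry
    fixed_pairs     -- set of frozenset atom pairs whose distance cannot change
    max_by_topology -- dict: frozenset atom pair -> largest distance allowed
    """
    seen = set()
    labels = []
    for r in restraints:
        pair = frozenset((r.atom_a, r.atom_b))
        key = (pair, r.lower, r.upper)
        limit = max_by_topology.get(pair)
        if not pair <= atoms_present:
            label = "exceptional"   # an atom is missing from the entry
        elif key in seen:
            label = "double"        # same pair and bounds already listed
        elif limit is not None and r.lower > limit:
            label = "impossible"    # assumed reading: cannot be satisfied
        elif pair in fixed_pairs:
            label = "fixed"         # distance is fixed by the covalent geometry
        elif limit is not None and r.upper > limit:
            label = "redundant"     # bound is looser than the topology itself
        else:
            label = "kept"
        seen.add(key)
        labels.append(label)
    return labels
```

Atom pairs are stored as `frozenset` objects so that a restraint between A and B is recognized as a duplicate of one between B and A.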


PDB Entry counts by criteria


Conclusions
• The number of good converted MR files increased greatly, from ~500 to 2,271 of 3,057, and the remaining issues are documented
• Authors have been contacted since April 2006 for entries failing any of four criteria. This affects about 10% of the entries, mostly with larger violations that the authors consider acceptable
• Processing of the entries has been further automated; only entries failing the criteria are checked manually at this stage

Results from RECOORD database analysis
• Water refinement improved structure quality
• Low correlations were found between various quality indicators
• Surprisingly, quality indicators did not correlate well with the number of distance restraints


Assigned chemical shift checks

• TALOS/NMRPipe/MolMol implementation
  Cornilescu, G., Delaglio, F., and Bax, A., J. Biomol. NMR 13, 289 (1999)
  Delaglio, F., unpublished
  Koradi, R., Billeter, M., and Wüthrich, K., J. Mol. Graph. 14, 51 (1996)

• AVS (Assignment validation software)
  Moseley, H.N.B., Sahota, G., and Montelione, G.T., J. Biomol. NMR 28, 341 (2004)

• LACS (Linear analysis of carbon-13 chemical shift)
  Wang, L., Eghbalnia, H.R., Bahrami, A., and Markley, J.L., J. Biomol. NMR 32, 13-22 (2005)

• Shifts (planned)
  Xu, X.P. and Case, D.A., J. Biomol. NMR 21, 321 (2001)

• SHIFTX/SHIFTCOR (planned)
  Neal, S., Nip, A.M., Zhang, H., and Wishart, D.S., J. Biomol. NMR 26, 215 (2003)


AVS chemical shift validation report

Anomalous Chemical Shift Assignments:

The assigned chemical shifts in the following table have been reported as anomalous, suspicious, or duplicate (A, S or D respectively, in the Error Msg. column) by the software employed by BMRB to check for chemical shift outliers. Please verify these assignments by replacing the question marks in the 'Code' column of the table with the appropriate code. The codes to use are: V = verified, D = delete, and R = replace. Where R is indicated, please supply the revised chemical shift value in the Replace C.S. column of the table. If there are a large number of revised chemical shifts, it may be more convenient to edit the full NMR-STAR file. Please inform the annotator in charge of the entry of your modifications.

                                                           Author Verification
Mol  Res.  Res.   Atom   Obs.      Error  Expected  Std.   Code  Replace
ID   #     Type          Delta     Msg.   Delta     Dev.         C.S.
---------------------------------------------------------------------------
1    17    GLU    N      108.138   A      120.68    3.68   ?     ?
1    21    LEU    HD2    -0.446    S      0.76      0.28   ?     ?
1    30    LYS    H      10.556    S      8.22      0.64   ?     ?
1    34    ILE    HD1    -0.297    S      0.7       0.3    ?     ?
1    53    THR    CB     62.879    S      69.64     1.7    ?     ?
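To see how far each flagged shift lies from its expected value, the deviation can be expressed in units of the listed standard deviation. The Python sketch below uses only the numbers from the report table. It does not reproduce the A/S/D labels, because the thresholds and rules AVS applies are not given in the slides; the function name `deviation_in_sd` is an illustrative choice.

```python
# (residue number, residue type, atom, observed shift, expected shift, std. dev.)
ROWS = [
    (17, "GLU", "N", 108.138, 120.68, 3.68),
    (21, "LEU", "HD2", -0.446, 0.76, 0.28),
    (30, "LYS", "H", 10.556, 8.22, 0.64),
    (34, "ILE", "HD1", -0.297, 0.7, 0.3),
    (53, "THR", "CB", 62.879, 69.64, 1.7),
]


def deviation_in_sd(observed, expected, std_dev):
    """Signed distance of the observed shift from the expected value, in std. devs."""
    return (observed - expected) / std_dev


for res_num, res_type, atom, observed, expected, std_dev in ROWS:
    z = deviation_in_sd(observed, expected, std_dev)
    print(f"{res_type}{res_num} {atom}: {z:+.2f} standard deviations")
```

Each of the five rows lies more than three standard deviations from its expected value, which is consistent with their being listed for author verification.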


Protein delta chemical shift values (observed – calculated)


Histogram of delta chemical shifts (observed – calculated) for protein helix and sheet residues


LACS visualization


International structural genomics task force committees

• Structural genomics informatics task force

• Task force on numerical criteria for evaluating and assuring structure quality

• Task force on tracking and registration of targets

• Task force on deposition, archiving, and curation of the primary information

• Task force on mechanisms for publication and recording of methods

• Task force on intellectual property rights


Structural genomics NMR-STAR dictionary development collaborations

Cheryl Arrowsmith

Michael Kennedy

John Markley

Guy Montelione

James Prestegard

David Wemmer


NMR dictionary general discussion topics

Alignment with mmCIF – Yes
• Use identical tags and definitions whenever possible

Human readable export data format – Yes

Reproduction of the experiment and data derivation – No
• Input data, derived data, description of the tools, protocol files, and parameter files used in the derivation
• Explicit links between the input data items and individual derived data items, including the possibility of capturing intermediate results in the derivation process

Application specific data items – No
• Capture in the protocol and parameter files

Software must be in place to meet higher deposition requirements – Yes


BMRB Time-domain data summary (www.bmrb.wisc.edu/data_library/timedomain/)

Entries released: 66

NMR experiments:
• Total: >600
• Unique: 75-100
• Reduced dimensionality: 5 entries (entries 5596, 5844, 5859, 7170, 7191)

Pulse sequences and processing parameters are often provided

Other information:
• Peak lists: 15 entries
• Structure calculation (with all intermediate results): 3 entries (entries 6128, 6176, 6318)


Future issues

Probabilistic approaches to structure determination

Structure determination from chemical shift data

Use of multiple kinds of NMR data in structure calculations

Structure refinement with data from non-NMR techniques


Acknowledgements

BMRB Madison: John L. Markley, Jurgen F. Doreleijers, Jundong Lin, Steve Mading, Dimitri Maziuk, David Tolmie, Hongyang Yao, Christopher Schulte

Computer sciences collaborators: Yannis Ioannidis, Miron Livny, Zachary Miller, R. Kent Wenger

RCSB: Helen Berman, Zukang Feng, Monica Sekharan, Monica Sundd, John Westbrook, Jasmine Young

Rutgers University: Mike Baran, Dehua Hang, Guy Montelione, Hunter Moseley

BMRB Advisory Board

CCPN/EBI: Wayne Boucher, Rasmus Fogh, John Ionides, Ernest Laue, Tim Stevens, Wim Vranken

NMRFAM: Arash Bahrami, Gabriel Cornilescu, Hamid Eghbalnia, Liya Wang, William M. Westler

Utrecht University: Alexandre Bonvin, Aart Nederveen, Robert Kaptein

Members of the NMR Community

Structural genomics groups

Catherine Bougault, Bruce Johnson, David Wishart, Zsolt Zolnai

Funding

BMRB Osaka: Hideo Akutsu, Eiichi Nakatani, Yoko Harano

BMRB Florence: Antonio Rosato