Structure Representation and Coordinates Format Lecture 3 Structural Bioinformatics Dr. Avraham Samson 81-871

Upload: theresa-george

Post on 25-Dec-2015



Structure Representation and Coordinates Format

Lecture 3 Structural Bioinformatics

Dr. Avraham Samson 81-871


The PDB Format

• A full description is here
• It was designed around an 80 column punched card!
• It was designed to be human readable
• It is used by almost every piece of software that deals with structural data


The PDB Format - Records

• Every PDB file may be broken into a number of lines terminated by an end-of-line indicator. Each line in the PDB entry file consists of 80 columns. The last character in each PDB entry should be an end-of-line indicator.

• Each line in the PDB file is self-identifying. The first six columns of every line contain a record name, left-justified and blank-filled. This must be an exact match to one of the stated record names.

• The PDB file may also be viewed as a collection of record types. Each record type consists of one or more lines.

• Each record type is further divided into fields.
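
As a rough illustration of self-identifying records, the following Python sketch reads the record name from the first six columns of each line and counts how many lines each record type has. The sample lines are made up for illustration, and the helper names `record_name` and `count_record_types` are hypothetical.

```python
from collections import Counter


def record_name(line):
    """Return the record name held in columns 1-6 (left-justified, blank-filled)."""
    return line[:6].strip()


def count_record_types(lines):
    """Count how many lines belong to each record type, skipping blank lines."""
    return Counter(record_name(line) for line in lines if line.strip())


# Illustrative lines only; a real entry pads every line to 80 columns.
sample_lines = [
    "HEADER    TOXIN",
    "REMARK   3 REFINEMENT.",
    "REMARK   4 THE ERABUTOXIN A (EA) CRYSTAL STRUCTURE",
    "ATOM      1  N   VAL A  11",
    "END",
]

counts = count_record_types(sample_lines)
print(counts["REMARK"], counts["ATOM"])
```

Slicing by column rather than splitting on whitespace matters here, because fields in a fixed-column format may be blank or may run into each other.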


The PDB Format – An Example – The Header


The PDB Format – An Example – The Atomic Coordinates


The Description – Atom Records
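
A minimal Python sketch of reading one ATOM record by its fixed columns is given here. The column ranges follow the PDB format description for ATOM records as we understand it and should be checked against the full specification. The function name `parse_atom_line` and the sample atom (assembled from the valine N coordinates used elsewhere in this lecture) are assumptions for illustration.

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM record by column position (columns are 1-based in the spec)."""
    line = line.rstrip("\n").ljust(80)
    return {
        "record": line[0:6].strip(),     # columns 1-6
        "serial": int(line[6:11]),       # columns 7-11
        "name": line[12:16].strip(),     # columns 13-16
        "alt_loc": line[16].strip(),     # column 17
        "res_name": line[17:20].strip(), # columns 18-20
        "chain_id": line[21].strip(),    # column 22
        "res_seq": int(line[22:26]),     # columns 23-26
        "x": float(line[30:38]),         # columns 31-38
        "y": float(line[38:46]),         # columns 39-46
        "z": float(line[46:54]),         # columns 47-54
        "occupancy": float(line[54:60]), # columns 55-60
        "b_factor": float(line[60:66]),  # columns 61-66
        "element": line[76:78].strip(),  # columns 77-78
    }


# Assemble an 80-column sample line field by field so the widths are explicit.
example = (
    "ATOM  " + "    1" + " " + " N  " + " " + "VAL" + " " + "A" + "  11" + " "
    + "   " + "  25.360" + "  30.691" + "  11.795" + "  1.00" + " 17.93"
    + " " * 10 + " N" + "  "
)

atom = parse_atom_line(example)
print(atom["name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```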


What is Wrong with this Approach?

• The description and the data are separate
• Parsing is a nightmare – the most complex piece of code we have in our research laboratory probably remains the PDB parser
• There are no relationships between items of data
• Some data just cannot be parsed
• The fixed column format cannot represent some of today’s structures …


Structures are Spread Over Multiple Files – Most Users are Not Aware of this


PDB Format - Important Components of the Data are Lost to All But Humans

REMARK 3 REFINEMENT. BY THE RESTRAINED LEAST-SQUARES PROCEDURE OF
REMARK 3 J. KONNERT AND W. HENDRICKSON (PROGRAM *PROLSQ*). THE R
REMARK 3 VALUE IS 0.168 FOR 2680 REFLECTIONS WITH I GREATER THAN
REMARK 3 2.0*SIGMA(I) REPRESENTING 74 PER CENT OF THE TOTAL
REMARK 3 AVAILABLE DATA IN THE RESOLUTION RANGE 10.0 TO 2.0
REMARK 3 ANGSTROMS.

REMARK 4 THE ERABUTOXIN A (EA) CRYSTAL STRUCTURE IS ISOMORPHOUS WITH
REMARK 4 THE KNOWN STRUCTURE OF ERABUTOXIN B (PROTEIN DATA BANK
REMARK 4 ENTRIES *2EBX*, *3EBX*). EA DIFFERS FROM EB BY A SINGLE
REMARK 4 SUBSTITUTION - EA ASN 26 FOR EB HIS 26. THE EA STARTING
REMARK 4 MODEL WAS OBTAINED FROM A MOLECULAR REPLACEMENT STUDY IN
REMARK 4 WHICH COORDINATES FOR 309 OF THE 475 ATOMS IN THE EB
REMARK 4 STRUCTURE (*2EBX*) WERE USED.


mmCIF Was Developed to Address these Problems

Methods in Enzymology. 1997 277, 571-590


mmCIF – Scope of the Initial Effort

• All PDB data should be captured
• Describe a paper’s material and methods section
• Describe biologically active molecule
• Fully describe secondary structure but not tertiary or quaternary
• Describe details of chemistry (inc. 2D)
• Meaningful 3D views


mmCIF - Extract from a Data File

loop_
_atom_site.group_PDB
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.label_alt_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.footnote_id
_atom_site.entity_id
_atom_site.entity_seq_num
_atom_site.id
ATOM N N  VAL A 11 . 25.360 30.691 11.795 1.00 17.93 . 1 11 1
ATOM C CA VAL A 11 . 25.970 31.965 12.332 1.00 17.75 . 1 11 2
ATOM C C  VAL A 11 . 25.569 32.010 13.881 1.00 17.83 . 1 11 3
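
Because each value in an mmCIF loop is tied to a named data item, a simple extract such as this one can be read without knowing any column positions. The Python sketch below shows this under strong simplifying assumptions: it splits on whitespace only, so it would not handle the quoted strings or multi-line text fields that real mmCIF files may contain. The function name `parse_loop` is hypothetical.

```python
def parse_loop(text):
    """Parse one whitespace-separated mmCIF loop_ block into a list of dicts."""
    tokens = text.split()
    if not tokens or tokens[0] != "loop_":
        raise ValueError("expected the text to start with loop_")
    # Data item names come first; each starts with an underscore.
    n_keys = 0
    while 1 + n_keys < len(tokens) and tokens[1 + n_keys].startswith("_"):
        n_keys += 1
    keys = tokens[1:1 + n_keys]
    values = tokens[1 + n_keys:]
    if n_keys == 0 or len(values) % n_keys != 0:
        raise ValueError("number of values is not a multiple of the number of items")
    # Every run of n_keys values is one row of the table.
    return [dict(zip(keys, values[i:i + n_keys]))
            for i in range(0, len(values), n_keys)]


extract = """
loop_
_atom_site.group_PDB _atom_site.type_symbol _atom_site.label_atom_id
_atom_site.label_comp_id _atom_site.label_asym_id _atom_site.label_seq_id
_atom_site.label_alt_id _atom_site.Cartn_x _atom_site.Cartn_y _atom_site.Cartn_z
_atom_site.occupancy _atom_site.B_iso_or_equiv _atom_site.footnote_id
_atom_site.entity_id _atom_site.entity_seq_num _atom_site.id
ATOM N N  VAL A 11 . 25.360 30.691 11.795 1.00 17.93 . 1 11 1
ATOM C CA VAL A 11 . 25.970 31.965 12.332 1.00 17.75 . 1 11 2
ATOM C C  VAL A 11 . 25.569 32.010 13.881 1.00 17.83 . 1 11 3
"""

rows = parse_loop(extract)
for row in rows:
    print(row["_atom_site.label_atom_id"], float(row["_atom_site.Cartn_x"]))
```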


Summary

• mmCIF has provided the PDB with a robust data representation which serves as the conceptual and physical schema upon which the current RCSB, PDBe and PDBj are built

• This work predated XML and XML-schema but embodies the important concepts inherent in these descriptions

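• mmCIF was later converted exactly into XML (PDBML), but the old PDB format is still used far more widely than either

• For small, simple entries mmCIF offers little practical advantage over PDB; its benefit is in structures that the fixed column format cannot represent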


Other representations

• SMILES http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system


Other representations


Representing Positions

• Cartesian coordinates (x,y,z) are an easy and natural means of representing a position in 3D space

• There are many alternatives, such as spherical polar notation (r,θ,φ), and you can invent others if you want to
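
To make the equivalence of the two notations concrete, here is a small Python sketch that converts a position between (x,y,z) and (r,θ,φ). The angle convention is an assumption, since the slide does not fix one: θ is measured from the +z axis and φ is the angle in the x-y plane measured from the +x axis. The function names are hypothetical.

```python
import math


def cartesian_to_spherical(x, y, z):
    """Return (r, theta, phi) with theta from the +z axis and phi in the x-y plane."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0  # angles are undefined at the origin; return zeros
    theta = math.acos(z / r)
    phi = math.atan2(y, x)
    return r, theta, phi


def spherical_to_cartesian(r, theta, phi):
    """Inverse of cartesian_to_spherical under the same angle convention."""
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )


point = (25.360, 30.691, 11.795)
spherical = cartesian_to_spherical(*point)
round_trip = spherical_to_cartesian(*spherical)
print(spherical)
print(round_trip)
```

The round trip should reproduce the original position up to floating-point rounding, which is why either notation can serve as a coordinate format.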


Other representations

-Cartesian coordinates vs. polar coordinates


The center of the graph is called the pole.

Angles are measured from the positive x axis.

Points are represented by a radius and an angle: (r, θ), where r is the radius and θ is the angle.

To plot the point (5, π/4):

First find the angle θ = π/4.

Then move out 5 units along the terminal side.


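Let's generalize this to find formulas for converting from rectangular to polar coordinates.

[Figure: point (x, y) at distance r from the pole, with horizontal leg x and vertical leg y]

$$x^2 + y^2 = r^2 \qquad \tan\theta = \frac{y}{x}$$

$$r = \sqrt{x^2 + y^2} \qquad \theta = \tan^{-1}\left(\frac{y}{x}\right)$$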
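
The rectangular-to-polar formulas can be sketched in Python as follows. One practical caveat, which is our addition rather than something stated on the slide: the inverse tangent of y/x alone cannot tell opposite quadrants apart and fails when x = 0, so this sketch uses `math.atan2(y, x)`, which returns the angle in the correct quadrant. The function name `to_polar` is hypothetical.

```python
import math


def to_polar(x, y):
    """Convert rectangular (x, y) to polar (r, theta), theta in radians."""
    r = math.hypot(x, y)      # sqrt(x^2 + y^2)
    theta = math.atan2(y, x)  # quadrant-aware version of atan(y / x)
    return r, theta


r, theta = to_polar(3.0, 4.0)
print(r, math.degrees(theta))

# A point on the negative x axis: atan(y / x) would give 0, atan2 gives pi.
print(to_polar(-1.0, 0.0))
```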


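Let's generalize the conversion from polar to rectangular coordinates.

[Figure: point (r, θ) at distance r from the pole, with horizontal leg x and vertical leg y]

$$\cos\theta = \frac{x}{r} \qquad \sin\theta = \frac{y}{r}$$

$$x = r\cos\theta \qquad y = r\sin\theta$$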
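
A short Python sketch of the polar-to-rectangular conversion, followed by a check that converting back recovers the original radius and angle. The function name `to_cartesian` and the test point (r = 5, θ = π/4) are chosen for illustration.

```python
import math


def to_cartesian(r, theta):
    """Convert polar (r, theta in radians) to rectangular (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)


x, y = to_cartesian(5.0, math.pi / 4)
print(x, y)

# Converting back with the rectangular-to-polar formulas recovers r and theta
# up to floating-point rounding.
r_back = math.hypot(x, y)
theta_back = math.atan2(y, x)
print(r_back, theta_back)
```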


• How would you calculate distance?

• How would you calculate centroid?

• How would you calculate dihedral angle?
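
One possible set of answers, sketched in plain Python for positions given as Cartesian (x, y, z) tuples. This is an illustrative sketch and not a reference implementation: the function names are hypothetical, the centroid is unweighted (every atom counts equally, masses are ignored), and the dihedral uses the common atan2 formulation over the three bond vectors, returning a signed angle in degrees.

```python
import math


def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(a, b):
    """Euclidean distance between two points."""
    d = sub(a, b)
    return math.sqrt(dot(d, d))


def centroid(points):
    """Unweighted mean position of a list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# N, CA and C of the valine residue shown in the mmCIF extract of this lecture.
n_atom = (25.360, 30.691, 11.795)
ca_atom = (25.970, 31.965, 12.332)
c_atom = (25.569, 32.010, 13.881)

print(distance(n_atom, ca_atom))
print(centroid([n_atom, ca_atom, c_atom]))
# Four made-up points whose planes are perpendicular.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

All three calculations are straightforward in Cartesian coordinates because they reduce to vector subtraction, dot products and cross products; the same quantities would be far more awkward to compute directly in polar notation.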
