Structure superposition ≠ Structure alignment

Lecture 11, Chapter 16, Du and Bourne “Structural Bioinformatics”

Upload: claribel-richard

Post on 29-Dec-2015




Why?

A. Study the conformational changes of the same protein with or without ligands

-- Same protein sequences

B. Study the effect of mutations on protein structure -- Highly similar protein sequences

C. Assessment of protein structure prediction. -- How accurate are the predicted models? -- Same protein sequences

D. Remote homolog detection. Structures generally are preserved better than sequences over the course of evolution.

e.g. myoglobin and β-hemoglobin are homologous and have similar structures, but their sequence identity can be as low as 8.5%!

E. Classification of protein folds


• Structures may align well even if their sequence similarity is low.

• For example, consider the optimal superposition of myoglobin and β-hemoglobin, which are structural neighbors.

• However, their sequence identity is only 8.5%!

Why? Structure conservation > sequence conservation


[Figure: Receiver Operating Characteristic curve, true positive rate (%) vs. false positive rate (%); annotation: Chothia and Lesk]

Why? Structure conservation > sequence conservation


ROC experiment:

- For each pair P of proteins in dataset, perform alignment and record score: S(P)

- Rank all pairs according to their scores, from highest to lowest.

- Scan the ranked pairs and record the rates of true positives and false positives at each step.

[Figure: Receiver Operating Characteristic curve, true positive rate (%) vs. false positive rate (%)]

ASIDE: Making sense of a ROC curve


ASIDE: Making sense of a ROC curve

Prediction   Benchmark
1.00         Yes
0.99         Yes
0.98         Yes
0.97         Yes
0.96         No
0.95         No
0.93         Yes
0.91         Yes
0.89         No
0.87         No
0.85         No
0.83         No
0.83         Yes
0.81         No
0.77         No
0.74         No
0.73         No
0.70         No
0.69         No
0.67         Yes
0.62         No
0.56         No
0.54         No
0.53         No
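The ROC experiment can be sketched in Python using the 24 scored predictions in the table above, where "Yes" in the Benchmark column marks a true pair. This is an illustrative sketch, not code from the lecture. The two tied 0.83 entries are processed together so that the curve does not depend on their order. The area under the curve (AUC), computed with the trapezoidal rule, is an addition that does not appear on the slides.

```python
# Scores and benchmark labels transcribed from the table above (True = "Yes").
PREDICTIONS = [
    (1.00, True), (0.99, True), (0.98, True), (0.97, True), (0.96, False),
    (0.95, False), (0.93, True), (0.91, True), (0.89, False), (0.87, False),
    (0.85, False), (0.83, False), (0.83, True), (0.81, False), (0.77, False),
    (0.74, False), (0.73, False), (0.70, False), (0.69, False), (0.67, True),
    (0.62, False), (0.56, False), (0.54, False), (0.53, False),
]


def roc_points(scored_pairs):
    """Return (FPR, TPR) points obtained by scanning pairs from highest to lowest score."""
    ranked = sorted(scored_pairs, key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, is_true in ranked if is_true)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every pair tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


points = roc_points(PREDICTIONS)
print(f"AUC = {roc_auc(points):.4f}")  # AUC = 0.8320
```

A perfect ranking puts every "Yes" above every "No" and gives an AUC of 1.0. A random ranking gives about 0.5.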


Alignment vs. Superposition

• Structural alignment attempts to establish homology between two or more polymer structures based on their shape and 3D structure.

• Structural alignment requires no a priori knowledge of equivalent positions.

• Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques.

• Conversely, simple structural superposition uses knowledge of at least some equivalent residues to guide a rigid body superposition.

• The most basic possible comparison between protein structures makes no attempt to align the input structures.

• It requires a precalculated alignment as input to determine which residues in the sequences are to be included in the RMSD calculation.


Structural superposition of two CheY orthologs

In pairwise structure superposition, a correspondence set of residue pairs is established by a pairwise sequence alignment.
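One way to turn a pairwise sequence alignment into a correspondence set is to walk the aligned columns and keep the residue-index pairs in which neither sequence has a gap. The sketch below is illustrative only. The function name `correspondence_set` and the example sequences are hypothetical.

```python
def correspondence_set(aln_a, aln_b, gap="-"):
    """Residue-index pairs (i, j) for every alignment column without a gap.

    aln_a, aln_b: gapped sequences of equal length from a pairwise alignment.
    Indices are 0-based positions in the ungapped sequences.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = []
    i = j = 0  # current positions in the ungapped sequences
    for res_a, res_b in zip(aln_a, aln_b):
        if res_a != gap and res_b != gap:
            pairs.append((i, j))
        if res_a != gap:
            i += 1
        if res_b != gap:
            j += 1
    return pairs


# Hypothetical 5- vs 6-residue alignment with one gap.
print(correspondence_set("ACD-EF", "ACDGEF"))
# [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5)]
```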


• Superposition algorithms optimize the orientation and spatial position of the two molecules with respect to each other.

• Superposition usually starts with a sequence comparison, which establishes the one-to-one relationships between pairs of atoms from which the RMSD is computed.

• This is typically a good assumption at appreciable pairwise sequence identity, but it breaks down in the Twilight Zone (roughly 20–35% sequence identity).

• Once atom-to-atom relationships between two structures are established, the task of the algorithm is to achieve an optimal superposition with the smallest possible RMSD. It is usually impossible to achieve perfect overlap of all atom pairs, even for structures with 100% identical sequences.

• Overlaying one pair of atoms perfectly may push another pair of atoms further apart.

• Also, as in sequence alignment, there is a trade-off between global and local matching that must be considered.

Pairwise structure superposition


Global alignment

Images and content from Patrice Koehl at UC Davis

Global similarity ≠ local similarity


Local alignment

Structural motif

Images and content from Patrice Koehl at UC Davis

Global similarity ≠ local similarity


Choosing an appropriate description of structure

Structure comparisons can be done at several different levels:

Individual atoms

--disadvantages?

Residue positions, which can be specified by the coordinates of Cα, Cβ, and the center of mass of the side chains

What are advantages and disadvantages of using different residue representations?

Small fragments

Secondary structure elements (SSE)


Choosing an appropriate description of structure

• Only when the structures to be aligned are highly similar or even identical is it meaningful to align side-chain atom positions.

-- In which case the RMSD reflects not only the conformation of the protein backbone but also the rotameric states of the side chains.

• Other comparison criteria that reduce noise and bolster positive matches include:

-- Secondary structure assignment
-- Native contact maps or residue interaction patterns
-- Measures of side chain packing
-- Measures of hydrogen bond retention


Contact map


Structure superposition requires minimizing the error within the framework of some objective function. Which one?

• Torsion angle comparison
• Distance matrices
• Structure superposition (RMSD, TM-score, etc.): most obvious & common
• Secondary structure superposition (SHEBA)

This decision must also be made for structure alignment, since superposition is used (many times over) in the harder problem.

Choosing an objective function to extremize


Torsion angles (φ, ψ) are:
- local by nature
- invariant upon rotation and translation of the molecule
- compact (O(n) angles for a protein of n residues)

Add 1 degree to all φ, ψ

But… small changes in every angle accumulate along the chain, so the overall 3D structure can change substantially.

Images and content from Patrice Koehl at UC Davis

Torsion angles
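Each torsion angle is the dihedral angle defined by four consecutive backbone atoms: φ uses C(i−1), N, Cα, C and ψ uses N, Cα, C, N(i+1). Below is a minimal NumPy sketch of the signed dihedral. It uses the common projection-plus-atan2 formulation, which is an assumption because the slides give no formula. The sketch checks the invariance to rotation and translation listed above. The coordinates are toy values, not a real backbone.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond, in (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)  # unit vector along the central bond
    b2 = p3 - p2
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Toy four-atom fragment (hypothetical coordinates).
atoms = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(round(dihedral(*atoms), 6))  # 90.0

# Rotate 90 degrees about z and translate: the torsion angle is unchanged.
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = atoms @ rot_z.T + np.array([5.0, -2.0, 3.0])
print(round(dihedral(*moved), 6))  # 90.0
```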


Images and content from Patrice Koehl at UC Davis

[Figure: four residues, 1–4, and the distances between them (Å)]

    1    2    3    4
1   0    3.8  6.0  8.1
2   3.8  0    3.8  5.9
3   6.0  3.8  0    3.8
4   8.1  5.9  3.8  0

• Advantages
- invariant with respect to rotation and translation
- can be used to compare proteins

• Disadvantages
- the distance matrix is O(n²) for a protein with n residues
- comparing distance matrices is a difficult problem
- insensitive to chirality

Distance matrices
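The following NumPy sketch builds a distance matrix from Cα coordinates. It then checks two of the properties listed above: invariance to rotation and translation, and insensitivity to chirality (a mirror image yields the same matrix). The coordinates are hypothetical. The 8 Å cutoff used to derive a contact map is an assumed, commonly used value, not one given in the lecture.

```python
import numpy as np


def distance_matrix(coords):
    """All-against-all Euclidean distances for an (n, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]  # (n, n, 3) pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy C-alpha coordinates in Å (hypothetical; not the structure behind the matrix above).
ca = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0], [9.0, 6.0, 3.0]])
dm = distance_matrix(ca)

rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = ca @ rot_z.T + np.array([10.0, -5.0, 2.0])  # rigid rotation + translation
mirror = ca * np.array([1.0, 1.0, -1.0])            # reflection: opposite chirality

print(np.allclose(distance_matrix(moved), dm))   # True: invariant to rotation/translation
print(np.allclose(distance_matrix(mirror), dm))  # True: mirror images are indistinguishable

# A contact map thresholds the distance matrix (8 Å cutoff assumed here).
contact_map = dm < 8.0
print(contact_map.astype(int))
```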


Scoring DM similarity (or in this case, contact map)


Introduce a gap

Scoring DM similarity (or in this case, contact map)

In superposition, the gap location is defined by an alignment! In alignment, different gap positions are tried until the best overlap is identified.
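The difference can be illustrated with contact maps stored as sets of residue-index pairs. In the hypothetical example below, structure B is structure A with residue 3 (index 2) deleted. Superposition scores only the gap position supplied by the alignment. An alignment search tries every gap position and keeps the best one. The score (the number of shared contacts) and all function names are illustrative assumptions, not a published method.

```python
def contact_overlap(contacts_a, contacts_b, mapping):
    """Number of contacts (i, j) in A whose aligned partners are also a contact in B."""
    shared = 0
    for i, j in contacts_a:
        if i in mapping and j in mapping:
            pair_b = tuple(sorted((mapping[i], mapping[j])))
            if pair_b in contacts_b:
                shared += 1
    return shared


def mapping_with_gap(n_a, gap):
    """Map residues of A onto B (one residue shorter), leaving A[gap] unmatched."""
    return {i: (i if i < gap else i - 1) for i in range(n_a) if i != gap}


# Hypothetical contact maps (0-based residue indices, i < j).
contacts_a = {(0, 3), (1, 4), (0, 4)}
contacts_b = {(0, 2), (1, 3), (0, 3)}  # A with residue index 2 removed

# Superposition: the gap position comes from the sequence alignment and is fixed.
print(contact_overlap(contacts_a, contacts_b, mapping_with_gap(5, gap=2)))  # 3

# Alignment: try every gap position and keep the one with the best overlap.
scores = {g: contact_overlap(contacts_a, contacts_b, mapping_with_gap(5, g)) for g in range(5)}
best_gap = max(scores, key=scores.get)
print(best_gap, scores)  # 2 {0: 1, 1: 2, 2: 3, 3: 2, 4: 1}
```

Real structure alignment must also search over many gaps and their lengths, which is why it is the harder problem.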


• The most common parameter that expresses the difference between two protein structures is RMSD, or root mean squared deviation (distance), in atomic positions between the two structures.

• RMSD can be calculated as a function of all atoms or as a function of some subset of the atoms, such as the backbone or CA atoms.

• Using a subset of the protein atoms is common because two compared protein structures are usually not identical in sequence. In that case, the backbone atoms are the only atoms that can be compared one-to-one.

Root mean squared deviation (RMSD)


[Figure: two five-residue toy chains (residues 1–5) with the distances d1–d5 between corresponding residues]

RMSD calculation

The two structures must first be superimposed to calculate a meaningful RMSD value, because they are currently in different coordinate systems!
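Below is a minimal RMSD sketch. It assumes that the rows of the two coordinate arrays are already paired one-to-one. It is applied to a hypothetical chain and to an identical copy shifted into another coordinate frame. The function reports a large RMSD even though the two shapes are the same, which is why superposition has to come first.

```python
import numpy as np


def rmsd(a, b):
    """RMSD between two (N, 3) arrays whose rows correspond; no superposition is done."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


# Hypothetical 5-residue trace (Å) and the same trace in a shifted coordinate frame.
blue = np.array([[0.0, 0.0, 0.0], [3.5, 1.0, 0.0], [5.0, 4.0, 0.5],
                 [8.5, 4.5, 1.0], [10.0, 8.0, 1.5]])
red = blue + np.array([3.0, 4.0, 0.0])

print(rmsd(blue, blue))  # 0.0
print(rmsd(blue, red))   # 5.0: only the frame offset, not a structural difference
```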


[Figure: the same toy chains with residue 3 of the red chain missing; only d1, d2, d4 and d5 are measured]

RMSD calculation (with a gap)

Blue 1 – 2 – 3 – 4 – 5
Red  1 – 2 – x – 4 – 5
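With a gap, only the columns in which both structures have a residue contribute, so N counts aligned pairs (4 here, not 5). The sketch below uses hypothetical coordinates that are assumed to be superimposed already and that follow the Blue/Red alignment above.

```python
import math


def rmsd_with_gaps(aligned_a, aligned_b):
    """RMSD over alignment columns where both structures have a residue.

    aligned_a, aligned_b: equal-length lists of (x, y, z) tuples, with None marking
    a gap. The structures are assumed to be superimposed already.
    """
    sq = [math.dist(p, q) ** 2 for p, q in zip(aligned_a, aligned_b)
          if p is not None and q is not None]
    return math.sqrt(sum(sq) / len(sq))  # N = number of aligned pairs


# Hypothetical superimposed coordinates (Å): Blue 1-2-3-4-5, Red 1-2-x-4-5.
blue = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (9, 0, 0), (12, 0, 0)]
red = [(0, 1, 0), (3, 1, 0), None, (9, 2, 0), (12, 2, 0)]
print(round(rmsd_with_gaps(blue, red), 3))  # 1.581, from d1, d2, d4, d5 only
```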


Estimating RMSD by averaging distances generally gets better as the correspondence set size increases.

However, RMSD is always greater than or equal to the average distance ⟨d⟩, with equality only when all distances are equal.

RMSD vs. average D as a function of n
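This inequality holds because the root mean square of non-negative numbers is never smaller than their arithmetic mean. The sketch below compares ⟨d⟩ and RMSD for three hypothetical sets of per-pair distances. A single outlier raises the RMSD much more than it raises ⟨d⟩.

```python
import math


def rmsd_from_distances(d):
    """Root mean square of the per-pair distances d_i."""
    return math.sqrt(sum(x * x for x in d) / len(d))


def mean_distance(d):
    """Arithmetic mean <d> of the per-pair distances."""
    return sum(d) / len(d)


examples = {
    "all equal": [1.0, 1.0, 1.0, 1.0],
    "spread": [0.0, 2.0, 0.0, 2.0],
    "one outlier": [0.5, 0.5, 0.5, 4.5],
}
for name, d in examples.items():
    print(f"{name:12s} <d> = {mean_distance(d):.3f}  RMSD = {rmsd_from_distances(d):.3f}")
# all equal    <d> = 1.000  RMSD = 1.000
# spread       <d> = 1.000  RMSD = 1.414
# one outlier  <d> = 1.500  RMSD = 2.291
```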


[Figure: the two five-residue toy chains (residues 1–5)]

Using RMSD to find the optimal superposition


[Figure: several trial orientations of the two five-residue toy chains]

Superposition is too complicated for manual optimization


• Simplified problem (compared to structure alignment): we know the correspondence between set A and set B.

• We wish to compute the rigid transformation T that best aligns a1 with b1, a2 with b2, …, aN with bN.

• The error to minimize is the RMSD defined above.

Old problem, solved in Statistics, Robotics, Medical Image Analysis, etc.

Images and content from Patrice Koehl at UC Davis

Using RMSD to find the optimal superposition


• A rigid-body transformation T is a combination of a translation t and a rotation R, thus: T(x) = Rx + t.

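• The quantity to be minimized is:

$$\mathrm{RMSD}(T) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert T(a_i) - b_i \right\rVert^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert R\,a_i + t - b_i \right\rVert^2}$$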

• The algorithm includes a fair amount of linear algebra (and a little bit of calculus) that is outside the scope of this class.

• Believe it or not, the algorithm is O(n)!

Images and content from Patrice Koehl at UC Davis

[Figure: representation of the 6 “trivial” degrees of freedom (DOF) of a rigid body: 3 translations and 3 rotations]

Using RMSD to find the optimal superposition


Pseudocode: Superposition algorithm in reality

1.) Define error function (RMSD)

2.) Determine correspondence set (pairwise sequence alignment)

3.) Translation = align centers of mass (COM)

4.) Rotation = use matrix methods to solve for rotation that minimizes the error function (variety of methods available)

5.) Evaluate the resultant superposition

6.) Refine the superposition (b/c COM to COM may not be best translation)

7.) Iterate till convergence
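Steps 3 to 5 can be sketched with the Kabsch method, which aligns the centroids and then takes an SVD of the 3 × 3 covariance matrix. This is one of the matrix methods alluded to in step 4; the slides do not say which method they use. The function name and test coordinates are hypothetical, and the correspondence set from step 2 is assumed to be given as paired rows.

```python
import numpy as np


def superpose(mobile, target):
    """Least-squares rigid superposition of `mobile` onto `target` (Kabsch method).

    mobile, target: (N, 3) arrays whose rows are already paired by the
    correspondence set. Returns rotation R, translation t and the RMSD of
    R @ mobile_i + t against target_i.
    """
    a = np.asarray(mobile, dtype=float)
    b = np.asarray(target, dtype=float)
    ca, cb = a.mean(axis=0), b.mean(axis=0)   # step 3: centroids (unweighted COM)
    h = (a - ca).T @ (b - cb)                  # 3x3 covariance matrix, O(N) to build
    u, _, vt = np.linalg.svd(h)                # step 4: constant-size SVD
    d = np.sign(np.linalg.det(vt.T @ u.T))     # d = -1 would signal a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # optimal proper rotation
    t = cb - r @ ca
    fitted = a @ r.T + t
    rmsd = float(np.sqrt(((fitted - b) ** 2).sum(axis=1).mean()))  # step 5: evaluate
    return r, t, rmsd


# Usage: recover a known rotation + translation applied to hypothetical coordinates.
rng = np.random.default_rng(0)
mobile = rng.normal(scale=5.0, size=(10, 3))
angle = np.radians(30.0)
true_r = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
target = mobile @ true_r.T + np.array([1.0, -2.0, 0.5])
r, t, rmsd = superpose(mobile, target)
print(np.allclose(r, true_r), round(rmsd, 6))  # True 0.0
```

The covariance matrix takes O(N) time to build, and the SVD is always 3 × 3, which is consistent with the O(n) claim made earlier. In this unweighted least-squares setting, the centroid-to-centroid translation is already optimal for a fixed correspondence set. Refinement (steps 6–7) therefore matters mainly when the correspondence set or per-residue weights change between rounds.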

Using RMSD to find the optimal superposition


[Figure: two toy chains, one with residues 1–5 and one with residues 1–6]

1.) Generate pairwise alignment

1 2 3 - 4 5
1 2 3 4 5 6

2.) Find optimal superimposition
- Translation
- Rotation

Back to our toy model…


Sequence identity = 83%
RMSD = 1.0 Å

Superposition of a pair of CuZnSOD structures



⟨Sequence identity⟩ = 68%   35%
⟨RMSD⟩ = 1.6 Å   0.6 Å

Superposition of several CuZnSOD structures


Ligand free

Complexed with trifluoperazine

Global vs. local superposition in Calmodulin


Global alignment: RMSD = 15 Å (143 residues)

Local alignment: RMSD = 0.9 Å (62 residues)

Global vs. local superposition in Calmodulin


RMSD = 0.0 Å, Aligned = 95, Z-score = 17.3

RMSD = 0.0 Å, Aligned = 101, Z-score = 18.4

RMSD = 0.0 Å, Aligned = 40, Z-score = 3.7

By itself, RMSD is not a very useful error function

For example, consider a series of fragments all generated from the blue structure…


Up-weighting secondary structure, etc.

Based on the assumption that secondary structure elements should match up better than coil, we can easily modify the RMSD calculation to reflect that.

That is, a multiplier is applied (where x1 > x2) to up-weight the more important residues: x1 for residues in secondary structure elements and x2 for coil.

For example, assuming the red dots correspond to secondary structures in the figure above, RMSD’ < RMSD, which might be expected to be a more accurate reflection of the similarity between the pair.
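Below is one possible form of the weighted RMSD'. It assumes that the multipliers act on the squared distances and that the sum is normalized by the total weight, so that x1 = x2 gives back the ordinary RMSD. The weights x1 = 2 and x2 = 1 and the distances are hypothetical.

```python
import math


def weighted_rmsd(distances, in_sse, x1=2.0, x2=1.0):
    """RMSD' with multiplier x1 for secondary-structure residues and x2 for coil.

    Assumption: the weighted sum of squared distances is normalized by the
    total weight, so x1 == x2 reproduces the ordinary RMSD.
    """
    weights = [x1 if sse else x2 for sse in in_sse]
    num = sum(w * d * d for w, d in zip(weights, distances))
    return math.sqrt(num / sum(weights))


# Hypothetical per-residue distances (Å) after superposition: residues 1, 2, 4, 5
# are in secondary structure and fit well; residue 3 is coil and deviates more.
distances = [0.5, 0.5, 2.0, 0.5, 0.5]
in_sse = [True, True, False, True, True]

plain = weighted_rmsd(distances, in_sse, x1=1.0, x2=1.0)
weighted = weighted_rmsd(distances, in_sse)  # x1 = 2, x2 = 1
print(f"RMSD = {plain:.3f}, RMSD' = {weighted:.3f}")  # RMSD = 1.000, RMSD' = 0.816
```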


Template Modeling Score (TM-score)

• The TM-score is a measure of similarity between two protein structures. It is intended as a more accurate measure of the quality of full-length protein structures than the often-used RMSD.

• The TM-score expresses the similarity between two structures as a score in the range (0, 1], where 1 indicates a perfect match.

• Generally, scores below 0.20 correspond to randomly chosen unrelated proteins, whereas structures with a score higher than 0.5 have roughly the same fold.

• The TM-score is designed to be independent of protein lengths.

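$$\text{TM-score} = \max\left[\frac{1}{L_{\text{target}}}\sum_{i=1}^{L_{\text{ali}}}\frac{1}{1+\left(d_i/d_0\right)^2}\right], \qquad d_0 = 1.24\,\sqrt[3]{L_{\text{target}}-15}-1.8$$

d0 = normalization factor; di = distance between the i-th residue pair; Ltarget, Lali = lengths of the target protein and of the alignment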

Y. Zhang, J. Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710
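Below is a sketch of the TM-score sum for one fixed superposition. It follows the Zhang and Skolnick definition: (1/L_target) Σ 1/(1 + (d_i/d0)²), with d0 = 1.24 (L_target − 15)^(1/3) − 1.8. Unlike the published score, it does not search over superpositions for the maximum. It also rejects L_target < 19, where this d0 formula is not positive. With a hypothetical 100-residue target, it reproduces the point of the RMSD = 0.0 Å fragment example: an exact full-length match and an exact 40-residue fragment both have zero RMSD but very different TM-scores.

```python
def tm_score(distances, l_target):
    """TM-score sum for ONE given superposition (no maximization over superpositions).

    distances: d_i (Å) for the L_ali aligned residue pairs; l_target: target length.
    """
    if l_target < 19:
        raise ValueError("d0 is not positive for L_target < 19 in this sketch")
    d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8  # length-dependent normalization
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


# Hypothetical 100-residue target; both comparisons have RMSD = 0.0 Å.
print(tm_score([0.0] * 100, 100))  # 1.0 (exact full-length match)
print(tm_score([0.0] * 40, 100))   # 0.4 (exact 40-residue fragment)
```

Because the sum is divided by the target length rather than by the number of aligned pairs, short perfect fragments cannot reach a high score.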


RMSD vs TM-score

RMSD: 12.1 Å, TM-score: 0.81

RMSD: 12.5 Å, TM-score: 0.22

Images from Dr. Zhang at KU