Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Upload: marylou-hodges

Post on 25-Dec-2015

TRANSCRIPT

Page 1: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Judgment day.

Topic 6: Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Page 3: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

ABC transporter (Science, 314:1856, 2006)

When serious errors occur in high-profile structures, they are more than mere contamination of the PDB. In this case, a software bug “flipped” two columns of data, inverting the electron density map.

Page 4: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Steps in Structure Determination using X-ray Crystallography

Image from “Protein Structure and Function” by Gregory A Petsko and Dagmar Ringe

Steps in Structure Determination using NMR

Experimental Methods for Structure Determination

Models!

Page 5: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Structure Assessment and Validation, Why?

The process involves instrumentation, methodology, software, and experimental procedures, so random and systematic errors can occur.

Experimental errors vs. interpretation errors

Limitation of data vs. subjectivity: “Given the same data, no two crystallographers will ever produce identical final models” (Kleywegt GJ)

Local errors vs. global errors

Page 6: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Global Quality Parameters for X-ray Structures

Rules of thumb for high-quality X-ray structures: resolution of 2.0 Å or better and R-factor of 0.2 or less

Page 7: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

R-factor for X-ray Structures

The agreement between the diffraction data and the model is measured by the R-factor:

$R = \dfrac{\sum \big|\, |F_{obs}| - |F_{calc}| \,\big|}{\sum |F_{obs}|}$

F: structure factor

R-free: about 10% of the observations are removed from the data set before refinement. Then, refinement is performed using the remaining 90%. The R-free value is calculated to see how well the model predicts the 10% that were not used in refinement, leading to a less biased quantity.
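The R-factor and R-free described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a crystallographic implementation: it assumes the conventional definition R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|, the function names `r_factor` and `split_work_free` are hypothetical, and the amplitudes are made-up numbers. No scaling between observed and calculated amplitudes is applied.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_work_free(n_reflections, free_fraction=0.10, seed=0):
    """Randomly set aside ~10% of reflection indices as the 'free' set."""
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    return sorted(indices[n_free:]), sorted(indices[:n_free])


# Toy structure-factor amplitudes (invented for illustration only).
f_obs = [100, 80, 60, 40, 20, 50, 70, 90, 30, 10]
f_calc = [90, 85, 55, 45, 25, 40, 75, 80, 35, 12]

work, free = split_work_free(len(f_obs))
# In a real refinement the model is fitted against the work set only,
# and R-free is evaluated on reflections the refinement never saw.
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R (all) = {r_factor(f_obs, f_calc):.3f}")
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}")
```

Because the toy model here is never actually refined, the printed R-work and R-free only demonstrate the bookkeeping; the gap between them becomes meaningful once a model is fitted to the work set.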

Page 8: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Serious Structural Errors

Photoactive yellow protein: 1PHY was solved in 1989; its entire backbone trace is incorrect. 2PHY was solved in 1995. The RMSD between 1PHY and 2PHY is ~15 Å.

[Figure: 1PHY and 2PHY. Blue: N-terminal; red: C-terminal]

Kleywegt GJ., “Validation of protein crystal structures”, Acta Cryst, 2000, D56, 249-265
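For readers unfamiliar with the RMSD figure quoted for 1PHY vs. 2PHY, here is a minimal sketch of the calculation. It assumes the two coordinate lists contain equivalent atoms in the same order and have already been superimposed (real comparisons usually perform an optimal superposition first, which is omitted here). The function name `rmsd` and the coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two toy "structures" with two atoms each (coordinates in angstroms).
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(f"RMSD = {rmsd(model_1, model_2):.3f} A")
```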

Page 9: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Obsolete Structures in PDB

Page 10: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Obsolete Structures in PDB

Page 11: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Serious Structural Errors

Secondary structure assignments are correct; topology is incorrect.

[Figure: 1PTE and 3PTE. Blue: N-terminal; red: C-terminal]

Kleywegt GJ., “Validation of protein crystal structures”, Acta Cryst, 2000, D56, 249-265

Page 12: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Major Errors from NMR Spectroscopy

Intermolecular contacts vs. intramolecular contacts

Sequence and Structure Ensembles of Two DLC2A Structures

A, D: human (1TGQ)

B, C: mouse (1Y4O)

96% identity

Nabuurs et al., PLoS Computational Biology 2(2), 2006

Page 13: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

From Nabuurs et al., PLoS Computational Biology 2(2), 2006:

The observed pattern of dispersed signals, ideally one for each amino acid, provides a “fingerprint” of the protein.

However, the formation of a symmetric dimer, as shown in Figure 1A, does not result in a doubling of the number of observed NMR signals.

Consequently, it is not straightforward to determine the oligomeric state of a protein from its 15N-HSQC NMR spectra alone, and typically assessments have to be made from estimates of the protein's relaxation rates [26].

Therefore, if the oligomeric state of a protein is not known or is incorrectly known, the NMR spectra of a dimeric protein could be easily interpreted as originating from a monomer.

Major Errors from NMR Spectroscopy

Intermolecular contacts vs. intramolecular contacts

Page 14: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Other common errors, which tend to be less severe

Flipped residues -- Asn, Gln, and His.

Missing side-chain atoms -- especially in longer-chain, solvent-exposed residues (e.g., lysine and arginine).

Missing backbone atoms -- especially in loop regions.

Truncated or incomplete chains -- the “PDB sequence” rarely matches perfectly with the sequence observed in the structure. The truncation is generally at the termini.

SEVERITY

Page 15: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Flipping: Problems with Gln/Asn/His

Acta Cryst. (2010). D66, 12-21

Page 16: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

The What of Validation/Assessment

Validation should be independent of the experimental data.

Many criteria based on straightforward chemical ideals and physics can be used to validate protein structure quality.

For example, Ramachandran plots, side-chain torsion angles, and contacts are widely used.

Other parameters can also be used: H-bonding, chirality, bond angles and distances, etc.

Physics-based energy values can be calculated using energy potentials.

There are programs available for assessment of protein structure quality:

PROCHECK (stereochemistry, Ramachandran plots); ProSA II (energy check); MolProbity (bumps and contacts); WHAT IF (all of the above)

There is no one correct way to measure quality!

Page 17: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Empirical vs. first principles

In both cases, we establish which structural parameters are important (e.g., bond lengths, steric clashes, phi/psi angles).

In empirical methods, we use observed values to establish normal ranges and look for exceptions (which are considered signs of poor quality).

In first principles methods, we start from the fundamental physics and write out an energy function to quantify the energy of the structure.

Page 18: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Geometry and Stereochemistry: Ramachandran plots

Kleywegt GJ., “Validation of protein crystal structures”, Acta Cryst, 2000, D56, 249-265

retinoic acid binding protein II
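A Ramachandran plot is a scatter of the backbone torsion angles phi and psi for each residue. As a minimal sketch of how one such angle is obtained, the helper below computes a signed dihedral from four points using the standard projection method. The function name `dihedral_deg` is hypothetical and the coordinates are toy values; for a real residue i, phi would use C(i-1), N(i), CA(i), C(i) and psi would use N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Toy geometry: the fourth point is rotated +90 degrees about the z axis.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Collecting (phi, psi) pairs for every residue and plotting them against the allowed regions gives the Ramachandran plot used for validation.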

Page 19: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

More About Ramachandran Plots

Left: Ramachandran plot of a wrong structure

Right: Ramachandran values for D-amino acids look different from those for L-amino acids. For example, gramicidin A (1GRM), a prokaryotic antibiotic compound, is composed of alternating L- and D-amino acids.

Left: Kleywegt GJ., Acta Cryst, 2000, D56, 249-265

Page 20: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Geometry and Stereochemistry: PROCHECK

Checks the stereochemical quality of a protein structure

Produces a number of PostScript plots analyzing its overall and residue-by-residue geometry

Page 21: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Geometry and Stereochemistry: PROCHECK

http://services.mbi.ucla.edu/SAVES/

Page 22: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Geometry and Stereochemistry: PROCHECK

G-factors mapped to structure, in this case, red = unusual phi/psi angles

Page 23: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Davis, IW et al.

http://molprobity.biochem.duke.edu/index.php

Page 24: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”
Page 25: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Energy Plot: ProSA Analysis

From the ProSA webserver site:

ProSA-web provides an easy-to-use interface to the program ProSA (Sippl 1993), which is frequently employed in protein structure validation.

ProSA calculates an overall quality score for a specific input structure.

If this score is outside a range characteristic for native proteins the structure probably contains errors.

A plot of local quality scores points to problematic parts of the model which are also highlighted in a 3D molecule viewer to facilitate their detection.

ProSA is based on a potential of mean force (aka, knowledge-based potential) that uses observed residue-residue pairwise distances to establish energy values.

Page 26: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Radial Distribution Function (aka Pair Correlation Function)

Page 27: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Radial Distribution Function (aka Pair Correlation Function)

Cys-SG:CB-Ala

Cys-SG:SG-Cys
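To connect the radial distribution function to a knowledge-based energy, one common approach is the inverse Boltzmann relation E(r) = −kT·ln g(r), where g(r) is the observed distance distribution divided by a reference distribution. The sketch below illustrates that relation only; it is not ProSA's actual implementation, and the function name `pmf_from_counts`, the pseudocount, and the histogram counts are all assumptions made for illustration.

```python
import math

KT = 0.593  # kcal/mol, approximately k_B * T at 298 K


def pmf_from_counts(observed, reference, pseudocount=1e-6):
    """Per-bin energy E = -kT * ln(g), with g = observed / reference frequency."""
    total_obs = sum(observed)
    total_ref = sum(reference)
    energies = []
    for obs, ref in zip(observed, reference):
        # The pseudocount avoids log(0) for empty bins.
        g = (obs / total_obs + pseudocount) / (ref / total_ref + pseudocount)
        energies.append(-KT * math.log(g))
    return energies


# Hypothetical distance histogram for one atom-pair type (e.g. Cys-SG:SG-Cys)
# versus a flat reference distribution over the same four distance bins.
observed_counts = [0, 50, 30, 20]
reference_counts = [25, 25, 25, 25]
energies = pmf_from_counts(observed_counts, reference_counts)
for i, e in enumerate(energies):
    print(f"bin {i}: {e:+.2f} kcal/mol")
```

Distances seen more often than the reference predicts get negative (favorable) energies; distances that are rarely or never observed get positive ones.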

Page 28: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Energy Plot: ProSA Analysis

From the ProSA webserver site:

The z-score indicates overall model quality.

Its value is displayed in a plot that contains the z-scores of all experimentally determined protein chains in current PDB.

In this plot, groups of structures from different sources (X-ray, NMR) are distinguished by different colors.

It can be used to check whether the z-score of the input structure is within the range of scores typically found for native proteins of similar size.

Z = -5.65

Page 29: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

What is a z-score (aka, standard score)?
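As a reminder of the standard definition (the symbols below are the conventional ones, not taken from the slide): a z-score expresses how many standard deviations a value x lies from the mean μ of a reference distribution with standard deviation σ.

```latex
z = \frac{x - \mu}{\sigma}
```

A z-score of 0 means the value equals the reference mean; values far from 0 are unusual for that reference population.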

Page 30: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

1JSQA (retracted) 2HYDA

Energy Plot: ProSA Analysis of ABC transporter

Page 31: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

http://swift.cmbi.ru.nl/servers/html/index.html

Structure Validation Menu:

• Name check: Checks the nomenclature of torsion angles.

• Coarse Packing Quality: Checks the normality of the local environment of amino acids.

• Anomalous bond lengths: Lists bond lengths that deviate more than 4 sigma from normal.

• Planarity: Checks if planar groups are planar enough.

• Fine Packing Quality Control: Checks the normality of the local environment of amino acids.

• Collisions with symmetry axes: Lists atoms that are too close to symmetry axes.

• Hand check: Lists atoms with a chirality that deviates more than 4 sigma from normal.

• Ramachandran plot evaluation: Determines the quality of a Ramachandran plot.

• Omega: Checks if the distribution of omega angles is normal.

• Proline puckering: Checks if proline pucker falls in a normal range.

• Anomalous bond angles: Lists bond angles that deviate more than 4 sigma from normal.

• Checking water & ion: Lists ions that might be waters (and vice versa), or other ions.

[Figure: anomalous bond angles, reported as z-scores]
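The "deviate more than 4 sigma from normal" checks in this menu amount to a z-score threshold. The following sketch shows the idea for bond lengths. It is not the WHAT IF code; the function name `flag_outliers`, the reference mean and standard deviation, and the measured values are illustrative assumptions.

```python
def flag_outliers(values, reference_mean, reference_sd, threshold=4.0):
    """Return (label, z) for every value whose |z-score| exceeds the threshold."""
    flagged = []
    for label, value in values.items():
        z = (value - reference_mean) / reference_sd
        if abs(z) > threshold:
            flagged.append((label, z))
    return flagged


# Hypothetical peptide C-N bond lengths in angstroms, checked against an
# assumed reference of 1.329 +/- 0.014 A (illustrative numbers).
bond_lengths = {"A12 C-N": 1.331, "A13 C-N": 1.402, "A14 C-N": 1.310}
for label, z in flag_outliers(bond_lengths, 1.329, 0.014):
    print(f"{label}: z = {z:+.2f}")
```

The same pattern applies to bond angles or chirality: only the reference mean and standard deviation change.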

Page 32: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Empirical energy potentials (force fields)

Page 33: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Theoretical basis of molecular mechanical force fields

The validity of molecular mechanics is based on two key assumptions:

(1) The Born-Oppenheimer approximation – enables the electronic and nuclear energy to be separated: the much smaller mass of the electrons means that they can rapidly adjust to any change in nuclear positions. Consequently, the energy of the molecule (in its ground state!) can be considered a function of the nuclear coordinates only.

(2) Transferability – enables a set of parameters developed and tested on a relatively small dataset to be applied to a much wider range of chemical problems.

Page 34: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Molecular mechanics

Molecular Mechanics (MM) is a computational technique used to model the conformational behavior and energetic properties of molecules.

The molecule is treated at the atomic level, i.e. the electrons are not treated explicitly.

MM uses an energy function, defined so that, given a particular conformation (i.e. a set of spatial coordinates for all the atoms), the energy of the molecule can be calculated.

Most MM models cannot describe dissociation of covalent bonds.

The energy function is empirical, i.e. it is not entirely derived from rigorous theory. Usually, a combination of quantum mechanical calculations and experimental data is used to construct the energy function.

Page 35: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

A simple force field

Many of the MM force fields in use today can be interpreted in terms of a relatively simple four-component picture of the intra- and inter-molecular forces within the system.

Energetic penalties are associated with the deviation of bond lengths (aka, central forces) and angles away from their “reference” values; there is a function that describes how the energy changes as bonds are rotated (torsions); and finally the force field contains terms that describe the interaction between non-bonded parts of the system.
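One common textbook way to write this four-component picture is shown below. The symbols are conventional choices rather than the slide's own notation: l and θ are bond lengths and angles with reference values l₀ and θ₀, ω is a torsion angle with multiplicity n and phase γ, and the non-bonded sum combines a Lennard-Jones term with a Coulomb term.

```latex
V(\mathbf{r}^N) =
  \sum_{\text{bonds}} \frac{k_i}{2}\,(l_i - l_{i,0})^2
+ \sum_{\text{angles}} \frac{k_i}{2}\,(\theta_i - \theta_{i,0})^2
+ \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl(1 + \cos(n\omega - \gamma)\bigr)
+ \sum_{i=1}^{N} \sum_{j=i+1}^{N}
  \left( 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
       - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]
       + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \right)
```

The first two sums are the harmonic bond and angle terms, the third is the torsional term, and the double sum is the non-bonded contribution.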

Page 36: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

More sophisticated force fields

More sophisticated force fields may have additional terms (such as polarizability, improper torsions, etc.), but invariably contain these four components.

An attractive feature of this representation is that the various terms can be ascribed to changes in specific internal coordinates (i.e., bond lengths, angles, torsion angles, or movements of atoms relative to each other).

Polarizability Improper Torsion

Page 37: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Dissecting the force field

Page 38: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Force-Potential Relationship:

Dissecting the force field

Coulomb’s Law:

Meaning:

Page 39: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Dissecting the force field

Coulomb’s Law:

Force-Potential Relationship:
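The equations for these two headings did not survive in the transcript. For reference, the standard textbook forms are sketched below; the notation (charges q_i and q_j, separation r, vacuum permittivity ε₀) is the conventional one and is an assumption about what the slide showed.

```latex
% Force-potential relationship: the force is the negative gradient of the potential
F(r) = -\frac{dU}{dr}

% Coulomb potential and the force that follows from it
U(r) = \frac{q_i q_j}{4\pi\varepsilon_0\, r}
\qquad\Longrightarrow\qquad
F(r) = \frac{q_i q_j}{4\pi\varepsilon_0\, r^2}
```

For like charges both U and F are positive (repulsive); for opposite charges both are negative (attractive).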

Page 40: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Notes

Hooke’s law, $U = \tfrac{1}{2} k x^2$ (shown for both the bond-stretching and angle-bending terms)

We will ignore improper torsions

Sinusoidal potential. Note the three minima, which depending on the local chemistry, may or may not be equally deep.

Positive (destabilizing) values for like charges (++ or −−).

Morse curve.

Page 41: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Bond stretching

In reality, the bond-stretching potential is best approximated by the Morse potential, yet in many force fields a harmonic potential (Hooke’s law) is used for simplicity.

[Figure: potential energy curve for bond stretching]
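To see how the harmonic approximation compares with a Morse curve, here is a minimal sketch. It assumes the usual Morse form U = D_e·(1 − exp(−a·(r − r_eq)))² with a = sqrt(k / (2·D_e)), chosen so that both curves share the same curvature at the minimum. The parameter values are hypothetical and not taken from any particular force field.

```python
import math


def harmonic(r, k, r_eq):
    """Hooke's law: U = 1/2 * k * (r - r_eq)^2."""
    return 0.5 * k * (r - r_eq) ** 2


def morse(r, d_e, k, r_eq):
    """Morse potential whose curvature at r_eq matches the force constant k."""
    a = math.sqrt(k / (2.0 * d_e))
    return d_e * (1.0 - math.exp(-a * (r - r_eq))) ** 2


# Hypothetical bond: k in kcal/mol/A^2, well depth D_e in kcal/mol, r_eq in A.
K, D_E, R_EQ = 600.0, 100.0, 1.5
for r in (1.52, 1.60, 2.00, 3.50):
    print(f"r = {r:.2f} A  harmonic = {harmonic(r, K, R_EQ):8.2f}"
          f"  morse = {morse(r, D_E, K, R_EQ):6.2f}")
```

Near the minimum the two agree closely; at large separations the harmonic energy keeps growing while the Morse energy levels off at the dissociation energy, which is why a harmonic model cannot describe bond breaking.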

Page 42: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

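Bond length and energy deviations from equilibrium values

• $V_b = 0.5 \cdot K_b (r - r_{eq})^2$

• $K_b$ = 500-1200 kcal/mol/Å²

• A bond length change of 0.05 Å implies up to 1.5 kcal/mol (0.5 · 1200 · 0.05² = 1.5 at the stiff end of the range; about 0.6 kcal/mol at $K_b$ = 500).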

Page 43: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Angle bending

The deviation of bond angles is modeled with the harmonic potential (Hooke’s law).

The contribution of each angle is characterized by a force constant and a reference value. Less energy is required to distort an angle away from its equilibrium value than to stretch or compress a bond.

Accordingly, the force constant here is much smaller than that used in the bond-stretching potential, so bond angles deviate from their reference values more readily than bond lengths do.

Higher-order terms can be included here as well to model more pathological systems, but they generally are not employed.

Page 44: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Bond angle and energy deviations from equilibrium values

• $V_a = 0.5 \cdot K_a (\theta - \theta_{eq})^2$

• $K_a$ = 80 kcal/mol/radian²
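As a worked example of the harmonic angle term, the sketch below evaluates the energy cost of a small angular distortion using the 80 kcal/mol/radian² force constant quoted on this slide. The function name `angle_energy` and the 109.5° reference angle are illustrative assumptions; note the conversion from degrees to radians, which is easy to forget.

```python
import math


def angle_energy(theta_deg, theta_eq_deg, k_a=80.0):
    """Harmonic angle term V = 0.5 * K_a * (theta - theta_eq)^2, angles in degrees."""
    delta = math.radians(theta_deg - theta_eq_deg)  # force constant is per radian^2
    return 0.5 * k_a * delta ** 2


# A 5 degree distortion from an assumed 109.5 degree reference angle.
energy = angle_energy(114.5, 109.5)
print(f"5 degree bend costs {energy:.3f} kcal/mol")
```

A 5° bend costs only about 0.3 kcal/mol, less than the cost of stretching a bond by 0.05 Å with the bond force constants quoted earlier, which is consistent with angles being the softer of the two "hard" degrees of freedom.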

Page 45: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Torsional terms

The bond stretching and angle bending terms are often referred to as the hard degrees of freedom, meaning that substantial energies are required to cause significant deformations.

Most of the variation in chemical structure and relative energies is due to the complex interplay between the torsional and non-bonded terms.

The existence of barriers to rotation about chemical bonds is fundamental to our understanding of the structural properties of molecules and of conformational analysis.

The three minimum-energy staggered conformations and three maximum-energy eclipsed conformations of ethane are a classic example of this. (In substituted ethanes such as butane, the staggered minima split into one anti and two gauche conformations.)

Page 46: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Torsional terms

Page 47: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Torsional terms

Torsion angle potentials are almost always expressed as a cosine expansion.

$V_n$ is often referred to as the barrier height; however, to do so is misleading. When more than one term is present in the expansion, the barrier depends on the sum of the $V_n$ terms rather than on any single one. Moreover, other terms contribute to the barrier height as a bond is rotated, especially the non-bonded interactions between atoms 1 and 4. Having said this, $V_n$ does give a qualitative indication of the relative barriers to rotation.
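The cosine expansion can be made concrete with a short sketch. It assumes the common form V(ω) = Σ (V_n / 2)·(1 + cos(nω − γ)); the function name `torsion_energy` and all parameter values are hypothetical and chosen only to illustrate the shape of the curve.

```python
import math


def torsion_energy(omega_deg, terms):
    """Sum of (V_n / 2) * (1 + cos(n * omega - gamma)) over (V_n, n, gamma_deg) terms."""
    omega = math.radians(omega_deg)
    return sum(0.5 * v_n * (1.0 + math.cos(n * omega - math.radians(gamma_deg)))
               for v_n, n, gamma_deg in terms)


# Single threefold term (ethane-like): three equal minima, three equal maxima.
threefold = [(2.9, 3, 0.0)]
# Adding a onefold term makes the three minima unequal in depth.
mixed = [(2.0, 3, 0.0), (1.0, 1, 0.0)]

for angle in (0, 60, 120, 180, 240, 300):
    print(f"{angle:3d} deg  threefold = {torsion_energy(angle, threefold):.2f}"
          f"  mixed = {torsion_energy(angle, mixed):.2f}")
```

With the single threefold term, the energy is zero at 60°, 180°, and 300° and equals V₃ at the eclipsed positions. With the extra onefold term, the minimum at 180° is deeper than those near 60° and 300°, and the highest barrier equals the sum of both V_n values, so no single V_n is "the" barrier height.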

Page 48: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Torsional terms

[Figure: potential energy (kJ/mol) versus torsion angle (0° to 360°)]

Note: 1 kcal = 4.184 kJ

Page 49: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Attractive non-bonded potentials

Attractive London dispersion (VDW) forces

• Induced dipole

• Varies as $1/r^6$

• Can be computed “exactly”

• $A_{ij}$ depends STRONGLY on chemistry

Page 50: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Repulsive non-bonded potentials

Repulsive forces (two particles occupying the same space)

• Exponential (Morse) or power law

• V minimum at $R_{VDW}$ determines B from A

• A can be set from depth of well

• Parameters thus determined from depth and position of minimum alone.

$V_{LJ}(r) = 4\varepsilon\left[\left(\dfrac{\sigma}{r}\right)^{12} - \left(\dfrac{\sigma}{r}\right)^{6}\right]$

where ε is the depth of the potential well, σ is the (finite) distance at which the interparticle potential is zero, and r is the distance between the particles. The $r^{-12}$ term is the repulsive term and the $r^{-6}$ term is the attractive term.
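The statement that the parameters follow from the depth and position of the minimum alone can be checked numerically. The sketch below assumes the standard 12-6 Lennard-Jones form, written either with a well depth ε and zero-crossing distance σ, or as B/r¹² − A/r⁶ with A = 4εσ⁶ and B = 4εσ¹². The function names and the parameter values are hypothetical.

```python
def lj(r, epsilon, sigma):
    """12-6 Lennard-Jones: 4 * eps * [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def ab_from_eps_sigma(epsilon, sigma):
    """Convert (epsilon, sigma) to the attractive A and repulsive B coefficients."""
    return 4.0 * epsilon * sigma ** 6, 4.0 * epsilon * sigma ** 12


def lj_ab(r, a, b):
    """Same potential in the form B / r^12 - A / r^6."""
    return b / r ** 12 - a / r ** 6


# Hypothetical parameters: epsilon in kcal/mol, sigma in angstroms.
EPS, SIGMA = 0.2, 3.4
A, B = ab_from_eps_sigma(EPS, SIGMA)
r_min = 2.0 ** (1.0 / 6.0) * SIGMA  # position of the energy minimum

print(f"V(sigma) = {lj(SIGMA, EPS, SIGMA):.4f}")   # zero crossing
print(f"V(r_min) = {lj(r_min, EPS, SIGMA):.4f}")   # equals -epsilon
print(f"r_min from A, B = {(2.0 * B / A) ** (1.0 / 6.0):.4f} A")
```

Given the well depth and the position of the minimum, A and B are fully determined: r_min⁶ = 2B/A and ε = A²/(4B).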

Page 51: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

In practice, a truncated potential is used to increase compute efficiency

To reduce compute time, the LJ potential is often truncated at a cut-off distance of $r_c = 2.5\sigma$, because $V_{VDW}$ is already close to zero there (about 1.6% of the well depth).
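How small the Lennard-Jones energy is at the cut-off can be checked directly. This sketch works in reduced units (well depth and zero-crossing distance both set to 1, so distances are in units of σ and energies in units of ε); the function names are hypothetical, and a plain truncation is shown rather than the shifted or switched variants that some simulation packages use.

```python
def lj_reduced(r):
    """12-6 Lennard-Jones in reduced units (epsilon = sigma = 1)."""
    inv6 = r ** -6
    return 4.0 * (inv6 * inv6 - inv6)


def lj_truncated(r, r_cut=2.5):
    """Plain truncation: the pair energy is simply set to zero beyond r_cut."""
    return lj_reduced(r) if r < r_cut else 0.0


print(f"V(2.5 sigma) = {lj_reduced(2.5):.5f} epsilon")
print(f"truncated V(3.0 sigma) = {lj_truncated(3.0)}")
```

At r = 2.5σ the untruncated energy is about −0.016ε, so the value discarded per pair is small, although it is not exactly zero.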

Page 52: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

Electrostatic interactions

• Partial charges are known to exist.

• In fact, the peptide group has a dipole moment of 3.7 D.

• Terms are small, but there are LOTS of them.

• Dielectric “constant” is a major problem.

• Constant at short range

• ε = r at longer distances

Page 53: Judgment day. Topic 6 Chapter 14 & 15, Du and Bourne “Structural Bioinformatics”

An aside: Electrostatic interactions

Note that electrostatic interactions don’t die off abruptly: the Coulomb energy falls off only as 1/r with separation distance.

Nevertheless, because the non-bonded terms are the most compute-intensive (there are N·(N-1)/2 atom pairs!), cut-off values are frequently employed to speed up computation. (This is especially critical when coupled to a minimization algorithm or dynamics simulations.)

However, doing so causes the long-range (weaker) electrostatic interactions to be ignored, which is a significant source of model error.

As such, reaction field methods, Ewald summation, particle mesh Ewald, etc. are used to account for the long-range effects.
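Two numbers make the trade-off on this slide concrete: how many atom pairs there are, and how slowly the Coulomb energy decays compared with dispersion. The sketch below assumes the Coulomb energy in the form E = 332.06·q_i·q_j / (ε·r) with charges in units of e, distances in Å, and energies in kcal/mol; the function names and the example charges are illustrative.

```python
COULOMB_CONSTANT = 332.06  # kcal*A/(mol*e^2), a commonly used conversion factor


def coulomb_energy(q_i, q_j, r, dielectric=1.0):
    """Pairwise Coulomb energy in kcal/mol for charges in e and r in angstroms."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


def n_pairs(n_atoms):
    """Number of unique atom pairs, N * (N - 1) / 2."""
    return n_atoms * (n_atoms - 1) // 2


print(f"pairs for 1000 atoms: {n_pairs(1000)}")
# Moving from 1 A to 10 A reduces the Coulomb energy only 10-fold,
# whereas a 1/r^6 dispersion term drops by a factor of one million.
coulomb_ratio = coulomb_energy(1.0, 1.0, 10.0) / coulomb_energy(1.0, 1.0, 1.0)
dispersion_ratio = (1.0 / 10.0) ** 6
print(f"Coulomb ratio = {coulomb_ratio:.3f}, dispersion ratio = {dispersion_ratio:.0e}")
```

This slow 1/r decay is why a cut-off that is harmless for van der Waals terms can discard a meaningful amount of electrostatic energy, motivating methods such as Ewald summation.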