Solving protein structures, molecular mechanics, and docking: Lecture 18, Introduction to Bioinformatics 2006




Thursday May 4th

NO LECTURE

But …

13:30 – 15:15 hrs in S329 and S345:

PRACTICAL HOMOLOGY SEARCHING


Today’s lecture

1. Experimental techniques for determining protein tertiary structure

2. Molecular motion simulated by molecular mechanics

3. Protein interaction and docking

   i. Ribosome example

   ii. Zdock method


If you throw up a stone, it is Physics. If it lands on your head, it is Biophysics.

If you write a computer program, it is Informatics. If there is a bug in it, it is Bioinformatics.


Experimentally solving protein structures

Two basic techniques:

1. X-ray crystallography

2. Nuclear Magnetic Resonance (NMR) techniques


1. X-ray crystallography

Purified protein → (crystallization) → Crystal → X-ray diffraction → (phase problem) → Electron density → 3D structure → Biological interpretation

Protein crystals

• Regular arrays of protein molecules

• ‘Wet’: 20-80% solvent

• Few crystal contacts

• Protein crystals contain active protein

– Enzyme turnover

– Ligand binding

Example of crystal packing

Examples of crystal packing

β2-Glycoprotein I: ~90% solvent (extremely high!)

Acetylcholinesterase: ~68% solvent


Problematic proteins (no crystallisation)

• Multiple domains: flexible

• Similarly, floppy ends may hamper crystallization: change construct

• Membrane proteins: hydrophobic region inside the lipid bilayer, hydrophilic regions on either side

• Glycoproteins: flexible and heterogeneous!!

Experimental set-up

• Options for wavelength:

– monochromatic, polychromatic

– variable wavelength

[Figure: experimental set-up with X-ray source, goniometer, liquid-N2 gas stream, beam stop, and detector]

Diffraction image

[Figure: diffraction image. Labels: direct beam, beam stop, water ring, diffuse scattering (from the fibre loop), reciprocal lattice (in this case hexagonal), direction of increasing resolution, and reflections (h,k,l) with intensities I(h,k,l)]


The rules for diffraction: Bragg’s law

• Scattered X-rays reinforce each other only when Bragg’s law holds:

Bragg’s law: $2 d_{hkl} \sin\theta = n\lambda$
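As a small worked example of Bragg's law, the sketch below converts between the diffraction angle θ and the lattice-plane spacing d (the resolution), where λ is the X-ray wavelength and n the integer order of the reflection. The function names are illustrative, and the wavelength of 1.5418 Å (Cu Kα, a common laboratory source) is an assumption, not a value from the lecture.

```python
import math

def bragg_spacing(theta_deg, wavelength, n=1):
    """Lattice-plane spacing d from 2 d sin(theta) = n lambda (same unit as wavelength)."""
    return n * wavelength / (2.0 * math.sin(math.radians(theta_deg)))

def bragg_angle(d, wavelength, n=1):
    """Diffraction angle theta in degrees for spacing d."""
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        # sin(theta) cannot exceed 1: this spacing is not reachable at this wavelength
        raise ValueError("n*lambda/(2d) > 1: Bragg's law cannot be satisfied")
    return math.degrees(math.asin(s))

WAVELENGTH = 1.5418  # Angstrom; assumed Cu K-alpha source, not from the slides

# Angles at which the resolutions used later in the lecture (4, 3, 2, 1 Angstrom) diffract
for d in (4.0, 3.0, 2.0, 1.0):
    print(f"d = {d:.1f} A  ->  theta = {bragg_angle(d, WAVELENGTH):.2f} degrees")
```

Smaller spacings (higher resolution) diffract at larger angles, which is why resolution increases toward the edge of the diffraction image.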


Phase Problem

• Determining the structure of a molecule in a crystalline sample requires knowing both the amplitude and the phase of the photon wave being diffracted from the sample

• The detector records only the intensity of each diffracted beam, which gives the amplitude but not the phase, and so the phases get lost

• Unfortunately, phases contribute more to the informational content of an X-ray diffraction pattern than do amplitudes. It is common to refer to phaseless X-ray data as having "lost phases"

• Luckily, several ways to recover the lost phases have been developed

Building a protein model

• Find structural elements: α-helices, β-strands

• Fit amino-acid sequence

Effects of resolution on electron density

Note: maps calculated with perfect phases

[Figures: electron density maps at d = 4 Å, d = 3 Å, d = 2 Å, and d = 1 Å]


Refinement process

• Bad phases → poor electron density map → errors in the protein model

• Interpretation of the electron density map → improved model → improved phases → improved map → even better model

… iterative process of refinement


Validation

• Free R-factor (cross validation)

– Number of parameters/observations

• Ramachandran plot

• Chemically likely (WhatCheck)

– Hydrophobic inside, hydrophilic outside

– Binding sites of ligands, metals, ions

– Hydrogen-bonds satisfied

– Chemistry in order

• Final B-factor (temperature) values
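The free R-factor check can be sketched as follows. The standard crystallographic R-factor is R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|, and R_free is the same quantity computed over reflections that were held out of refinement. The amplitudes, the 25% hold-out fraction chosen for this tiny list, and all function names below are made up for illustration.

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def split_work_free(n_reflections, free_fraction, seed=0):
    """Randomly assign reflection indices to a working set and a free (test) set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, int(round(free_fraction * n_reflections)))
    return sorted(indices[n_free:]), sorted(indices[:n_free])

# Hypothetical structure-factor amplitudes (observed vs. calculated from the model)
f_obs = [120.0, 85.0, 40.0, 230.0, 15.0, 60.0, 98.0, 150.0]
f_calc = [110.0, 90.0, 35.0, 240.0, 20.0, 55.0, 100.0, 140.0]

work, free = split_work_free(len(f_obs), free_fraction=0.25)
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because the free reflections never influence the model, an R_free much higher than R_work suggests the model has been over-fitted to the working data.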


2. Nuclear Magnetic Resonance (NMR)

800 MHz NMR spectrometer


Nuclear Magnetic Resonance (NMR)

• Pioneered by Richard R. Ernst, who won a Nobel Prize in chemistry in 1991, FT-NMR works by irradiating the sample, held in a static external magnetic field, with a short square pulse of radio-frequency energy containing all the frequencies in a given range of interest.

• The polarized magnets of the nuclei begin to spin together, creating an observable radio-frequency (RF) signal. Because the signal decays over time, this time-dependent pattern can be converted into a frequency-dependent pattern of nuclear resonances using a mathematical function known as a Fourier transformation, revealing the nuclear magnetic resonance spectrum.

• The use of pulses of different shapes, frequencies and durations in specifically-designed patterns or pulse sequences allows the spectroscopist to extract many different types of information about the molecule.
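The Fourier-transform step can be illustrated with a toy signal. The sketch below builds a single decaying complex oscillation (a stand-in for a free induction decay) and applies a plain discrete Fourier transform to recover its frequency. The 50 Hz resonance, the decay constant, the sampling settings, and the function names are all illustrative assumptions; real spectrometers use fast FFT routines and far more points.

```python
import cmath
import math

def free_induction_decay(freq_hz, t2_s, n_points, dt_s):
    """Decaying complex oscillation sampled every dt_s seconds."""
    return [cmath.exp(2j * math.pi * freq_hz * k * dt_s) * math.exp(-k * dt_s / t2_s)
            for k in range(n_points)]

def dft(signal):
    """Naive O(n^2) discrete Fourier transform (time domain -> frequency domain)."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

N_POINTS = 200   # number of samples (assumed)
DT = 0.001       # sampling interval in seconds (assumed)

fid = free_induction_decay(freq_hz=50.0, t2_s=0.05, n_points=N_POINTS, dt_s=DT)
spectrum = [abs(x) for x in dft(fid)]

# The frequency bin with the largest magnitude marks the resonance
peak_bin = max(range(N_POINTS), key=lambda j: spectrum[j])
peak_hz = peak_bin / (N_POINTS * DT)
print(f"resonance found at {peak_hz:.1f} Hz")
```

The decay in the time domain broadens the peak in the frequency domain but does not shift it.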

Nuclear Magnetic Resonance (NMR)

• Time intervals between pulses allow, among other things, magnetization transfer between nuclei and, therefore, the detection of the kinds of nuclear-nuclear interactions that allowed for the magnetization transfer.

• Interactions that can be detected are usually classified into two kinds: through-bond interactions and through-space interactions. The latter are usually a consequence of the so-called nuclear Overhauser effect (NOE). Experiments of the nuclear-Overhauser variety may establish distances between atoms.

• These distances are subjected to a technique called Distance Geometry which normally results in an ensemble of possible structures that are all relatively consistent with the observed distance restraints (NOEs).

• Richard Ernst, Kurt Wüthrich, and many others developed 2-dimensional and multidimensional FT-NMR into a powerful technique for the determination of the structure of biopolymers such as proteins or even small nucleic acids.

• This is used in protein nuclear magnetic resonance spectroscopy. Wüthrich shared the 2002 Nobel Prize in Chemistry for this work.

2D NOESY spectrum

[Figure: 2D NOESY spectrum with peaks labelled Gly, Gly, Asp, Asn, Asp, Phe, Thr, Ser, Leu, Val]

• Peptide sequence (N-terminal NH not observed)

• Arg-Gly-Asp-Val-Asn-Ser-Leu-Phe-Asp-Thr-Gly


NMR structure determination: hen lysozyme

• 129 residues

– ~1000 heavy atoms

– ~800 protons

• NMR data set

– 1632 distance restraints

– 110 torsion restraints

– 60 H-bond restraints

• 80 structures calculated

• 30 low energy structures used

[Figure: total energy versus structure number]


Solution Structure Ensemble

• Disorder in NMR ensemble

– lack of data?

– or protein dynamics?


Problems with NMR

• Protein concentration in sample needs to be high (multimilligram samples)

• Restricted to smaller sized proteins (although magnets get stronger)

• Uncertainties in NOEs introduced by internal motions in molecules (preceding slide)


Molecular motions

Proteins are very dynamic systems

• Protein folding

• Protein structure

• Protein function (e.g. opening and closing of oxygen binding site in hemoglobin)

X-ray and NMR summary

• Both are experimental techniques to solve protein structures (although they both need a lot of computation)

• Nowadays both typically include many refinement and energy-minimisation steps to optimise the structure (next topic)


Protein motion

• Principles

• Simulation

– MD

– MC

The Ramachandran plot

Allowed phi-psi angles

Red areas are preferred, yellow areas are allowed, and white is avoided


Molecular mechanics techniques

Two basic techniques:

• Molecular Dynamics (MD) simulations

• Monte Carlo (MC) techniques


Molecular Dynamics (MD) simulation

• MD simulation can be used to study protein motions. It is often used to refine experimentally determined protein structures.

• It is generally not used to predict structure from sequence or to model the protein folding pathway. MD simulation can fold extended sequences to 'global' potential energy minima for very small systems (peptides of length ten, or so, in vacuum), but it is most commonly used to simulate the dynamics of known structures.

• Principle: an initial velocity is assigned to each atom, and Newton's laws are applied at the atomic level to propagate the system's motion through time

• MD simulation incorporates a notion of time

q = coordinates, p = momentum

K = kinetic energy, V = potential energy
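Assuming these symbols refer to the usual Hamiltonian formulation of classical mechanics, they fit together as follows (a standard textbook form, not taken from the slide itself):

```latex
H(q, p) = K(p) + V(q), \qquad
\dot{q}_i = \frac{\partial H}{\partial p_i}, \qquad
\dot{p}_i = -\frac{\partial H}{\partial q_i}
```

Here H is the total energy, and the two derivative relations are Hamilton's equations of motion for each coordinate q_i and its momentum p_i.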

Molecular Dynamics

Knowledge of the atomic forces and masses can be used to solve the position of each atom along a series of extremely small time steps (on the order of femtoseconds = $10^{-15}$ seconds). The resulting series of snapshots of structural changes over time is called a trajectory. The use of this method to compute trajectories can be more easily seen when Newton's equation is expressed in the following form:
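A standard way to write Newton's equation for atom i, assumed here to be the form the slide refers to, is:

```latex
-\frac{dV}{dr_i} = m_i \frac{d^2 r_i}{dt^2}
```

where V is the potential energy, r_i the position of atom i, and m_i its mass. The left-hand side is the force on the atom, so the equation gives its acceleration once the force field is known.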

The "leapfrog" method is a common numerical approach to calculating trajectories based on Newton's equation. This method gets its name from the way in which positions (r) and velocities (v) are calculated in an alternating sequence, 'leaping' past each other in time. The steps can be summarized as follows:

$v = dr_i/dt$

$a = d^2 r_i/dt^2$
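A minimal sketch of the leapfrog scheme, assuming a single particle on a one-dimensional harmonic spring rather than a protein force field: velocities are kept at half time steps and positions at full time steps, and each is updated from the other in turn. The spring constant, mass, time step, and function names are illustrative choices.

```python
import math

def spring_force(r, k=1.0):
    """Force on a particle attached to a harmonic spring (stand-in for a force field)."""
    return -k * r

def leapfrog(r0, v0, mass, dt, n_steps, force_fn=spring_force):
    """Return the list of positions r(0), r(dt), ..., r(n_steps*dt)."""
    r = r0
    # Start-up: advance the velocity by half a step so it sits at t = dt/2
    v_half = v0 + 0.5 * dt * force_fn(r) / mass
    trajectory = [r]
    for _ in range(n_steps):
        r = r + dt * v_half                          # position leaps to t + dt
        v_half = v_half + dt * force_fn(r) / mass    # velocity leaps to t + 3dt/2
        trajectory.append(r)
    return trajectory

# One full oscillation period of the spring is 2*pi time units for k = mass = 1
trajectory = leapfrog(r0=1.0, v0=0.0, mass=1.0, dt=0.01, n_steps=628)
print(f"final position {trajectory[-1]:.4f}, exact {math.cos(6.28):.4f}")
```

In a real MD run the force would come from the gradient of the force-field energy and the time step would be on the order of femtoseconds.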

Force field

The potential energy of a system can be expressed as a sum of valence (or bond), crossterm, and nonbond interactions:

$E_{total} = E_{valence} + E_{crossterm} + E_{nonbond}$

The energy of valence interactions comprises bond stretching ($E_{bond}$), valence angle bending ($E_{angle}$), dihedral angle torsion ($E_{torsion}$), and inversion, also called out-of-plane interactions ($E_{inversion}$ or $E_{oop}$), terms, which are part of nearly all force fields for covalent systems. A Urey-Bradley term ($E_{UB}$) may be used to account for interactions between atom pairs involved in 1-3 configurations (i.e., atoms bound to a common atom):

$E_{valence} = E_{bond} + E_{angle} + E_{torsion} + E_{oop} + E_{UB}$

Modern (second-generation) force fields include cross terms to account for such factors as bond or angle distortions caused by nearby atoms. Cross terms can include the following: stretch-stretch, stretch-bend-stretch, bend-bend, torsion-stretch, torsion-bend-bend, bend-torsion-bend, stretch-torsion-stretch.

The energy of interactions between nonbonded atoms is accounted for by van der Waals ($E_{vdW}$), electrostatic ($E_{Coulomb}$), and (in some older force fields) hydrogen bond ($E_{hbond}$) terms:

$E_{nonbond} = E_{vdW} + E_{Coulomb} + E_{hbond}$


Van der Waals forces

$f = a/r^{12} - b/r^{6}$

[Figure: Lennard-Jones potential, energy versus distance]

The Lennard-Jones potential is mildly attractive as two uncharged molecules or atoms approach one another from a distance, but strongly repulsive when they approach too close. The resulting potential is shown (in pink). At equilibrium, the pair of atoms or molecules tends to go toward a separation corresponding to the minimum of the Lennard-Jones potential (a separation of 0.38 nanometers for the case shown in the figure).
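The location of the minimum follows directly from the formula: setting the derivative of a/r^12 − b/r^6 to zero gives r_min = (2a/b)^(1/6), with energy −b²/(4a). The sketch below checks this numerically. Only the 0.38 nm separation comes from the text; the value of b is arbitrary and a is derived from it, so the energy scale is not meaningful.

```python
def lennard_jones(r, a, b):
    """Lennard-Jones energy a/r^12 - b/r^6: repulsive term minus attractive term."""
    return a / r**12 - b / r**6

def lj_minimum(a, b):
    """Separation and energy at the minimum, from d/dr (a/r^12 - b/r^6) = 0."""
    r_min = (2.0 * a / b) ** (1.0 / 6.0)
    return r_min, lennard_jones(r_min, a, b)

R_MIN = 0.38                  # nm, the equilibrium separation quoted in the text
B = 1.0                       # arbitrary attraction coefficient (assumption)
A = 0.5 * B * R_MIN**6        # chosen so that the minimum falls at R_MIN

r_min, e_min = lj_minimum(A, B)
print(f"minimum at r = {r_min:.3f} nm, energy = {e_min:.2f} (arbitrary units)")

# Energy rises steeply on the short side and gently on the long side of the minimum
for r in (0.34, 0.36, 0.38, 0.42, 0.50):
    print(f"r = {r:.2f} nm  E = {lennard_jones(r, A, B):9.2f}")
```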


Thermal bath


Figure: Snapshots of ubiquitin pulling with constant velocity at three different time steps.

Monte Carlo (MC) simulation

• "Monte Carlo Simulation" is a term for a general class of optimization methods that use randomization.

• The general idea is, given the current configuration and some figure of merit, e.g., the energy of the folded configuration, to generate a new configuration at random (or semi-random). If the energy of the new configuration is lower than that of the old configuration, always accept it as the next configuration; if it is higher, accept or reject it with some probability dependent on how much larger the new energy is than the old energy.

$\Delta E = E(\text{new}) - E(\text{old})$

If $\Delta E < 0$ then accept

else if random[0, 1] < $e^{-\Delta E/kT}$ then accept

else reject

Boltzmann probability of conformation c: $P(c) \propto e^{-E(c)/kT}$
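The acceptance rule above can be sketched in a few lines. The one-dimensional double-well energy function stands in for a conformational energy and is purely illustrative, as are the step size, temperature, seed, and function names; the search starts in the shallower well to show that uphill moves let it escape.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng=random):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def energy(x):
    """Toy double-well energy: local minimum near x = +1, global minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def monte_carlo(x0, kT, n_steps, step_size=0.5, seed=1):
    """Random-walk Metropolis search; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)   # random trial move
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, kT, rng):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = monte_carlo(x0=1.0, kT=0.3, n_steps=20000)
print(f"lowest energy {best_e:.3f} found at x = {best_x:.3f}")
```

Lowering kT makes uphill moves rarer, so the search behaves more like a pure downhill minimiser and is more likely to stay trapped in the starting well.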

Monte Carlo (MC) simulation

• The idea is that by always accepting a better configuration, on average the system will tend to move toward a (local) energy minimum, while conversely, by sometimes accepting worse configurations, the system will be able to "climb" out of a sub-optimal local minimum, and perhaps fall into the basin of attraction of the global minimum.

• The specific algorithms for probabilistically generating and accepting new configurations define the type of "Monte Carlo" algorithm; some common methods are "Metropolis," "Gibbs Sampler," "Heat Bath," "Simulated Annealing," "Great Deluge," etc.

• MC techniques are computationally more efficient than MD

• MC simulations do not incorporate a notion of time!

[Figure: energy E versus configuration space (models), showing a local minimum and the global minimum]

#!/usr/bin/perl
#===============================================================================
#
# $Id: mcdemo.pl,v 1.1.1.1 2003/03/12 16:13:28 jkleinj Exp $
#
# mcdemo: Demo program for MC simulation of the number pi
#
# (C) 2003 Jens Kleinjung
#
# Dr Jens Kleinjung, Room P440             | [email protected]
# Bioinformatics Unit, Faculty of Sciences | Tel +31-20-444-7783
# Free University Amsterdam                | Fax +31-20-444-7653
# De Boelelaan 1081A, 1081 HV Amsterdam    | http://www.cs.vu.nl/~jkleinj
#
#===============================================================================
use strict;
use warnings;

# counters for points inside (hits) and outside (misses) the quarter circle
my $hits = 0;
my $miss = 0;

for (my $i = 0; $i < 100000; $i++) {

    # assign random x,y coordinates in [0, 1)
    my $x = rand;
    my $y = rand;

    # calculate radius (distance from the origin)
    my $r = sqrt(($x*$x) + ($y*$y));

    # sum up hits and misses
    if ($r <= 1) { $hits++; } else { $miss++; }

    # calculate pi: the quarter circle covers pi/4 of the unit square
    my $pi = (4*$hits) / ($hits + $miss);

    # print pi every 100 iterations
    if ($i % 100 == 0) { print("$i $pi\n"); }

}

#===============================================================================


In many conformational search methods based on Monte Carlo (MC), each MC move is followed by energy minimisation of the system, i.e. it is moved to the nearest local energy minimum, for example by gradient descent (steepest descent).
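A minimal sketch of the minimisation step, assuming a one-dimensional toy energy with two wells in place of a real conformational energy: steepest descent repeatedly moves against the gradient until the gradient vanishes, so each starting point ends up in the minimum of its own basin. The energy function, step size, tolerance, and function names are illustrative.

```python
def energy(x):
    """Toy double-well energy: local minimum near x = +1, global minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def gradient(x):
    """Analytical derivative of the toy energy."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def steepest_descent(x0, step=0.01, tol=1e-8, max_iter=100000):
    """Follow the negative gradient until it is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g   # move downhill, proportional to the slope
    return x

# Two hypothetical conformations produced by MC moves, one in each basin
x_right = steepest_descent(0.5)
x_left = steepest_descent(-0.5)
print(f"start  0.5 -> minimum at {x_right:.4f}, E = {energy(x_right):.4f}")
print(f"start -0.5 -> minimum at {x_left:.4f}, E = {energy(x_left):.4f}")
```

Minimisation alone never crosses the barrier between the wells; it is the random MC moves that carry the search from one basin to another.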

Take home messages

• Experimentally determining protein structures

– X-ray diffraction

  • From crystallised protein sample to electron density map

– Structure descriptors: resolution, R-factor

– Nuclear magnetic resonance (NMR)

  • Based on atomic nuclear spin

  • Produces set of distances between residues (distance restraints)

  • Distances are used to build protein model using Distance Geometry

• Protein dynamics simulation

– Molecular dynamics

  • Follows Newton’s equations of motion

  • Simulates molecular movements through time

  • Very small time steps (2 femtoseconds)

• Protein conformational search

– Monte Carlo

  • Conformations are randomly changed

  • Uses Metropolis criterion to decide between conformation i and i+1 based on conformational internal energy and the Boltzmann equation

  • Has no notion of time, is a conformational search protocol