Determination of macromolecular structure and dynamics from experimental data: NMR structure determination as an example Michael Nilges Unité de Bio-Informatique Structurale Institut Pasteur [email protected]


Page 1: Determination of macromolecular structure and …homepages.laas.fr/jcortes/algosb13/MNilges_data-structure-dynamics.… · Determination of macromolecular structure and dynamics from

 Determination of macromolecular structure and dynamics from experimental data:

NMR structure determination as an example

Michael Nilges Unité de Bio-Informatique Structurale

Institut Pasteur [email protected]

Michael Nilges. Structure calculation from NMR data.

Overview

1. Introduction: relating data to structure

2. Hybrid energy and treatment of errors

3. Minimisation of hybrid energy

4. Relation to probability theory

5. Sampling probability densities


(1)

1. Relating data to structure and the hybrid energy concept

2. Hybrid energy and treatment of errors

3. Minimisation of a hybrid energy

4. Relation to probability theory

5. Sampling probability densities


Biomolecular structure determination

• Biomolecular structure determination includes a phase of molecular modeling:
  • fit a model to the data
• This is (one of) its most important applications
  • CNS (1998): 13806 citations
  • CHARMm (1983): 8853 citations


Why is fitting data difficult?

• In particular for biological macromolecules, data
  • are incomplete
  • are noisy
  • contradict each other
  • contradict prior knowledge
• Theoretical (or forward) models are
  • incomplete (parametric, with non-measurable parameters)
  • very approximate


Data are incomplete

• For biomolecules, number of parameters (coordinates) usually exceeds number of observables

• Number of degrees of freedom: 3N
• Number of observables
  • X-ray: number of reflections depends on resolution
  • NMR: number of NOEs etc.: < 20 per amino acid
• Need to complement data with prior information
  • geometric: bond lengths, bond angles, planarity, vdW radii
  • force fields


Data are noisy

• The measurement of a quantity is not exactly reproducible
• Measurements follow a certain distribution
• Example:
  • Gaussian (normal) distribution of error for x around mean µ,
  • standard deviation σ,
  • ⇒ probability is

\[ P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]


Data may contradict each other

• Example: NOEs from the same side-chain to different positions
  • effect of dynamics
  • cannot be satisfied in one single structure


Data and forward models: the NOE

• Inter-proton distances can be derived from NOESY experiments
• Ideally, we can measure the “cross-relaxation rate”
  • which depends on spectral densities
  • which depend on correlation functions (radial and angular fluctuations)
  • internal (local) dynamics can be separated from overall tumbling, so that the correlation function simplifies to two exponentials


NOE and dynamics

• If the distance is rigid and known, the NOE serves to measure local angular fluctuations
  • the generalised order parameter S² characterises local fast dynamics
  • S² is 0 when completely flexible
  • S² is 1 when completely ordered

\[ \sigma_{ij} \propto r_{ij}^{-6} \, f(\tau_c) \]


Forward model

• Obtain the calculated cross-relaxation rate from a structure / long dynamics trajectory
• Can we invert this?
  • how to impose a cross-relaxation rate on a trajectory?
  • how are cross-relaxation rate and NOE related?
    • “spin diffusion”
• Standard solution:
  • neglect dynamics (and spin diffusion) and treat them as “noise”
  • try to obtain a single structure from the data


Forward models: approximate and incomplete

• Example: distance measurement from NOEs, only protons
• Forward model:
  • isolated spin pair approximation
  • NOE depends on distance (< 4 Å)

\[ \sigma_{ij} \propto r_{ij}^{-6} \, f(\tau_c) \]

\[ \mathrm{NOE}_{ij} \propto r_{ij}(X)^{-6}, \qquad r^0_{ij} \approx \left( C_{\mathrm{cal}} \, \mathrm{NOE}_{ij} \right)^{-1/6} \]

• approximate model neglects
  • internal dynamics
  • spin diffusion
• calibration factor Ccal unknown (not measurable)
• end result: approximate distances
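The isolated spin pair relation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes that one reference proton pair with a known, fixed distance is available for calibration, and the function names `calibrate` and `noe_to_distance` as well as all numbers are hypothetical. It follows the slide's convention r0 ≈ (Ccal · NOE)^(−1/6).

```python
def calibrate(ref_noe: float, ref_distance: float) -> float:
    """Return C_cal such that (C_cal * ref_noe) ** (-1/6) equals ref_distance."""
    return ref_distance ** -6 / ref_noe


def noe_to_distance(noe: float, c_cal: float) -> float:
    """Approximate distance r0 = (C_cal * NOE) ** (-1/6), isolated spin pair model."""
    return (c_cal * noe) ** (-1.0 / 6.0)


# Hypothetical numbers: a reference pair at 2.5 Å with NOE intensity 100.
c_cal = calibrate(ref_noe=100.0, ref_distance=2.5)
r_ref = noe_to_distance(100.0, c_cal)          # recovers the reference distance
r_weak = noe_to_distance(100.0 / 64.0, c_cal)  # 64 times weaker NOE, twice the distance
```

Because of the sixth-root dependence, a 64-fold drop in intensity only doubles the derived distance, which is one reason why the resulting distances are approximate but fairly tolerant to intensity errors.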


Data may contradict prior information

• Consequence of the forward model:
  • “one rigid structure” should satisfy the data
  • other approximations in the forward model
  • incorrect parameter choice
• Consequence of the data:
  • noise
  • “false positives”
• Structure calculation is always a compromise between satisfying data and prior information


Hybrid Energy

• combine data and a physical model of the molecule (force field) into one function (target function, hybrid energy function)
  • X-ray: Jack & Levitt, 1978

\[ E_{\mathrm{hybrid}} = E_{\mathrm{phys}} + w_{\mathrm{data}} E_{\mathrm{data}} \]

• guess wdata: “something of a problem”
• minimise this function
• the hybrid energy function complements incomplete data
• and prevents the structure from deviating too much from expectation
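The effect of the weight wdata can be illustrated with a toy one-dimensional problem. This is a sketch only: the harmonic `e_phys`, the single-measurement `e_data`, the grid minimiser and all numerical values are assumptions made for illustration, not part of any structure calculation program.

```python
def e_phys(x: float, x_ideal: float = 1.5, k: float = 10.0) -> float:
    """Toy 'force field': harmonic term holding x near an ideal value."""
    return k * (x - x_ideal) ** 2


def e_data(x: float, x_obs: float = 2.0) -> float:
    """Toy data term: squared disagreement with one measurement."""
    return (x - x_obs) ** 2


def e_hybrid(x: float, w_data: float) -> float:
    """E_hybrid = E_phys + w_data * E_data."""
    return e_phys(x) + w_data * e_data(x)


def minimise_on_grid(w_data: float, lo: float = 0.0, hi: float = 3.0, n: int = 3001) -> float:
    """Brute-force minimiser over an evenly spaced grid (illustration only)."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(xs, key=lambda x: e_hybrid(x, w_data))


x_low_w = minimise_on_grid(w_data=0.1)      # stays close to the prior value 1.5
x_high_w = minimise_on_grid(w_data=1000.0)  # is pulled towards the measurement 2.0
```

With a small weight the minimum stays near the value preferred by the prior; with a large weight it moves to the measured value. The result therefore depends directly on how wdata is guessed.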


1. Relating data to structure and the hybrid energy concept

2. Hybrid energy and treatment of errors

3. Minimisation of a hybrid energy

4. Relation to probability theory

5. Sampling probability densities

Some history

• combination with MD (for NMR refinement):
  • van Gunsteren, Kaptein 1985
• combination with MD (for folding):
  • Brunger, Clore 1986
• combination with MD (for X-ray refinement):
  • Brunger 1988


How to construct Ehybrid

• Ephys:
  • derive from a standard force field
    • modifications may be necessary
    • more rigid
    • simplify non-bonded interactions
  • derive from statistical analysis
    • derive from covalent parameters in small molecules
    • mean values
    • force constants from variation


NMR structure calculation: simplified force field

• covalent interactions: rigid, uniform force constants
  • ideal values from Engh & Huber
• vdW interaction: quartic potential, soft core, no attractive part
• no electrostatics


How to construct Ehybrid

• Edata:
  • derive from the distribution of the data
  • e.g., for a Gaussian distribution of distances from NOEs

\[ P(D \mid X, \sigma) \propto \exp\left( -\frac{(r - r(X))^2}{2\sigma^2} \right) \]

  • the potential function would be

\[ E_{\mathrm{NOE}} \propto \sum_{i=1}^{N_{\mathrm{NOE}}} \left( r_i(X) - r_i^0 \right)^2 \]
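A harmonic NOE energy of this form is straightforward to evaluate from coordinates. The sketch below assumes a hypothetical data layout (a dictionary of atom names to Cartesian coordinates and a list of `(atom_i, atom_j, r0)` restraints); atom names and numbers are made up for illustration.

```python
import math


def e_noe(coords, restraints):
    """Sum of squared differences between model distances r_i(X) and targets r_i^0.

    coords:     dict mapping atom name -> (x, y, z) in Å
    restraints: list of (atom_i, atom_j, r0) tuples
    """
    return sum(
        (math.dist(coords[i], coords[j]) - r0) ** 2
        for i, j, r0 in restraints
    )


# Hypothetical three-proton example.
coords = {"HA": (0.0, 0.0, 0.0), "HB": (3.0, 0.0, 0.0), "HN": (0.0, 4.0, 0.0)}
restraints = [("HA", "HB", 2.5), ("HA", "HN", 4.0), ("HB", "HN", 4.0)]
energy = e_noe(coords, restraints)  # 0.25 + 0.0 + 1.0
```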


Treat data and model imperfections in Ehybrid

• All data contain errors (experimental noise)
• All forward models contain approximations
• No ideal agreement between calculated and measured data is possible
• Need to adapt Ehybrid
  • use an appropriate weight
  • use an appropriate functional form


Example: Flat-bottom-harmonic-wall (cheat)

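• loose upper and lower bounds
• FBHW potential
  • flat bottom, harmonic walls
  • no force between L and U
  • not optimal

\[ E_{\mathrm{data}} \propto \sum_{i}^{N_{\mathrm{NOE}}} \begin{cases} (r_i(X) - L_i)^2 & \text{if } r_i(X) < L_i \\ 0 & \text{if } L_i \le r_i(X) \le U_i \\ (r_i(X) - U_i)^2 & \text{if } r_i(X) > U_i \end{cases} \]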
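The piecewise definition translates directly into code. The following is a minimal sketch; the function names and the bounds used in the example are hypothetical.

```python
def fbhw(r: float, lower: float, upper: float) -> float:
    """Flat-bottom harmonic-wall term for one restraint."""
    if r < lower:
        return (r - lower) ** 2
    if r > upper:
        return (r - upper) ** 2
    return 0.0  # no energy (and no force) anywhere between L and U


def e_fbhw(distances, bounds):
    """Sum of FBHW terms; bounds is a list of (L_i, U_i) pairs."""
    return sum(fbhw(r, lo, up) for r, (lo, up) in zip(distances, bounds))


# Hypothetical bounds of 1.8 Å and 5.0 Å for three model distances.
total = e_fbhw([1.5, 3.0, 6.0], [(1.8, 5.0)] * 3)  # 0.09 + 0.0 + 1.0
```

Any distance between 1.8 Å and 5.0 Å contributes nothing, which shows the loss of information: the potential cannot distinguish a model at 2 Å from one at 4.9 Å.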


• Advantages:
  • simple concept of geometrical consistency (“Distance Geometry”)
  • weight is not important
  • fast
• Disadvantages:
  • not derived from error distribution
  • human bias: where to put the upper bound?
  • false sense of “security”
  • loss of information
• FBHW potentials are a bad “good idea”


Adapt weight

• Empirical methods
  • “experience”
  • adjust average gradients (Jack & Levitt):
    • Ephys and Edata have equal importance
  • cross-validation (Brunger)
    • divide data into “test set” / “working set”
    • use only the working set for calculation
    • use only the test set for evaluation
    • look for a minimum or elbow region
• Bayesian methods
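The bookkeeping behind cross-validation can be sketched as follows. This only shows the split into working and test sets and the evaluation on the test set; the structure calculation itself is left out. The restraint layout `(pair_id, r0)`, the function names and the 10% test fraction are assumptions for illustration.

```python
import math
import random


def split_restraints(restraints, test_fraction=0.1, seed=0):
    """Randomly divide restraints into a working set and a test set."""
    rng = random.Random(seed)
    shuffled = list(restraints)
    rng.shuffle(shuffled)
    n_test = max(1, round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]  # working, test


def rms_difference(model_distances, restraints):
    """RMS difference between model distances and target distances."""
    sq = [(model_distances[pair] - r0) ** 2 for pair, r0 in restraints]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical restraints: (pair id, target distance in Å).
restraints = [(i, 2.0 + 0.1 * i) for i in range(20)]
working, test = split_restraints(restraints)

# A structure would be calculated from `working` only (for each candidate weight);
# here a made-up model that is 0.5 Å off everywhere stands in for the result.
model = {i: 2.0 + 0.1 * i + 0.5 for i in range(20)}
test_rms = rms_difference(model, test)
```

In a real scan one would repeat the calculation for a range of weights and plot the test-set agreement against the weight to look for the minimum or elbow region.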


Overview

1. Introduction: relating data to structure

2. Hybrid energy and treatment of errors

3. Minimisation of hybrid energy

4. Relation to probability theory

5. Sampling probability densities


Minimisation algorithms for NMR

• Energy minimisation
• Simulated annealing
  • Molecular dynamics
  • Torsion angle dynamics
  • (Monte Carlo)
• Distance geometry
• Genetic algorithm
• ...


Multiple minimum problem

High energy barriers have to be crossed to fold the protein.

Standard minimisation only goes "downhill".


Minimisation by molecular dynamics

\[ \frac{d^2 \vec{r}_i}{dt^2} = -\frac{c}{m_i} \frac{\partial}{\partial \vec{r}_i} E_{\mathrm{hybrid}} \]

• Molecular dynamics solves Newton's equations of motion
• Molecular dynamics can overcome local energy barriers
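The difference between downhill minimisation and dynamics can be seen on a one-dimensional double-well energy. The sketch below is an assumption-laden toy: the energy E(x) = (x² − 1)², the velocity Verlet integrator, the step sizes and the starting conditions are chosen for illustration and are not taken from any NMR program.

```python
def energy(x: float) -> float:
    """Toy double well with minima at x = -1 and x = +1, barrier of height 1 at x = 0."""
    return (x * x - 1.0) ** 2


def force(x: float) -> float:
    """Negative gradient of the toy energy."""
    return -4.0 * x * (x * x - 1.0)


def velocity_verlet(x, v, mass=1.0, c=1.0, dt=0.001, n_steps=5000):
    """Integrate d2x/dt2 = (c / m) * force(x); return trajectory and final velocity."""
    traj = [x]
    f = c * force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass
        x += dt * v
        f = c * force(x)
        v += 0.5 * dt * f / mass
        traj.append(x)
    return traj, v


def steepest_descent(x, step=0.01, n_steps=5000):
    """Plain downhill minimisation: always follows the force."""
    for _ in range(n_steps):
        x += step * force(x)
    return x


# Dynamics started in the left well with kinetic energy 2 (barrier height is 1).
traj, v_end = velocity_verlet(x=-1.0, v=2.0)
total_end = energy(traj[-1]) + 0.5 * v_end ** 2

# Downhill minimisation started on the left side stays on the left.
x_min = steepest_descent(-0.5)
```

The trajectory reaches the right-hand well because the momentum carries it over the barrier, while steepest descent from x = −0.5 ends in the nearest minimum at x = −1.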


Newton dynamics

• MD is a minimiser with memory:
  • direction of motion depends on
    • force (derived from force field and experimental restraints)
    • momentum


Temperature control and variation: "MD-simulated annealing"


Energy scaling

• more flexible annealing schemes
• different variation of different energy terms
• equivalence:
  • mass / energy / temperature scaling

\[ \frac{d^2 \vec{r}_i}{dt^2} = -\frac{c}{m_i} \frac{\partial}{\partial \vec{r}_i} E_{\mathrm{hybrid}} \]
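The stated equivalence can be checked numerically. In the sketch below, which uses reduced units with k_B = 1 and hypothetical function names, scaling the energy by a factor c gives the same acceleration as dividing the mass by c, and the same Boltzmann factor as dividing the temperature by c.

```python
import math


def acceleration(grad_e: float, mass: float, c: float = 1.0) -> float:
    """Right-hand side of the equation of motion: -(c / m) * dE/dr."""
    return -c / mass * grad_e


def boltzmann_factor(e: float, temperature: float, c: float = 1.0, k_b: float = 1.0) -> float:
    """exp(-c * E / (k_B * T)); k_B = 1 is an assumption (reduced units)."""
    return math.exp(-c * e / (k_b * temperature))


# Energy scaled by 2 versus mass halved: same acceleration.
a_scaled_energy = acceleration(grad_e=3.0, mass=1.0, c=2.0)
a_scaled_mass = acceleration(grad_e=3.0, mass=0.5, c=1.0)

# Energy scaled by 2 versus temperature halved: same Boltzmann factor.
p_scaled_energy = boltzmann_factor(e=1.5, temperature=300.0, c=2.0)
p_scaled_temperature = boltzmann_factor(e=1.5, temperature=150.0, c=1.0)
```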

[Figure with labels: Ephys, Edata]

Structure calculation with MD

• NMR data: distances
• Start: random structure
• Difficult search problem: many degrees of freedom


Structure calculation with MD

100 atoms:

• 1988: 20000 s per structure on a mainframe (DISGEO, Havel)
• now: 20 s per structure on a PC


Torsion angle dynamics

• dynamics time step dictated by bond stretching: waste of CPU

• important motions are around torsions

• ~3 degrees of freedom per amino acid (cf. 3·Natom for Newtonian dynamics)

• Available in X-PLOR, CYANA, CNS, X-PLOR-NIH, ISD


Calculation of structure ensembles

• with identical data / restraints:
  • repeat the calculation (20-100 times)
  • random variation of initial conditions (starting structure / velocities)
• poor man’s “probability distribution”
  • obtain information on uniqueness / different folds
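The idea of repeating the calculation from random starting points can be mimicked with a toy problem. In the sketch below a one-dimensional double-well energy stands in for the hybrid energy and plain steepest descent stands in for the structure calculation; both, like the function names and the seed, are assumptions for illustration only.

```python
import random


def force(x: float) -> float:
    """Negative gradient of the toy energy E(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x * x - 1.0)


def calculate_structure(x_start: float, step: float = 0.01, n_steps: int = 5000) -> float:
    """Stand-in for one structure calculation: downhill minimisation."""
    x = x_start
    for _ in range(n_steps):
        x += step * force(x)
    return x


def calculate_ensemble(n_runs: int = 20, seed: int = 0):
    """Repeat the calculation with randomly varied starting points."""
    rng = random.Random(seed)
    return [calculate_structure(rng.uniform(-2.0, 2.0)) for _ in range(n_runs)]


ensemble = calculate_ensemble()
solutions = sorted({round(x) for x in ensemble})  # distinct minima that were reached
```

Here the runs end in two different minima, which signals that the "data" do not define a unique solution. The spread says something about uniqueness, not about dynamics.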


Meaning (?) of structure ensembles

• Simple way to assess uniqueness of the solution
• This has very little to do with dynamics
• Distribution depends on
  • data
  • data representation
  • algorithm
  • force field
  • algorithm parameters
  • ...


So what about dynamics

• all data represent ensemble (and time) averages
• qualitatively, ensembles can show features of real dynamics
  • e.g., Abseher et al. & Nilges. Proteins. 1998 31:370.
• why?
  • NOE distance potential resembles an elastic network
  • NOEs can be absent because of dynamic effects


So what about dynamics

• depends very much on fold

[Figure panels: PH domain; dsdb protein]


NMR: rich source of dynamic information

• experimental data are sensitive to structure and dynamics (time and ensemble averages)
• example NOE:
  • inconsistencies in the derived distances
• spin relaxation experiments: fast ps dynamics
  • dipole–dipole interactions between 15N and H
• slower dynamics:
  • RDCs, J-couplings, chemical shifts, relaxation-dispersion
• dynamics measurements do not give an atomic picture of motion
  • need models (e.g., MD simulations)


methods to include dynamics in calculation

• ensemble averaged restraints:
  • Kim & Prestegard. Biochemistry. 1989 28:8792; Bonvin & Brünger. J Mol Biol. 1995 250:80; J Biomol NMR. 1996 7:72
  • with additional data (RDC) in water: Lange et al. & de Groot. Science. 2008 320:1471
  • with ad hoc potential: Clore & Schwieters. Biochemistry. 2004 43:10678; Vögeli et al. & Riek. Nat Struct Mol Biol. 2012 19:1053
  • restraint on order parameter: Lindorff-Larsen et al. & Vendruscolo. Nature. 2005 433:128
• time averaged restraints: Torda, Scheek, van Gunsteren. J Mol Biol. 1990 214:223
• unrestrained (accelerated) MD simulation (+ structure selection): Bernadó & Blackledge. J Am Chem Soc. 2004 126:7760; Markwick et al. & Blackledge. J Am Chem Soc. 2009 131:16968


why is this so difficult

• bad observable-to-parameter ratio:
  • NOE data not sufficient to determine the structure for an ensemble
  • statistical relevance (only Bonvin & Brunger, with cross-validation)
• is the ensemble / trajectory Boltzmann distributed?
  • how to keep the ensemble together? force field / ad hoc potential
  • contributions from data? methods rely heavily on the quality of MD force fields, other experimental information, and ad hoc potentials
• quantitative data treatment (in particular for NOEs)
  • structures not determined from NOE data
  • exception: Vögeli et al., with “exact” NOEs
• mixture of different time scales
  • “order parameter”: ps time scale
  • RDCs, NOEs: ns - µs time scale


Standard NMR ensemble and EROS ensemble

• EROS ensemble: Lange, O.F. et al. Recognition dynamics up to microseconds revealed from an RDC-derived ubiquitin ensemble in solution. Science 320, 1471-1475 (2008)


1. Introduction: relating data to structure

2. Hybrid energy and treatment of errors

3. Minimisation of hybrid energy

4. Relation to probability theory

5. Sampling probability densities

(4)


Minimisation and probability

• Where do the potential forms come from?
• Where do all the parameters come from?
  • bounds
  • weights
  • any parameter required by theory


Probability and energy

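• force field Ephys ⇔ probability (Boltzmann)
  • probability of distortion of the molecule
  • force field: background information I
  • prior probability

\[ P(X \mid I) \propto \exp\left( -\frac{E_{\mathrm{phys}}(X)}{kT} \right) \]

\[ E_{\mathrm{hybrid}} = E_{\mathrm{phys}}(X) + w_{\mathrm{data}} E_{\mathrm{data}}(D, X) \]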


Probability and energy

• similarly: Edata ⇔ probability
  • probability that the data are correct, given structure X: the “likelihood”

\[ E_{\mathrm{hybrid}} = E_{\mathrm{phys}}(X) + w_{\mathrm{data}} E_{\mathrm{data}}(D, X) \]


Likelihood and restraint potential

• Inversely, if we know the probability distribution, we can derive the potential

\[ E_{\mathrm{data}} \propto -\log\left[ P(D \mid X, \sigma) \right] \]

• For Gaussian error, harmonic potential (“least squares”)

\[ E_{\mathrm{data}} \propto \frac{1}{2\sigma^2} (r - r(X))^2 \]

• The weight is related to the error in the data
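The step from a Gaussian likelihood to a harmonic potential can be verified numerically. The sketch below (function names and numbers are illustrative assumptions) shows that the negative logarithm of the normalised Gaussian differs from the harmonic term only by a constant that does not depend on the structure, and that the prefactor 1/(2σ²) acts as the weight.

```python
import math


def gaussian_likelihood(r_obs: float, r_calc: float, sigma: float) -> float:
    """Normalised Gaussian probability density of r_obs around r_calc."""
    norm = math.sqrt(2.0 * math.pi * sigma ** 2)
    return math.exp(-((r_obs - r_calc) ** 2) / (2.0 * sigma ** 2)) / norm


def harmonic_energy(r_obs: float, r_calc: float, sigma: float) -> float:
    """Harmonic ('least squares') restraint term with weight 1 / (2 sigma^2)."""
    return (r_obs - r_calc) ** 2 / (2.0 * sigma ** 2)


def offset(r_obs: float, r_calc: float, sigma: float) -> float:
    """-log(likelihood) minus the harmonic term: constant for fixed sigma."""
    return -math.log(gaussian_likelihood(r_obs, r_calc, sigma)) - harmonic_energy(r_obs, r_calc, sigma)


sigma = 0.5
offset_a = offset(3.0, 2.8, sigma)  # two different model distances ...
offset_b = offset(3.0, 4.1, sigma)  # ... give the same constant offset

# Halving sigma (more precise data) quadruples the weight on the same deviation.
weight_ratio = harmonic_energy(3.0, 2.8, sigma / 2) / harmonic_energy(3.0, 2.8, sigma)
```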


Distances (NOEs) do not follow a Gaussian distribution

[Figure panels: Gaussian distribution of logarithms; Gaussian distribution]

Rieping, Habeck, Nilges, JACS 2005


Log-normal distribution

• Log-normal distributions

\[ \mathrm{LN}(x_0, x, \sigma) \equiv \frac{1}{\sqrt{2\pi\sigma^2}\, x_0} \exp\left[ -\frac{1}{2\sigma^2} \left( \log[x_0] - \log[x] \right)^2 \right] \]

• and derived potentials
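Two properties of the log-normal density can be checked with a simple numerical integration: it is normalised over positive values, and the calculated value x is its median. The midpoint-rule integrator, the function names and the values x = 4.0 and σ = 0.3 below are assumptions chosen for illustration.

```python
import math


def log_normal(x0: float, x: float, sigma: float) -> float:
    """LN(x0, x, sigma): density of an observed value x0 given the calculated value x."""
    norm = math.sqrt(2.0 * math.pi * sigma ** 2) * x0
    return math.exp(-((math.log(x0) - math.log(x)) ** 2) / (2.0 * sigma ** 2)) / norm


def integrate(f, a: float, b: float, n: int = 20000) -> float:
    """Midpoint rule; avoids evaluating f at the end points (log(0) is undefined)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


x_calc, sigma = 4.0, 0.3
below_median = integrate(lambda x0: log_normal(x0, x_calc, sigma), 0.0, x_calc)
total = integrate(lambda x0: log_normal(x0, x_calc, sigma), 0.0, 10.0 * x_calc)
```

Half of the probability lies below x and half above, although the density itself is asymmetric around x.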


• Several interesting properties:
  • Only one free parameter (weight)
  • Shape does not change with the exponent (d, d⁻⁶)
  • “Flattening” of the potential for inconsistent distances


Joint probability from prior and likelihood

• To calculate the joint probability from single probabilities, multiply:

\[ P(X \mid D, I) \propto P(X \mid I) \, P(D \mid X, \sigma, I) \]

• P(X|D, I): probability of a structure (posterior probability)
• P(X|I): prior distribution
• P(D|X, σ, I): likelihood

[Figure labels: Bayes, Laplace]


Hybrid energy revisited

• the hybrid energy function is the negative logarithm of the joint probability
• minimum energy corresponds to maximum probability
• data weight “should” depend on data quality
• the story is incomplete (what about wdata?)

\[ P(X \mid D, I) \propto P(X \mid I) \, P(D \mid X, \sigma, I) \]

(probability of a structure ∝ prior distribution × likelihood)

\[ E_{\mathrm{hybrid}} = E_{\mathrm{phys}}(X) + w_{\mathrm{data}} E_{\mathrm{data}}(D, X) \]
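That the minimum of the hybrid energy coincides with the maximum of the joint probability can be illustrated on a grid. The sketch uses a toy one-dimensional "structure", energies in units of kT and made-up parameters; all names and numbers are assumptions for illustration.

```python
import math

W_DATA = 5.0  # hypothetical data weight


def e_phys(x: float) -> float:
    """Toy prior energy (in units of kT)."""
    return 10.0 * (x - 1.5) ** 2


def e_data(x: float) -> float:
    """Toy data energy for a single measurement at 2.0."""
    return (x - 2.0) ** 2


def e_hybrid(x: float) -> float:
    return e_phys(x) + W_DATA * e_data(x)


def posterior_unnormalised(x: float) -> float:
    """Prior times likelihood: exp(-E_phys) * exp(-w_data * E_data)."""
    return math.exp(-e_phys(x)) * math.exp(-W_DATA * e_data(x))


grid = [3.0 * i / 300 for i in range(301)]
x_min_energy = min(grid, key=e_hybrid)
x_max_probability = max(grid, key=posterior_unnormalised)
```

Both searches return the same grid point, because the posterior is a monotonically decreasing function of the hybrid energy.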


Hybrid energy and Bayesian probability

3.3 The Inferential Structure Determination Method

In order to rigorously address the problem of obtaining unbiased co-ordinate precision with the full dependency on all unknowns, we need to abandon the idea of minimising a hybrid energy or maximising the probability. Rather, we need to evaluate a probability P_i for all possible structures X_i. Generally, a continuum of P_i values is distributed over conformational space. Only if all but one P_i vanish are the data uniquely invertible, and we can obtain exactly one structure that represents the data. On the other hand, in the case of uniform P_i, the data are completely uninformative with respect to the structure. In the case of a continuous parametrisation (Cartesian coordinates, dihedral angles) P_i is a density p(X|D, I); the integral

\[ \int_R dX \, p(X \mid D, I) \]

evaluates the probability that region R of conformational space contains the true structure.

For a single unknown, or very few unknowns (co-ordinates and other parameters), one could calculate the probability of every conformation, for example by a grid search. For the large number of unknowns typical for the structure determination of a macromolecule this is unfeasible, and the space of possible conformations has to be explored by a suitable sampling algorithm. The recently developed inferential structure determination method ISD [26] is therefore based on Monte Carlo sampling to explore the probability distribution over conformational space. Monte Carlo sampling is not used as a means to find the maximum of a probability but to evaluate the integrals over parameters that appear in the use of Bayes's rule, eq. (2) or (11).

Once the model to describe the data (i.e., the likelihood function) has been chosen, the rules of probability theory, eq. (2) or (11), uniquely determine the posterior distribution. The appropriate statistics for modelling distances and NOEs are discussed further below. No additional assumptions need to be made.

Nuisance parameters. The full power of a fully Bayesian treatment of the problem becomes apparent if there are additional unknown, auxiliary parameters. It is basically always necessary to introduce such auxiliary parameters in order to describe the problem adequately. For example, the parameters A, B, C of the Karplus relationship are, strictly speaking, unknown for the particular protein that one is investigating. Also, the data quality σ is an unknown parameter, as is the calibration factor γ for NOE volumes.

In Bayesian theory, these additional parameters are called “nuisance parameters”. In ISD, all additional unknown parameters of the error model and of the theory are estimated along with the structure. They are treated in the same way as the co-ordinates. To add the unknown σ, we simply replace X with (X, σ) in eq. (2), and the full posterior becomes

\[ p(X, \sigma \mid D, I) \propto \pi(X \mid I) \, \pi(\sigma \mid I) \, L(D \mid X, \sigma, I). \tag{11} \]

Here, we a priori assume independence of X and the nuisance parameters (the prior for the coordinates does not depend on the value of σ and vice versa), and we introduce the additional prior π(σ|I) (Jeffreys prior [27]) expressing our ignorance of this parameter. Other nuisance …

Annotations to eq. (11):

• p(X, σ|D, I): posterior probability
• π(X|I): structure prior
• π(σ|I): data quality prior
• L(D|X, σ, I): likelihood

Ehybrid = Ephys + wdataEdata ?


Bayesian determination of data weight

• Bayesian analysis:
  • Extended hybrid energy function including data quality
  • weight ⇔ overall data quality
  • average weight over all structures
• Minimisation does not result in vanishing weight
• Update the weight iteratively during structure calculation
• Meaningful estimate of data quality from χ²

Habeck M, Rieping W, Nilges M (2006). PNAS 103:1756; Nilges et al. Structure 2008
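How a weight might be derived from χ² can be sketched as follows. This is not necessarily the update rule of the cited papers: the sketch assumes that the data quality is estimated as σ² = χ²/n from the logarithms of observed and calculated distances, and that the weight is taken as 1/σ² = n/χ². Function names and numbers are hypothetical.

```python
import math


def chi2(d_obs, d_calc):
    """Sum of squared differences between log distances."""
    return sum((math.log(o) - math.log(c)) ** 2 for o, c in zip(d_obs, d_calc))


def estimate_sigma(d_obs, d_calc):
    """Assumed estimate of the data quality: sigma^2 = chi^2 / n."""
    return math.sqrt(chi2(d_obs, d_calc) / len(d_obs))


def estimate_weight(d_obs, d_calc):
    """Assumed weight 1 / sigma^2 = n / chi^2 for the current structure."""
    return len(d_obs) / chi2(d_obs, d_calc)


# Hypothetical data: observed distances are off by a factor exp(+/-0.1).
d_calc = [2.0, 4.0]
d_obs = [2.0 * math.exp(0.1), 4.0 * math.exp(-0.1)]
sigma = estimate_sigma(d_obs, d_calc)
weight = estimate_weight(d_obs, d_calc)
```

In an iterative scheme, such an estimate would be recomputed from the current structure and fed back into the next round of minimisation, so that better agreement with the data leads to a larger weight.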

2 Procedures and Results

Our new calculation strategy is based on the standard idea of minimising a hybrid energy function. Usually, this energy has the form [4,5]

\[ E_{\mathrm{hybrid}}(X) = E_{\mathrm{phys}}(X) + w_{\mathrm{data}} E_{\mathrm{data}}(X), \tag{1} \]

where the force field Ephys compensates for the lack of data by imposing physical constraints on the structure, and Edata(X) is the cost function quantifying the disagreement between a structural model X and the data. The weight wdata controls the contribution of the data relative to the force field. In the standard approach, this weight needs to be estimated by empirical means [4,6,7].

The new method modifies and extends this approach to estimate the most probable weight wdata. To this end, we combine three recent concepts into one rapid and efficient minimisation protocol:

1. We replace distance bounds by an error-tolerant potential with a single minimum, which we call the log-harmonic potential. The shape of this potential is derived from the log-normal distribution, which is a natural choice for distances and NOE volumes, and models errors and inconsistencies in the data well [8]. This potential has only one free parameter, its weight.

2. We introduce an iterative automatic procedure suggested by Bayesian analysis [9] to estimate the data quality and hence optimise the weight on the experimental data. This removes the one free parameter of the log-harmonic potential.

3. The total energy of each structure is evaluated as the sum of three terms: the physical energy Ephys, the restraint energy Edata, and an additional term Eσ depending explicitly on the data quality, which is introduced by the Bayesian analysis [9]:

\[ E_{\mathrm{hybrid}}(X) = E_{\mathrm{phys}}(X) + w_{\mathrm{data}} E_{\mathrm{data}}(X) + E_{\sigma}(X), \tag{2} \]

2.1 The log–normal distribution and the log–harmonic potential

We recently showed that a suitable distribution for NOE intensities and derived distances is given by the log–normal distribution [8], that is, a normal or Gaussian distribution in the logarithms of the data:

$$g(d_{\mathrm{obs}}, d_{\mathrm{calc}}(X)) = \frac{1}{\sqrt{2\pi\sigma^2}\,d_{\mathrm{obs}}} \exp\left[-\frac{1}{2\sigma^2}\log^2\left(\frac{d_{\mathrm{obs}}}{d_{\mathrm{calc}}(X)}\right)\right]. \tag{3}$$

Here, σ is the shape parameter of the distribution, equivalent to the standard deviation for the normal distribution. In contrast to the normal distribution, this distribution is restricted to positive values and is asymmetric around its median d_calc(X). Measurements are incorporated without bias in the sense that the probabilities of over– and underestimating the true intensity are both 1/2. Once the distribution of experimental distance values d_obs around the distances d_calc(X) calculated from the molecular structure X is known, the negative logarithm of this distribution represents the corresponding restraint potential (see Figure 1):

$$E^{i}_{\mathrm{data}} \propto -\log[g(d_{\mathrm{obs}}, d_{\mathrm{calc}}(X))] \tag{4}$$

The total energy due to NOE-derived distance restraints is then:

$$E_{\mathrm{data}} = \frac{1}{\beta}\,\frac{1}{2\sigma^2}\sum_i \log^2\left(\frac{d^{i}_{\mathrm{obs}}}{d^{i}_{\mathrm{calc}}}\right) = \frac{1}{\beta}\,\frac{1}{2\sigma^2}\,\chi^2(X) \tag{5}$$

with $\chi^2(X) = \sum_i \left(\log d^{i}_{\mathrm{obs}} - \log d^{i}_{\mathrm{calc}}(X)\right)^2$. β is 1/k_BT and defines the energy scale; it is 1 if we measure the energy in units of k_BT. In contrast to flat–bottom potentials, this potential has one well–defined single minimum, and has the opposite behaviour: it is more restrictive for small deviations from the experimental distance d_obs, but more tolerant to large violations (the asymptotic value of the slope is zero).

The potential has the interesting property that inconsistent distance restraints to the same proton result in wider, softer potential wells and can result in multiple minima. This is illustrated in Figure 1c for the case where the distance to a central proton from each of two exterior protons is restrained to 2 Å, and the distance between the two exterior protons is 4, 6, 8, 10, and 12 Å. This type of inconsistency could occur if the central proton is on a mobile side–chain oscillating between two positions that are each close to one of the two exterior protons.
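As a minimal sketch of these properties, the following Python fragment evaluates a single log–harmonic term of Eq. 5 and counts the minima produced by two inconsistent 2 Å restraints. The function names, the choice σ = β = 1, and the restriction of the central proton to the straight line joining the two exterior protons are assumptions made for illustration; they are not taken from the paper or from Figure 1.

```python
import math

def log_harmonic(d_calc, d_obs, sigma=1.0, beta=1.0):
    """One term of Eq. 5: log-harmonic restraint energy for a single distance."""
    return math.log(d_obs / d_calc) ** 2 / (2.0 * beta * sigma ** 2)

def log_harmonic_slope(d_calc, d_obs, sigma=1.0, beta=1.0):
    """Analytic derivative of log_harmonic with respect to d_calc."""
    return math.log(d_calc / d_obs) / (beta * sigma ** 2 * d_calc)

def count_minima_on_line(separation, d_obs=2.0, n_grid=2001):
    """Count local minima of the summed energy of two restraints.

    Toy geometry (assumption): the central proton moves only along the line
    between two exterior protons that are `separation` apart, and its
    distance to each of them is restrained to d_obs.
    """
    lo, hi = 0.05 * separation, 0.95 * separation
    xs = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    energy = [log_harmonic(x, d_obs) + log_harmonic(separation - x, d_obs)
              for x in xs]
    return sum(1 for k in range(1, n_grid - 1)
               if energy[k] < energy[k - 1] and energy[k] < energy[k + 1])

# The slope vanishes for large violations, unlike a harmonic restraint.
for d in (2.5, 5.0, 50.0, 500.0):
    print(f"d_calc = {d:6.1f}  slope = {log_harmonic_slope(d, 2.0):.4f}")

# Number of minima for the separations quoted in the text.
for separation in (4, 6, 8, 10, 12):
    print(separation, count_minima_on_line(separation))
```

In this one-dimensional toy the midpoint remains the only minimum until the separation exceeds 2e·d_obs (about 10.9 Å for d_obs = 2 Å), so the well first widens and then splits; the three-dimensional situation shown in Figure 1c may differ in detail.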

2.2 Minimizing the joint energy and automated determination of the weight

We recently showed that the long–standing problem of optimally weighting experimental data can be successfully solved by Bayesian analysis [9]. The hybrid energy function (Eq. 1), which depends on the coordinates X only, is extended by an additional term depending on the data quality σ:

$$E_{\mathrm{joint}}(X, \sigma) = E_{\mathrm{phys}}(X) + \frac{1}{\beta}\,\frac{1}{2\sigma^2}\sum_i \log^2\left(\frac{d^{i}_{\mathrm{obs}}}{d^{i}_{\mathrm{calc}}(X)}\right) + \frac{1}{\beta}\log\left[\frac{Z(\sigma)}{\pi(\sigma)}\right], \tag{6}$$

where we used the log-harmonic potential for E_data. The term log[Z(σ)/π(σ)] is not included in the standard target function E_hybrid (Eq. 1) but results from the Bayesian analysis: Z(σ) is a normalisation constant, and π(σ) is required by Bayes’s theorem to include prior knowledge about σ [9].

For the log–normal distribution, further analysis yields the following estimate of the average weight:

$$\langle w_{\mathrm{data}} \rangle = \frac{1}{\beta}\,\frac{n}{2\chi^2(X)}. \tag{7}$$

How does one obtain the minimum of E_joint(X, σ)? Within ISD [10], the weight is estimated along with all the other unknown parameters, by sampling over different values. Within the context of minimisation of a hybrid energy, we propose to use an iterative scheme: during the structure calculation, we iteratively update the current weight using Eq. 7.
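The iterative scheme can be sketched on a toy problem. In the Python fragment below, the "structure" is a single distance d_calc restrained by many noisy observations, E_phys is a weak harmonic term in log d_calc, β = 1, and n is taken to be the number of restraints. All of these, together with the function names, are assumptions for illustration rather than the protocol of the paper. Each cycle minimises the hybrid energy for the current weight (in closed form here) and then updates the weight with Eq. 7.

```python
import math
import random

def update_weight(chi2_value, n, beta=1.0):
    """Eq. 7: average data weight for a given chi^2 and n restraints."""
    return n / (2.0 * beta * chi2_value)

def chi2(log_d_obs, u):
    """chi^2 for a model in which every restraint measures exp(u)."""
    return sum((y - u) ** 2 for y in log_d_obs)

def minimise_toy_energy(log_d_obs, weight, u_prior, k_prior):
    """Closed-form minimiser of E_phys(u) + weight * chi2(u).

    Toy physical energy (assumption): 0.5 * k_prior * (u - u_prior)**2,
    with u = log(d_calc).
    """
    n = len(log_d_obs)
    return ((k_prior * u_prior + 2.0 * weight * sum(log_d_obs))
            / (k_prior + 2.0 * weight * n))

def estimate_weight(log_d_obs, u_prior, k_prior=1.0, weight=1.0, n_iter=20):
    """Alternate between minimisation and the weight update of Eq. 7."""
    for _ in range(n_iter):
        u = minimise_toy_energy(log_d_obs, weight, u_prior, k_prior)
        weight = update_weight(chi2(log_d_obs, u), len(log_d_obs))
    return u, weight

# Synthetic data: true distance 3 A, log-normal noise with sigma = 0.2.
rng = random.Random(0)
sigma_true = 0.2
log_d_obs = [math.log(3.0) + rng.gauss(0.0, sigma_true) for _ in range(200)]

u, weight = estimate_weight(log_d_obs, u_prior=math.log(4.0))
sigma_est = math.sqrt(1.0 / (2.0 * weight))  # from w_data = 1 / (2 sigma^2)
print(f"d_calc = {math.exp(u):.3f}  w_data = {weight:.2f}  sigma = {sigma_est:.3f}")
```

With β = 1, Eq. 5 gives w_data = 1/(2σ²), so the converged weight can be read back as an estimate of the data quality σ, and the weight stays finite as long as χ² does not vanish.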


Comparison to X-ray for LogNormal

[Figure panels: BPTI, 0.62 Å; IL8, 1.44 Å; IL4, 1.14 Å; GB1, 0.5 Å]

PCA analysis for IL4

[Figure: principal component plot with axes EV1 and EV2; legend: FBHW, LogNormal, X-ray]

(5)

1. Introduction: relating data to structure

2. Hybrid energy and treatment of errors

3. Minimisation of hybrid energy

4. Relation to probability theory

5. Sampling probability densities


Problems inherent in minimisation

• data incomplete: solution is degenerate
• data inconsistent: no solution exists
• many unknown parameters (“nuisance parameters”)
• sparse data:
  • problems with determining auxiliary parameters
  • structure calculation difficult
• no objective figures of merit for structures
  • RMSDs and R-factors depend on all auxiliary parameters
  • few restraints can change result drastically
• no concept to evaluate data quality (“don’t overfit”... “use data not used in structure calculation”...)

More rigorous modelling: probabilistic view

http://www.zeably.com/Bayesian_statistics

Probabilistic view of structure determination

• Evaluate probability for “all” conformations X
  • and all other parameters necessary for description of problem
• prior information: energy of a conformation
  • use “weak prior”: covalent geometry / elastic network and “soft spheres”
• likelihood / “satisfaction of restraints”
  • difference between exp. data and predicted data (forward model)
  • contains unmeasurable quantities (e.g., data quality σ, theory parameters ξ)
• Bayes’ theorem:

$$\underbrace{P(X,\sigma,\xi \mid D,I)}_{\text{posterior probability}} \;\propto\; \underbrace{P(X \mid I)\,P(\sigma)\,P(\xi)}_{\text{prior probabilities}}\;\underbrace{P(D \mid X,\sigma,\xi)}_{\text{likelihood}}$$
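The link between this factorisation and the hybrid energy can be written out as a short worked equation. It assumes, as is conventional but not stated on the slide, Boltzmann-type forms for the structure prior and the likelihood, and it holds the nuisance parameters σ and ξ fixed; for the log-normal likelihood, w_data then plays the role of 1/(2βσ²) in Eq. 5.

```latex
\begin{align}
P(X \mid D, I)
  &\propto P(X \mid I)\, P(D \mid X, \sigma, \xi)
  && \text{($\sigma$, $\xi$ held fixed)} \\
  &\propto \exp\!\left[-\beta E_{\mathrm{phys}}(X)\right]\,
           \exp\!\left[-\beta\, w_{\mathrm{data}} E_{\mathrm{data}}(X)\right]
  && \text{(assumed Boltzmann forms)} \\
-\tfrac{1}{\beta} \log P(X \mid D, I)
  &= E_{\mathrm{phys}}(X) + w_{\mathrm{data}} E_{\mathrm{data}}(X) + \mathrm{const}
   = E_{\mathrm{hybrid}}(X) + \mathrm{const}
\end{align}
```

Under these assumptions, the structure that minimises E_hybrid is the one that maximises the posterior probability, which is only meaningful if σ and ξ are known.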


Bayesian structure determination

• “Inferential Structure Determination” (ISD)
  • Michael Habeck, Wolfgang Rieping (Rieping et al., Science 2005)
  • re-determine forward models for each data type
• calculate densities (not single minimum structures) for
  • all unknowns (including but not exclusively the coordinates)
  • their uncertainty with interdependencies

$$\underbrace{P(X,\sigma,\xi \mid D,I)}_{\text{posterior probability}} \;\propto\; \underbrace{P(X \mid I)\,P(\sigma)\,P(\xi)}_{\text{prior probabilities}}\;\underbrace{P(D \mid X,\sigma,\xi)}_{\text{likelihood}}$$

Forward model and likelihood in NMR

• Forward model to evaluate likelihood
  • simple functional form: NOE ∝ r⁻⁶
  • includes all unknown parameters (e.g., data quality σ)
  • data weight depends on σ and is unknown
• NOEs and derived distances: log-normal distribution
  • Rieping, Habeck, Nilges, JACS 2005

$$\mathrm{LN}(x_0, x, \sigma) \equiv \frac{1}{\sqrt{2\pi\sigma^2}\,x_0}\exp\left[-\frac{1}{2\sigma^2}\left(\log[x_0] - \log[x]\right)^2\right]$$
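A short numerical check of this density, written as a sketch: the Python functions below evaluate LN(x0, x, σ) and integrate it over x0 to confirm that it is normalised and that x is its median. The function names, the example values, and the integration settings are illustrative assumptions.

```python
import math

def lognormal_density(x_obs, x_calc, sigma):
    """LN(x_obs, x_calc, sigma): density of a positive observed value x_obs."""
    z = math.log(x_obs) - math.log(x_calc)
    norm = math.sqrt(2.0 * math.pi * sigma ** 2) * x_obs
    return math.exp(-z * z / (2.0 * sigma ** 2)) / norm

def integrate(f, lower, upper, n=20000):
    """Midpoint rule on a logarithmic grid, suited to positive variables."""
    a, b = math.log(lower), math.log(upper)
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = math.exp(a + (k + 0.5) * h)
        total += f(x) * x * h  # dx = x d(log x)
    return total

sigma, x_calc = 0.3, 4.0
lower = x_calc * math.exp(-10.0 * sigma)
upper = x_calc * math.exp(10.0 * sigma)

def density(x_obs):
    return lognormal_density(x_obs, x_calc, sigma)

mass_total = integrate(density, lower, upper)
mass_below = integrate(density, lower, x_calc)
print(f"total mass = {mass_total:.6f}, mass below x_calc = {mass_below:.6f}")
```

Under the simple forward model NOE ∝ r⁻⁶, log r differs from log NOE only by a factor of −1/6 and a constant, so a log-normal error of width σ on an intensity corresponds to a log-normal error of width σ/6 on the derived distance.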


Sampling probability distributions

• Posterior P(X,σ,ξ|D,I) is very complex
  • too many degrees of freedom for systematic search
  • coordinates X and other parameters σ, ξ
• Representative samples
  • Frequency ∝ probability
• Markov Chain Monte Carlo
  • detailed balance
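As an illustration of "frequency ∝ probability", the sketch below runs a Metropolis Markov chain on four discrete states with unnormalised probabilities 1:2:3:4. The symmetric nearest-neighbour proposal on a ring and the acceptance rule min(1, p_new/p_old) together satisfy detailed balance. The state space and all names are toy assumptions and have nothing to do with the actual posterior sampled in ISD.

```python
import random

def metropolis_ring(weights, n_steps, seed=0):
    """Sample states 0..len(weights)-1 with probability proportional to weights."""
    rng = random.Random(seed)
    n_states = len(weights)
    state = 0
    counts = [0] * n_states
    for _ in range(n_steps):
        # Symmetric proposal: step to the left or right neighbour on a ring.
        proposal = (state + rng.choice((-1, 1))) % n_states
        # Metropolis acceptance; a rejected move counts the old state again.
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]

weights = [1.0, 2.0, 3.0, 4.0]
target = [w / sum(weights) for w in weights]
freqs = metropolis_ring(weights, 200_000)
for p, f in zip(target, freqs):
    print(f"target = {p:.3f}  sampled frequency = {f:.3f}")
```

The normalisation constant of the target never enters the acceptance ratio, which is why this kind of sampling remains feasible when the posterior is known only up to a factor.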


Sampling probability distributions

• Computationally complex
  • cf. calculating partition function in statistical mechanics
• ISD algorithm uses
  • hybrid Monte Carlo (HMC)
  • replica exchange
  • Tsallis distribution
  • Gibbs sampling for additional parameters
  • Habeck, Rieping, Nilges, Phys Rev Lett 2005
• ISD outperforms standard structure calculation in NMR
• HMC-replica exchange is inefficient, problem for large systems
  • Test other sampling algorithms (NCMC)
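For orientation, the acceptance rule of standard temperature replica exchange fits in a few lines. This is the textbook criterion, given as a sketch with hypothetical function names; the ISD scheme cited on the slide (which involves the Tsallis distribution) is more elaborate and is not reproduced here.

```python
import math
import random

def swap_probability(beta_a, beta_b, energy_a, energy_b):
    """Probability of exchanging the configurations of two replicas.

    Replica a runs at inverse temperature beta_a and currently holds a
    configuration of energy energy_a; likewise for replica b.
    """
    log_ratio = (beta_a - beta_b) * (energy_a - energy_b)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

def attempt_swap(states, energies, betas, i, j, rng):
    """Exchange the configurations at positions i and j with that probability."""
    p = swap_probability(betas[i], betas[j], energies[i], energies[j])
    if rng.random() < p:
        states[i], states[j] = states[j], states[i]
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False

# The cold replica (beta = 1.0) holds the higher-energy configuration,
# so handing it the lower-energy one is always accepted.
rng = random.Random(0)
states, energies, betas = ["A", "B"], [5.0, 1.0], [1.0, 0.2]
swapped = attempt_swap(states, energies, betas, 0, 1, rng)
print(swapped, states, energies)
```

The rule follows from requiring detailed balance for the joint Boltzmann weight exp(−β_a E_a − β_b E_b) of the two replicas.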

Sampling nuisance parameters

data quality ⇔ weight
scale factor
other parameters
not assumed known (usually determined by empirical methods: experience, cross-validation)

Typical trace (SH3 domain)

[Figure labels: “energy”, data variance, calibration, replica exchanges]

Distribution of σ in Ubiquitin and SH3

$$P(\sigma \mid D,I) = \int d\theta\, d\gamma\; P(\sigma \mid \theta,\gamma)\, P(\theta,\gamma \mid D,I)$$

• Distributions for all parameters
• No fixed “weight” but distribution
• “marginalization”: integration over all other parameters
  • coordinates
  • scale factor
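A toy Gibbs sampler makes the idea of a distribution for σ concrete. The sketch below assumes a single unknown log-distance u with a flat prior, log-normal errors, and a Jeffreys prior 1/σ. Under these illustrative assumptions (they are not the ISD model) the conditional for u is normal and the conditional for 1/σ² is a Gamma distribution. Collecting the σ values along the chain while ignoring u is the sampling counterpart of marginalization.

```python
import math
import random

def gibbs_sigma(log_d_obs, n_samples=5000, burn_in=100, seed=1):
    """Alternately sample u = log(d_calc) and sigma; return the sigma samples."""
    rng = random.Random(seed)
    n = len(log_d_obs)
    mean_y = sum(log_d_obs) / n
    sigma = 1.0
    samples = []
    for step in range(n_samples + burn_in):
        # u | sigma, D: normal around the sample mean (flat prior on u).
        u = rng.gauss(mean_y, sigma / math.sqrt(n))
        # 1/sigma^2 | u, D: Gamma with shape n/2 and rate chi2/2.
        # random.gammavariate takes a scale parameter, hence 2/chi2.
        chi2 = sum((y - u) ** 2 for y in log_d_obs)
        precision = rng.gammavariate(n / 2.0, 2.0 / chi2)
        sigma = 1.0 / math.sqrt(precision)
        if step >= burn_in:
            samples.append(sigma)
    return samples

# Synthetic log-distances with true sigma = 0.2.
data_rng = random.Random(0)
log_d_obs = [math.log(3.0) + data_rng.gauss(0.0, 0.2) for _ in range(200)]

sigma_samples = gibbs_sigma(log_d_obs)
mean_sigma = sum(sigma_samples) / len(sigma_samples)
spread = math.sqrt(sum((s - mean_sigma) ** 2 for s in sigma_samples)
                   / len(sigma_samples))
print(f"sigma = {mean_sigma:.3f} +/- {spread:.3f}")
```

The mean of the Gamma conditional is n/χ², so the average of 1/(2σ²) at fixed coordinates is n/(2χ²), which matches the weight estimate of Eq. 7 for β = 1.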

Computational requirements

• a few days on 50 Linux PCs
  • every “supertransition” is 50 short dynamics trajectories
  • in total, > 25,000,000 hybrid Monte Carlo steps
• convergence of distribution, not only structures

In the practical we will use NCMC trajectories

• Sampling: nonequilibrium candidate MC (NCMC)
  • alternate small and large random perturbations of phi and psi
  • relaxation (200-1000 steps NVE molecular dynamics)
• simplest version of method by Nilmeier et al. & Chodera (2012) PNAS
• implementation in CNS

[Figure: “energy” plotted against NCMC step]

• perturbation:
  • random rotation about a fraction of the torsion angles
  • choose optimal weight for the obtained coordinates (no sampling of w_data)
  • add compensation term to the total energy
• relaxation:
  • NVE MD trajectory (250-750 steps)
• Metropolis criterion
  • total energy differences (Ephys + Edata + Ekin + Esigma)
  • between trajectory endpoints
• replica exchange (two replicas)
  • short trajectories
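The perturb, relax, and accept cycle can be sketched on a one-dimensional toy system. Everything in the fragment below is an assumption for illustration: the harmonic "energy", unit mass, the step sizes, and the function names. In particular, the random perturbation is placed between two equal halves of the NVE relaxation so that this toy move is exactly reversible; the slide lists perturbation followed by relaxation, and the weight update, compensation term, and replica exchange are left out. This is not the CNS implementation.

```python
import math
import random

def potential(x):
    """Toy stand-in for the total potential energy (assumption)."""
    return 0.5 * x * x

def force(x):
    return -x

def verlet(x, v, n_steps, dt):
    """Velocity Verlet integration at constant energy (NVE), unit mass."""
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f
        x += dt * v
        f = force(x)
        v += 0.5 * dt * f
    return x, v

def ncmc_like_step(x, rng, kick=1.0, n_relax=20, dt=0.1, beta=1.0):
    """One move: fresh velocity, relax, perturb, relax, Metropolis test."""
    v = rng.gauss(0.0, 1.0 / math.sqrt(beta))
    e_start = potential(x) + 0.5 * v * v
    x_new, v_new = verlet(x, v, n_relax // 2, dt)
    x_new += rng.uniform(-kick, kick)  # symmetric random perturbation
    x_new, v_new = verlet(x_new, v_new, n_relax // 2, dt)
    e_end = potential(x_new) + 0.5 * v_new * v_new
    # Metropolis criterion on the total energy difference between endpoints.
    if rng.random() < math.exp(min(0.0, -beta * (e_end - e_start))):
        return x_new, True
    return x, False

rng = random.Random(2)
x, xs, n_accepted = 0.0, [], 0
for _ in range(20000):
    x, accepted = ncmc_like_step(x, rng)
    xs.append(x)
    n_accepted += accepted

mean_x = sum(xs) / len(xs)
var_x = sum((value - mean_x) ** 2 for value in xs) / len(xs)
print(f"acceptance = {n_accepted / len(xs):.2f}  mean = {mean_x:.3f}  variance = {var_x:.3f}")
```

For this harmonic toy with β = 1 the sampled variance should approach 1, which gives a simple check that the acceptance test compensates for the energy change caused by the perturbation.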

Summary

• Minimising hybrid energy corresponds to maximising the probability of a structure, given data and force field
  • ...if one knows the data quality, scale factors, ...
• Relationship of error distribution and restraint potentials
• Weights on data terms
  • usually set empirically (trial and error, experience, cross validation)
  • Bayesian determination of weight possible
• Bayesian probability theory:
  • theoretical foundation of structure refinement

Summary (Bayes)

• Joint probability distributions can be determined by sampling
  • sampling: frequency is proportional to probability
  • e.g., Markov-Chain Monte Carlo methods
  • hybrid Monte Carlo has advantages for coordinates
• All unknown parameters can be sampled
  • coordinates
  • parameters of the forward model
  • quality of data ⇔ weights on data
• Distributions of parameters of interest can be obtained by marginalization

Literature: general

• Braun, W. Distance geometry and related methods for protein structure determination from NMR data. Quart. Rev. Biophys. 19:115-157, 1987.
• Brunger, AT and Nilges, M. Computational challenges for macromolecular structure determination by X-ray crystallography and solution NMR spectroscopy. Quart. Rev. Biophys. 26:49-125, 1993.
• Güntert, P. Structure calculation of biological macromolecules from NMR data. Quart. Rev. Biophys. 31:145-237, 1998.
• Nilges, M and O’Donoghue, SI. Ambiguous NOEs and automated NOE assignment. Progr. Nucl. Magn. Reson. Spectrosc. 32:107-139, 1998.
• Güntert, P. Automated NMR structure calculation with CYANA. Methods Mol Biol. 278:353-78, 2004.
• Markwick PR, Malliavin T & Nilges M. (2008) Structural biology by NMR: structure, dynamics, and interactions. PLoS Comput Biol. 4, e1000168.
• Markwick PR, Nilges M.

Literature: Bayesian

• Rieping W, Habeck M, Nilges M. (2005) Inferential structure determination. Science, 309:303-306
• Habeck M, Nilges M, Rieping W. (2005) Bayesian inference applied to macromolecular structure determination. Physical Review E
• Habeck M, Rieping W, Nilges M. (2005) Bayesian Estimation of Karplus Parameters and Torsion Angles from Three-bond Scalar Coupling Constants. J Magn Reson
• Habeck M, Nilges M, Rieping W. Replica-exchange Monte Carlo scheme for Bayesian data analysis. Phys Rev Lett. 2005 Jan 14;94(1):018105.
• Rieping W, Habeck M, Nilges M. (2006) Refinement against NOE intensities using a lognormal distribution improves the quality of NMR structures. JACS
• Nicastro G, Habeck M, Masino L, Svergun D, Pastore A. J Biomol NMR 2006, 36, 267–277.
• Bayrhuber M, ..., Habeck M ... et al. (2008). Structure of the human voltage-dependent anion channel. Proc. Natl. Acad. Sci. USA, 105: 15370–5.