
Page 1: Automated protein structure solution for weak SAD data Pavol Skubak and Navraj Pannu Automated protein structure solution for weak SAD data Pavol Skubak

Automated protein structure solution for weak SAD data

Pavol Skubak and Navraj Pannu

Biophysical Structural Chemistry, Leiden University, The Netherlands

http://www.bfsc.leidenuniv.nl/software/crank/


Low resolution and/or weak anomalous signal SAD data sets

• With a sufficient anomalous signal and resolution better than 3 Angstroms, your structure is likely to be automatically built.

• What can be done if your SAD data is weak?


Simultaneously combining experimental phasing steps to improve structure solution

• Traditionally structure solution is divided into distinct steps:

– Substructure detection

– Obtain initial phases

– (Density) modify the initial experimental map

– Build and refine the model

• By combining these steps, we can improve the process.


Traditional structure solution

[Figure: flow diagram labelled “Traditional approach”. Boxes: Experimental data, anomalous substructure; Experimental phasing; Phases, phase probabilities; Density modification, phase combination; Phases, phase probabilities; Model building and refinement; Model.]

• Step-wise

• Information is propagated via ‘phase probabilities’


Phase probabilities

• The phase distribution can be approximated via four “Hendrickson-Lattman” (HL) coefficients, A, B, C, D.

• We rely on programs to estimate these coefficients.

• ‘Even better than the real thing?’

$P(\varphi) \propto \exp(A\cos\varphi + B\sin\varphi + C\cos 2\varphi + D\sin 2\varphi)$
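As a minimal sketch of what the four coefficients encode, the Python snippet below samples the distribution exp(A cos φ + B sin φ + C cos 2φ + D sin 2φ) on a regular phase grid and normalises it numerically. The function name `hl_distribution` and the grid size are illustrative assumptions and are not taken from any Crank program.

```python
import math


def hl_distribution(a, b, c, d, n_points=360):
    """Sample the Hendrickson-Lattman phase distribution on a regular grid.

    Returns (phases in radians, probabilities that sum to 1).
    Hypothetical helper for illustration only.
    """
    phis = [2.0 * math.pi * k / n_points for k in range(n_points)]
    exponents = [
        a * math.cos(p) + b * math.sin(p)
        + c * math.cos(2.0 * p) + d * math.sin(2.0 * p)
        for p in phis
    ]
    shift = max(exponents)  # subtract the maximum to avoid overflow in exp()
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return phis, [w / total for w in weights]
```

With only A non-zero the sampled distribution has a single peak at φ = 0; with only C non-zero it has two equal peaks at φ = 0 and φ = π, which is the kind of bimodal shape the C and D terms exist to describe.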


Phase probabilities

How do we choose the phase to use for our electron density map?

The “best phase” corresponds to mean/average/expected value (Blow and Crick).

What are ways to determine how good and accurate your phases are?

FOM (figure of merit): the mean cosine of the phase error (a value of 1 means the phases are perfect, a value of 0 means the phases are no better than random).
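The two definitions above can be sketched numerically. Assuming the phase distribution is given by four HL coefficients, the centroid of exp(iφ) under that distribution gives the “best phase” as its argument and the figure of merit as its modulus (the expected cosine of the error when the best phase is used). The function name `best_phase_and_fom` and the grid size are assumptions made for this illustration.

```python
import cmath
import math


def best_phase_and_fom(a, b, c, d, n_points=720):
    """Return (best phase in radians, figure of merit) for HL coefficients.

    The centroid <exp(i*phi)> is evaluated by a simple sum over a regular
    phase grid. Hypothetical helper for illustration only.
    """
    phis = [2.0 * math.pi * k / n_points for k in range(n_points)]
    exponents = [
        a * math.cos(p) + b * math.sin(p)
        + c * math.cos(2.0 * p) + d * math.sin(2.0 * p)
        for p in phis
    ]
    shift = max(exponents)  # numerical safety for large coefficients
    weights = [math.exp(e - shift) for e in exponents]
    centroid = sum(w * cmath.exp(1j * p) for w, p in zip(weights, phis))
    centroid /= sum(weights)
    return cmath.phase(centroid), abs(centroid)
```

A sharply peaked unimodal distribution gives a FOM close to 1, whereas a flat distribution or a symmetric bimodal one (for example only C non-zero) gives a FOM close to 0, and the best phase is then essentially undetermined.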


Density modification

• Density modification is a problem of combining information:


Kevin Cowtan, [email protected] 2012

Density modification

2. Phase weighting:

[Figure: the density modification cycle between reciprocal space (|F|, φ; |Fmod|, φmod; phase weighting φ = f(φexp, φmod)) and real space (ρ(x); modify ρ; ρmod(x)), linked by FFT and FFT⁻¹.]
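The cycle in this diagram can be sketched on a one-dimensional toy “crystal”. The snippet below is an illustration under several assumptions and is not how parrot, solomon or shelxe work: the transform is a naive DFT instead of an FFT, the density modification is plain solvent flattening with a known mask, and the phase combination f(φexp, φmod) is a fixed-weight vector sum of unit phasors. All function names and the `weight_exp` parameter are hypothetical.

```python
import cmath
import math


def to_density(structure_factors):
    """Reciprocal space -> real space (naive inverse DFT).

    Only the real part is kept, so the input is assumed to be (close to)
    Hermitian-symmetric.
    """
    n = len(structure_factors)
    return [
        sum(f * cmath.exp(-2j * math.pi * h * x / n)
            for h, f in enumerate(structure_factors)).real / n
        for x in range(n)
    ]


def to_structure_factors(density):
    """Real space -> reciprocal space (naive forward DFT)."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * math.pi * h * x / n)
            for x, rho in enumerate(density))
        for h in range(n)
    ]


def flatten_solvent(density, solvent_mask):
    """Set every solvent grid point to the mean solvent density."""
    solvent = [rho for rho, is_solv in zip(density, solvent_mask) if is_solv]
    level = sum(solvent) / len(solvent)
    return [level if is_solv else rho
            for rho, is_solv in zip(density, solvent_mask)]


def dm_cycle(amplitudes, phases, phases_exp, solvent_mask, weight_exp=0.5):
    """One cycle: |F|, phi -> rho -> rho_mod -> phi_mod -> combined phi."""
    factors = [amp * cmath.exp(1j * phi)
               for amp, phi in zip(amplitudes, phases)]
    modified = flatten_solvent(to_density(factors), solvent_mask)
    phases_mod = [cmath.phase(f) for f in to_structure_factors(modified)]
    # Assumed combination rule: weighted sum of unit phasors.
    return [
        cmath.phase(weight_exp * cmath.exp(1j * pe)
                    + (1.0 - weight_exp) * cmath.exp(1j * pm))
        for pe, pm in zip(phases_exp, phases_mod)
    ]
```

The observed amplitudes are never altered; only the phases are updated, and they would be fed back into `dm_cycle` for the next iteration. Starting from the true phases of a density that is already flat in the solvent region leaves the phases unchanged, which is a convenient sanity check for the sketch.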


HL propagation and the independence assumption

• Use the experimental data and anomalous substructure directly!

• Do not need to assume independence or rely on HL coefficients.

• Need multivariate distributions at each step that take into account correlations between the model and data.
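For contrast, here is a sketch of what the independence assumption amounts to in the traditional HL route: if two sources of phase information are treated as independent, their phase distributions are multiplied, which for the exponential HL form is the same as adding the coefficients. The names `hl_exponent` and `combine_hl_independent` are hypothetical, and this shows the approach the slide argues can be avoided, not the multivariate method itself.

```python
import math


def hl_exponent(coefficients, phi):
    """Exponent (unnormalised log-probability) of an HL distribution."""
    a, b, c, d = coefficients
    return (a * math.cos(phi) + b * math.sin(phi)
            + c * math.cos(2.0 * phi) + d * math.sin(2.0 * phi))


def combine_hl_independent(first, second):
    """Combine two HL coefficient sets ASSUMING the sources are independent.

    Multiplying the two distributions adds their exponents, so the
    coefficients simply add.
    """
    return tuple(x + y for x, y in zip(first, second))
```

When the two sources are in fact correlated, for example because the modified map was derived from the experimental phases, this sum counts shared information twice, which is the weakness the multivariate treatment addresses.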


Step-wise multivariate structure solution

[Figure: flow diagram labelled “Traditional approach”. Boxes: Experimental data, anomalous substructure; Experimental phasing; Phases; Density modification, phase combination; Phases; Model building and refinement; Model.]

• Still step-wise

• Information is propagated via the data and model(s).


Combined structure solution

[Figure: flow diagram. Boxes: Experimental data, anomalous substructure; Combined experimental phasing, phase combination and model refinement; Phases; Density modification; Electron density; Model building; Model.]

• Simultaneously use information from experimental phasing, density modification and model refinement


Tests of > 140 real SAD data sets

• Resolution range of data sets is 0.94 to 3.88 Angstroms

• Types of anomalous scatterers: selenium, sulfur, chloride, iodide, bromide, calcium, zinc (and others).

• We compare the step-wise multivariate approach (current CRANK) with the combined approach.


Model building results on over 140 real SAD data sets (using parrot and buccaneer)


Summary of large scale test

• The average fraction of the model built increased from 60% to 74% with the new approach.

• If we exclude data sets built to 85% by the current approach or where the substructure was not found, 45 data sets remain, and for these the average fraction of the model built increased from 28% to 77%.


3.88 Angstrom RNA polymerase II

• 3.88 Angstrom SAD data with signal from zinc.

• Authors could not solve the structure with SAD data alone, but with a partial model, multi-crystal MAD and manual building.

• > 80% can be built with SAD data alone with the new algorithm automatically to an R-free of 37.6%


MR-SAD tests: 4.5 Angstrom ATPase SecA-SecY complex

• 4.5 Angstrom data set with weak anomalous signal from Se-Met SecY.

• Authors could not solve the structure with SAD data alone, but used a partial MR structure, (2-fold NCS averaging), cross-crystal averaging, and manual model building

• > 60% can be built automatically starting just from selenium positions (obtained from partial MR solution)

• R-free obtained was under 40%.

• If we start with the partial MR structure, results are worse!


Related RNA polymerases complex from Cramer et al.

• 3.3 Angstrom data with signal from zinc.

• Could not solve the structure with anomalous data alone.

• With the new method, a majority can be built automatically in minutes.


4.6 Angstrom SKI2-3-8 complex

• 4.6 Angstrom Se-Met data set.

• > 50% can be built automatically starting just from selenium positions.

• R-free obtained was under 40%.

• If we start with the partial MR structure, results are again worse! (MR models are higher resolution and fit ‘fairly’ well to the final model).


Future work on MR-SAD

• Are we throwing away useful information or is it biased?

• At the moment, if a (partial) MR solution is available, it is best to run two crank2 jobs:

– Input the whole MR solution

– Input just the heavy atoms


Crank1 vs Crank2

• Crank is suitable for S/MAD and S/MIRAS experiments and implements a stepwise multivariate function.

• Crank2 is its replacement that implements a combined multivariate function for SAD only.

• Both are available in CCP4 6.4.0 at the moment.


Programs in Crank


Important parameters in substructure detection

• The number of cycles run.

• The number of atoms to search for.

– Should be within 10-20% of actual number

– A first guess uses a probabilistic Matthews coefficient

• The resolution cut-off:

– For MAD, look at signed anomalous difference correlation.

– For SAD, a first guess is 0.5 + high resolution limit.
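The two rules of thumb for substructure detection can be written down as a small sketch. It assumes that the resolution values are in Angstroms (as elsewhere in these slides) and takes the upper end of the 10-20% range as the default tolerance; both function names are hypothetical and do not correspond to Crank parameters.

```python
def sad_resolution_cutoff(high_resolution_limit, margin=0.5):
    """First-guess SAD cut-off: 0.5 added to the high resolution limit.

    Values are assumed to be in Angstroms, so a larger number means a
    more conservative (lower-resolution) cut-off.
    """
    return high_resolution_limit + margin


def atom_count_is_reasonable(n_search, n_actual, tolerance=0.20):
    """True if the requested atom count is within `tolerance` of the real one."""
    return abs(n_search - n_actual) <= tolerance * n_actual
```

For a data set extending to 2.0 Angstroms this gives a first-guess cut-off of 2.5 Angstroms. The true number of atoms is of course unknown in advance, so the second check is only useful retrospectively or against an estimate.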


Is my map good enough?

• Statistics from substructure phasing:

– Look at FOM from BP3.

– For SAD, look at Luzzati parameters.

– Refined occupancies.

• Statistics from density modification:

– Compare the “contrast” from hand and enantiomorph (output of solomon or shelxe).

• Does it look like a protein? (model visualization)

• For Crank2, look to see if R-comb < 40%.
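Two of the checks above reduce to simple comparisons, sketched below. The 40% threshold comes from the slide; the rule that the hand with the higher contrast is preferred is an assumption added here for illustration, and both function names are hypothetical. The values would be read by hand from the program logs, since no log parsing is attempted.

```python
def preferred_hand(contrast_original, contrast_inverted):
    """Pick the hand with the higher density-modification contrast.

    ASSUMPTION: a clearly higher contrast indicates the correct hand.
    Returns "original", "inverted", or "undecided" when the values are equal.
    """
    if contrast_original > contrast_inverted:
        return "original"
    if contrast_inverted > contrast_original:
        return "inverted"
    return "undecided"


def crank2_rcomb_ok(r_comb_percent, threshold=40.0):
    """Slide rule of thumb for Crank2: R-comb below 40%."""
    return r_comb_percent < threshold
```

Neither check replaces inspecting the map and the model, which remains the decisive test.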


References

• Crank: Ness et al. (2004) Structure 12, 1753-1761; Pannu et al. (2011) Acta Cryst. D67, 331-337.

• Combined approach and Crank2: Skubak and Pannu (2013) Nature Communications 4, 2777.

• Using data directly in refinement: Skubak et al. (2004) Acta Cryst. D60, 2196-2201; Skubak et al. (2009) Acta Cryst. D65, 1051-1061.

• Multivariate phase combination: Waterreus et al. (2010) Acta Cryst. D66, 783-788.


Acknowledgements

• All dataset contributors (JCSG, Z. Dauter, M. Weiss, C. Mueller-Dieckmann)

• Garib Murshudov, Kevin Cowtan, George Sheldrick, Victor Lamzin

• http://www.bfsc.leidenuniv.nl/software/crank/

Cyttron