
Computational Crystallography Initiative Physical Biosciences Division

Exploring Symmetry, Outlier Detection & Twinning update

Peter Zwart


Overview

• Exploring metric symmetry
  – iotbx.explore_metric_symmetry

• Outlier detection
  – mmtbx.remove_outliers

• Twinning
  – mmtbx.twin_map_utils
  – Actually: cctbx.python $MMTBX_DIST/mmtbx/twinning/twin_map_utils.py


Exploring metric symmetry

• Protein crystals grown under various conditions can sometimes exhibit drastic changes in symmetry and unit cell dimensions

• Sometimes the different crystal forms are related
  – The relation is not always obvious
  – Finding the relation between two unit cells is not always straightforward

• Knowing the relations between the different crystal forms can be helpful during structure solution


Exploring metric symmetry

• How to find relations between unit cells?
  – A sub-lattice formalism allows one to generate a family of related lattices from a given lattice
    • The number of unique unit cells that are N times larger than the original unit cell is quite small (Rutherford, Acta Cryst. (2006). A62, 93-97)
  – Unit cells of approximately equal volume can be compared to each other by checking a large number of unimodular transforms
    • Ralf's work
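A brute-force sketch of the second approach, assuming the cells are compared through their metric tensors and that it is enough to try integer matrices with entries in {-1, 0, 1} and determinant ±1. The function names are hypothetical and this is not the algorithm used in iotbx.explore_metric_symmetry or in Ralf's work; practical code would reduce both cells first and use relative tolerances.

```python
import math
from itertools import product

def metric_tensor(cell):
    """G = B B^T for a cell (a, b, c, alpha, beta, gamma), angles in degrees."""
    a, b, c = cell[:3]
    ca, cb, cg = (math.cos(math.radians(x)) for x in cell[3:])
    return [[a * a, a * b * cg, a * c * cb],
            [a * b * cg, b * b, b * c * ca],
            [a * c * cb, b * c * ca, c * c]]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def transformed_metric(m, g):
    """Metric of the basis whose rows are M times the old basis: G' = M G M^T."""
    mg = [[sum(m[i][k] * g[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    return [[sum(mg[i][k] * m[j][k] for k in range(3)) for j in range(3)] for i in range(3)]

def best_unimodular_match(cell1, cell2, max_entry=1):
    """Return (max |M G2 M^T - G1| in Å^2, M) over all small unimodular matrices M."""
    g1, g2 = metric_tensor(cell1), metric_tensor(cell2)
    entries = range(-max_entry, max_entry + 1)
    best_dev, best_m = float("inf"), None
    for flat in product(entries, repeat=9):
        m = (flat[0:3], flat[3:6], flat[6:9])
        if abs(det3(m)) != 1:          # keep only volume-preserving transforms
            continue
        g = transformed_metric(m, g2)
        dev = max(abs(g[i][j] - g1[i][j]) for i in range(3) for j in range(3))
        if dev < best_dev:
            best_dev, best_m = dev, m
    return best_dev, best_m

# Same orthorhombic cell with its axes listed in a different order
dev, m = best_unimodular_match((10, 20, 30, 90, 90, 90), (20, 30, 10, 90, 90, 90))
```

The search is exhaustive over 3^9 candidate matrices, which is cheap for entries limited to {-1, 0, 1}; allowing larger entries grows the search space quickly, which is why reduced cells are normally compared instead.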


Exploring metric symmetry

• Sub-lattice?
  – Given all lattice points, ignore some of them while ensuring that the remaining lattice points still form a regular lattice
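The sub-lattices of a given index can be enumerated exhaustively. A minimal sketch, assuming the usual lower-triangular Hermite normal form parametrization; this is a generic enumeration, not necessarily the formalism of Rutherford (2006) or the iotbx code, and `sublattices` is a hypothetical helper.

```python
from itertools import product

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def sublattices(n):
    """All sub-lattices of index n of a 3D lattice, as Hermite normal forms.

    Each row is a basis vector of the sub-lattice expressed in the original
    basis. Entries below the diagonal are reduced modulo the diagonal entry
    of their column, so every sub-lattice appears exactly once."""
    result = []
    for a in divisors(n):
        for c in divisors(n // a):
            f = n // (a * c)                      # diagonal product a*c*f = n
            for b, d, e in product(range(a), range(a), range(c)):
                result.append(((a, 0, 0), (b, c, 0), (d, e, f)))
    return result

for n in (1, 2, 3, 4):
    print(n, len(sublattices(n)))
```

For index 2, 3 and 4 this gives 7, 13 and 35 sub-lattices. These are raw counts; the number of distinct unit cells is smaller still once lattice symmetry and cell reduction are taken into account.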


Exploring metric symmetry

• Examples (cell: a, b, c in Å; α, β, γ in degrees)

Native : P2₁2₁2₁ 61.8 97.7 148.9 90 90 90

SeMet1 : P2₁ 115.5 149.0 115.6 90 115 90

SeMet2 : C222₁ 123.6 195.4 148.9 90 90 90

Poulsen et al. (2001). Acta Cryst. D57, 1251-1259.
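The relation between these three forms can be checked with a little arithmetic. A short sketch (helper names are illustrative) that compares primitive cell volumes and converts the C-centred SeMet2 cell to a primitive setting with a' = (a+b)/2, b' = (a-b)/2, c' = c:

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume from edge lengths (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

native = (61.8, 97.7, 148.9, 90, 90, 90)     # P2₁2₁2₁
semet1 = (115.5, 149.0, 115.6, 90, 115, 90)  # P2₁
semet2 = (123.6, 195.4, 148.9, 90, 90, 90)   # C222₁: two lattice points per cell

v_native = cell_volume(*native)
v_semet1 = cell_volume(*semet1)
v_semet2_primitive = cell_volume(*semet2) / 2.0

# Primitive setting of the C-centred cell: a' = (a+b)/2, b' = (a-b)/2, c' = c
a2, b2, c2 = semet2[:3]
edge = math.hypot(a2, b2) / 2.0
angle = math.degrees(math.acos((a2 ** 2 - b2 ** 2) / (a2 ** 2 + b2 ** 2)))

print(f"V(SeMet1) / V(native)            = {v_semet1 / v_native:.3f}")
print(f"V(SeMet2, primitive) / V(native) = {v_semet2_primitive / v_native:.3f}")
print(f"SeMet2 primitive cell: {edge:.1f} {edge:.1f} {c2:.1f}, angle {angle:.1f}")
```

With the tabulated values both SeMet primitive volumes come out at about twice the native volume, and the primitive SeMet2 cell has edges of about 115.6, 115.6 and 148.9 Å with an interaxial angle near 115.4°, close to the SeMet1 cell once its axes are relabelled.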


Exploring metric symmetry

• Future
  – Provide reindexing methods between related unit cells
    • Would make molecular replacement of related structures easier
    • Useful for multi-crystal averaging
  – Obtain non-merohedral twin laws from this analysis
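Reindexing itself is a linear map on Miller indices. A minimal sketch, assuming the new basis vectors are written as the rows of an integer matrix M in terms of the old ones, in which case Miller indices transform the same way (h' = M h). The matrix below only encodes the axis relation a' = 2a, b' = 2b, c' = c visible in the native and SeMet2 cells; it is illustrative, not an operator produced by any existing tool.

```python
import math

def reindex(hkl, m):
    """Apply h' = M h, where the rows of M give the new basis vectors in the old basis."""
    return tuple(sum(m[i][j] * hkl[j] for j in range(3)) for i in range(3))

def d_spacing_orthogonal(hkl, a, b, c):
    """Resolution d (Å) of a reflection in a cell with all angles equal to 90°."""
    h, k, l = hkl
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)

# Illustrative operator for a' = 2a, b' = 2b, c' = c (native -> SeMet2 axes)
NATIVE_TO_SEMET2 = ((2, 0, 0), (0, 2, 0), (0, 0, 1))

hkl_native = (1, 2, 3)
hkl_semet2 = reindex(hkl_native, NATIVE_TO_SEMET2)   # (2, 4, 3)
d1 = d_spacing_orthogonal(hkl_native, 61.8, 97.7, 148.9)
d2 = d_spacing_orthogonal(hkl_semet2, 123.6, 195.4, 148.9)
```

The reindexed reflection keeps its resolution (d1 equals d2 up to rounding) and always has h' + k' even, as required by C-centring.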


Outlier detection

• Outliers can have a detrimental effect on the progress of structure solution and refinement
  – Read, Acta Cryst. (1999). D55, 1759-1764

• The detection of outliers should be performed on the basis of all available information
  – Use model information if you can

• One would like the flexibility to correct mistakes made earlier
  – Those reflections with E-values larger than 5 could have been valid observations!


Outlier detection

• What is an outlier?
  – A data point that does not fit a model because of an abnormal situation, such as an erroneous measurement

• How to spot them?
  – If Fobs is not reconcilable with Fcalc, Fobs might be an outlier
  – Reconcilable? Fobs should be explainable from Fcalc and the current quality of the model (σA)


Outlier detection

• Model-based outlier detection is done in a way similar to the method described by Read (Acta Cryst. (1999). D55, 1759-1764)

  – Fobs and Fcalc are normalized to obtain Eobs and Ecalc

  – σA is estimated for each reflection
    • Standard likelihood techniques are combined with kernel methods to obtain smoothly varying estimates

  – Find:

$$E_{obs}^{max} = \arg\max_{E_{obs}} \left[ \log P(E_{obs} \mid E_{calc}, \sigma_A) \right]$$

  – Compute:

$$Q = 2\left[ \log P(E_{obs}^{max} \mid E_{calc}, \sigma_A) - \log P(E_{obs} \mid E_{calc}, \sigma_A) \right]$$
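A self-contained sketch of the Q computation for a single reflection, assuming Read (1999)-style distributions of normalized amplitudes: a Rice distribution for acentric and a folded normal for centric reflections, both with offset σA·Ecalc and variance term 1 - σA². The mode is found with a simple golden-section search. Function names are illustrative and this is not the mmtbx.remove_outliers code.

```python
import math

def log_i0(x):
    """log of the modified Bessel function I0(x), x >= 0."""
    if x > 50.0:  # leading terms of the asymptotic expansion
        return x - 0.5 * math.log(2.0 * math.pi * x) + math.log1p(1.0 / (8.0 * x))
    term = total = 1.0
    q, k = (x / 2.0) ** 2, 0
    while True:  # power series sum_k (x/2)^(2k) / (k!)^2
        k += 1
        term *= q / (k * k)
        total += term
        if k > x and term < 1e-16 * total:
            return math.log(total)

def log_cosh(x):
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def log_p(e_obs, e_calc, sigma_a, centric):
    """log P(Eobs | Ecalc, sigmaA) for normalized amplitudes."""
    var = 1.0 - sigma_a ** 2       # variance term 1 - sigmaA^2
    d = sigma_a * e_calc           # offset sigmaA * Ecalc
    if centric:                    # folded normal
        return (0.5 * math.log(2.0 / (math.pi * var))
                - (e_obs ** 2 + d ** 2) / (2.0 * var)
                + log_cosh(e_obs * d / var))
    if e_obs <= 0.0:
        return float("-inf")
    return (math.log(2.0 * e_obs / var)  # Rice distribution
            - (e_obs ** 2 + d ** 2) / var
            + log_i0(2.0 * e_obs * d / var))

def q_value(e_obs, e_calc, sigma_a, centric):
    """Q = 2[log P(E_max) - log P(E_obs)], E_max being the mode of P(E | Ecalc, sigmaA)."""
    f = lambda e: log_p(e, e_calc, sigma_a, centric)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 0.0, max(e_obs, sigma_a * e_calc) + 10.0
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(100):           # golden-section search for the maximum
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    e_max = 0.5 * (a + b)
    return max(0.0, 2.0 * (f(e_max) - f(e_obs)))

print(q_value(2.7, 3.0, 0.9, centric=False))  # close to the expected value: small Q
print(q_value(0.5, 3.0, 0.9, centric=False))  # far from it: large Q
```

Working with log densities keeps the computation stable for large Ecalc, where the Bessel and cosh terms would otherwise overflow.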


Outlier detection

• Q is approximately χ² distributed

• Acceptable values of Q are determined by the size of the dataset
  – If the dataset is large, large deviations are expected

• A p-value is computed for each reflection
  – The p-value is the probability that, if this particular Q-value were the largest in the dataset, a Q-value equal to or larger than it would be observed by chance

• Observations for which the p-value is smaller than 5% are considered outliers.
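The extreme-value p-value can be written in a few lines. A sketch, assuming Q follows a χ² distribution with one degree of freedom (the slides only say "approximately χ²") and that reflections are independent, so the probability that the largest of n values reaches q is 1 - F(q)^n. The helper names are hypothetical.

```python
import math

def p_extreme(q, n_reflections):
    """P(max of n independent chi2(1) values >= q) = 1 - CDF(q)**n."""
    tail = math.erfc(math.sqrt(max(q, 0.0) / 2.0))   # P(chi2_1 >= q)
    if tail >= 1.0:
        return 1.0
    # 1 - (1 - tail)**n, written to avoid cancellation when tail is tiny
    return -math.expm1(n_reflections * math.log1p(-tail))

def flag_outliers(q_values, cutoff=0.05):
    """Indices of reflections whose extreme-value p-value is below the cutoff."""
    n = len(q_values)
    return [i for i, q in enumerate(q_values) if p_extreme(q, n) < cutoff]
```

Because n enters the p-value, the same Q can be flagged in a small dataset and accepted in a large one, which is the behaviour described above.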


Outlier detection

• Example: 1ty3

• Wilson statistics indicate 1 outlier
  – (25,6,-43): Eobs = 3.938, centric = True, p-wilson = 1.83E-07, p-extreme = 9.0E-03

• Model-based outlier detection indicates that (25,6,-43) is a valid observation

[Figure: observed distribution of Q compared with the theoretical χ² distribution; x-axis Q, y-axis P(Q)]
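For comparison, the model-free check only uses Wilson statistics. A sketch with the standard distributions of normalized amplitudes (acentric P(E) = 2E exp(-E²), centric P(E) = √(2/π) exp(-E²/2)); turning the single-reflection tail probability into a dataset-wide value as 1 - (1 - p)^n is an assumption about how p-extreme is formed, and the function names are hypothetical.

```python
import math

def wilson_tail(e, centric):
    """P(E >= e) for a normalized amplitude under Wilson statistics."""
    if centric:
        return math.erfc(e / math.sqrt(2.0))  # from P(E) = sqrt(2/pi) exp(-E^2/2)
    return math.exp(-e * e)                   # from P(E) = 2E exp(-E^2)

def wilson_p_extreme(e, centric, n_reflections):
    """Probability that the most extreme of n reflections is at least this extreme."""
    tail = wilson_tail(e, centric)
    if tail >= 1.0:
        return 1.0
    return -math.expm1(n_reflections * math.log1p(-tail))
```

Centric reflections have a heavier tail, so the same E-value is less surprising for a centric than for an acentric reflection.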


Outlier Detection

• The outlier detection algorithm is embedded in a class that caches the original observed data.

• This will allow one to perform outlier detection during different macro-cycles/rebuilding stages and to update the outlier selection accordingly

• Will be incorporated in phenix.refine at the appropriate juncture
  – Command-line tool available
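The caching idea can be sketched as a small class that never modifies the original observations and recomputes the outlier set from scratch whenever it is given new per-reflection p-values. `CachedOutlierRejector` and the p-value callbacks are hypothetical, the amplitudes are made up for illustration, and this does not reproduce the mmtbx class or its interface.

```python
class CachedOutlierRejector:
    """Hypothetical sketch: keeps the original data so that reflections
    rejected in an early cycle can be reinstated in a later one."""

    def __init__(self, observations):
        self._original = dict(observations)   # hkl -> Fobs, never modified
        self.outliers = set()

    def update(self, p_value, cutoff=0.05):
        """Re-derive the outlier set using p-values from the current model."""
        self.outliers = {hkl for hkl, f_obs in self._original.items()
                         if p_value(hkl, f_obs) < cutoff}
        return self.working_set()

    def working_set(self):
        return {hkl: f for hkl, f in self._original.items() if hkl not in self.outliers}


obs = {(25, 6, -43): 812.0, (1, 2, 3): 150.0}        # illustrative values
rejector = CachedOutlierRejector(obs)
# Early cycle: a crude model makes (25,6,-43) look suspicious (illustrative p-values)
rejector.update(lambda hkl, f: 0.001 if hkl == (25, 6, -43) else 0.5)
# Later cycle: a better model reconciles it, so it returns to the working set
rejector.update(lambda hkl, f: 0.5)
```

Recomputing the selection from the cached originals, instead of filtering the previous working set again, is what makes earlier rejections reversible.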


Twinning progress report

• Routines available
  – Least-squares target functions
    • Both intensity and amplitude
    • Target values and first derivatives
  – Detwinning
    • Standard and à la Sheldrick
  – R-values
  – Map coefficients
    • 2mFo-DFc & gradient maps
  – Bulk solvent scaling
    • Estimation of twin fraction, ksol, Bsol, U* and overall scale on twinned data
    • Using a global optimizer (differential evolution) for the moment
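Standard detwinning and an intensity least-squares target for a two-domain (hemihedral) twin can be sketched as follows, assuming twin fraction α and a twin law mapping h to Th. Function names, the dict-based data layout and the example data are hypothetical; the "à la Sheldrick" variant and the bulk-solvent scaling with differential evolution are not shown.

```python
def twin_intensities(i_h, i_th, alpha):
    """Observed intensities of a twin-related pair (h, Th)."""
    return (1 - alpha) * i_h + alpha * i_th, alpha * i_h + (1 - alpha) * i_th

def detwin(j_h, j_th, alpha):
    """Standard algebraic detwinning of a twin-related pair (needs alpha < 0.5)."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("algebraic detwinning needs 0 <= alpha < 0.5")
    s = 1.0 - 2.0 * alpha
    return ((1 - alpha) * j_h - alpha * j_th) / s, ((1 - alpha) * j_th - alpha * j_h) / s

def ls_target(i_obs, i_calc, twin_law, alpha, k=1.0):
    """Sum over h of (Iobs(h) - k[(1-alpha) Icalc(h) + alpha Icalc(Th)])^2 and its
    derivative with respect to alpha. twin_law maps each hkl to its twin mate."""
    t, dt = 0.0, 0.0
    for h, io in i_obs.items():
        ic_h, ic_th = i_calc[h], i_calc[twin_law[h]]
        r = io - k * ((1 - alpha) * ic_h + alpha * ic_th)
        t += r * r
        dt += -2.0 * r * k * (ic_th - ic_h)
    return t, dt

# Illustrative data: one twin-related pair
i_calc = {(1, 0, 0): 100.0, (0, 1, 0): 40.0}
law = {(1, 0, 0): (0, 1, 0), (0, 1, 0): (1, 0, 0)}
i_obs = {(1, 0, 0): 90.0, (0, 1, 0): 50.0}
target, gradient = ls_target(i_obs, i_calc, law, 0.2)
```

With these illustrative numbers the target vanishes at α = 1/6. The division by 1 - 2α shows why algebraic detwinning becomes unstable as α approaches 0.5.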


Twinning progress report

• Bulk solvent scaling and detwinned map generation available as a command line tool mmtbx.twin_map_utils

• Results similar to CNS

• mmtbx.twin_map_utils should be seen as the first step towards full integration of the twinning utilities into phenix.refine


Twinning progress report

[Figure panels: mmtbx.twin_map_utils / CNS]


Twinning progress report

[Figure panels: twinning not taken into account / twinning taken into account]

1eyx: twin fraction = 0.47; difference maps at 2.5σ; ligands and waters deleted (10% of the total model)


Twinning progress report

[Figure panels: twinning not taken into account / twinning taken into account]

The difference in the 2mFo-DFc density is less striking


Twinning progress report

• Future plans
  – Likelihood-based map coefficients
    • in collaboration with Randy Read
  – Incorporation of least-squares targets in phenix.refine
  – Likelihood-based targets
    • in collaboration with Randy Read


Acknowledgements

Paul Adams

Ralf Grosse-Kunstleve

Pavel Afonine

Nigel Moriarty

Nick Sauter

Michael Hohn

Cambridge

Randy Read

Airlie McCoy

Los Alamos

Tom Terwilliger

Li Wei Hung

Texas A&M University

Jim Sacchettini

Tom Ioerger

Eric McKee

Duke University

Jane Richardson

David Richardson

PHENIX Industrial Consortium

Robert Nolte

Eric Vogan

Funding:

– LBNL (DE-AC03-76SF00098)

– NIH/NIGMS (P01GM063210)

– PHENIX Industrial Consortium


Kernel methods

• Discrete binning of X-ray data introduces discontinuous jumps in properties that vary continuously
  – Mean intensity (normalisation)
  – The estimation of σA

• Possible remedies:
  – Spline functions
    • Used extensively by K. Cowtan
  – Kernel methods


Kernel methods

• Discrete binning assumes a constant value within each bin
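To see the discontinuity, here is a minimal sketch of an equal-width binned estimate (the helper name, bin count and data are illustrative): every x inside a bin gets the same value, so the estimate jumps at bin edges even when the underlying trend is smooth.

```python
def binned_mean(xs, ys, n_bins):
    """Equal-width binning in x; returns a step function giving the bin mean."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        i = min(int((x - lo) / width), n_bins - 1)
        sums[i] += y
        counts[i] += 1
    means = [s / c if c else float("nan") for s, c in zip(sums, counts)]

    def estimate(x):
        i = min(max(int((x - lo) / width), 0), n_bins - 1)
        return means[i]   # constant within a bin, jumps at the edges
    return estimate

xs = [i / 100 for i in range(101)]
ys = [2.0 * x + 1.0 for x in xs]      # smooth linear trend
est = binned_mean(xs, ys, 4)
print(est(0.2499), est(0.2501))       # values on either side of a bin edge
```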


Kernel methods

• With kernel methods, the estimate at each position is based on the full dataset
  – The amount that each datum contributes is determined by a weighting function (usually depending on the squared distance)
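A kernel estimate of the same kind of trend, as a sketch: a Nadaraya-Watson average with Gaussian weights on the squared distance to the query point. The Gaussian kernel, the bandwidth parameter and using x as a resolution variable such as 1/d² are assumptions for illustration, not necessarily the choices made in cctbx or xtriage.

```python
import math

def kernel_mean(xs, ys, x0, bandwidth):
    """Weighted average of all data, with weight exp(-(x - x0)^2 / (2 h^2))."""
    num = den = 0.0
    for x, y in zip(xs, ys):
        w = math.exp(-((x - x0) ** 2) / (2.0 * bandwidth ** 2))
        num += w * y
        den += w
    return num / den

xs = [i / 100 for i in range(101)]
ys = [2.0 * x + 1.0 for x in xs]      # smooth linear trend
print(kernel_mean(xs, ys, 0.2499, 0.05), kernel_mean(xs, ys, 0.2501, 0.05))
```

Unlike the binned estimate, the result changes smoothly with x0; the bandwidth plays the role that the bin width plays in binning.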


Kernel methods

• Kernel method available for normalisation
  – Used by xtriage in intensity statistics

• Kernel method available for σA estimation
  – Used in the outlier detection


Kernel methods

• Determining alpha from σA estimated with kernel methods gives values similar to those obtained with the method available in phenix.refine

• Similar results for beta

[Figure: mean alpha (sigmaA) plotted against mean alpha (phenix.refine), both axes from 0 to 1.2; linear fit y = 0.9969x - 0.0177, R² = 0.964]
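For reference, the conversion from σA to alpha and beta can be sketched under the usual correspondence between the σA formalism and the alpha/beta parametrization: alpha plays the role of D = σA·√(ΣN/ΣP) and beta that of the error variance ΣN(1 - σA²), with ΣN and ΣP the mean observed and calculated intensities in a resolution shell. This correspondence is an assumption here, not a statement about the exact phenix.refine implementation, and the function name is hypothetical.

```python
import math

def alpha_beta_from_sigma_a(sigma_a, sigma_n, sigma_p):
    """Assumed relations: alpha = sigmaA * sqrt(SigmaN / SigmaP),
    beta = SigmaN * (1 - sigmaA^2)."""
    alpha = sigma_a * math.sqrt(sigma_n / sigma_p)
    beta = sigma_n * (1.0 - sigma_a ** 2)
    return alpha, beta

print(alpha_beta_from_sigma_a(0.5, 400.0, 100.0))   # (1.0, 300.0)
```

A perfect model (σA = 1) with correctly scaled intensities gives alpha = 1 and beta = 0, which is the limit the comparison plot approaches for well-refined shells.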