Computational Crystallography Initiative Physical Biosciences Division
Exploring Symmetry, Outlier Detection & Twinning update
Peter Zwart
Overview
• Exploring metric symmetry
  – iotbx.explore_metric_symmetry
• Outlier detection
  – mmtbx.remove_outliers
• Twinning
  – mmtbx.twin_map_utils
  – Actually: cctbx.python $MMTBX_DIST/mmtbx/twinning/twin_map_utils.py
Exploring metric symmetry
• Protein crystals grown under various conditions can sometimes exhibit drastic changes in symmetry and unit cell dimensions
• Sometimes, the crystal symmetries are related
  – The relation is not always obvious
  – Finding the relation between two unit cells can be not so straightforward
• Knowing the relations between the different crystal forms can be helpful during structure solution
Exploring metric symmetry
• How to find relations between unit cells?
  – A sub-lattice formalism allows one to generate a family of related lattices from a given lattice
    • The number of unique unit cells that are N times larger than the original unit cell is quite small (Rutherford, Acta Cryst. (2006). A62, 93-97)
  – Unit cells of approximately equal volume can be compared to each other by checking a large number of uni-modular transforms
    • Ralf's work
Exploring metric symmetry
• Sub lattice?
  – Given all lattice points, ignore some of them while ensuring that the remaining lattice points form a regular lattice
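The sub-lattice idea can be made concrete with a small enumeration. The sketch below is not the iotbx implementation; it assumes the standard result that every sub-lattice of index N corresponds to exactly one integer matrix in Hermite normal form with determinant N. The function name `sublattice_matrices` is hypothetical. The counts it produces are taken before any symmetry reduction, so the number of unique unit cells mentioned in the slides would be smaller still.

```python
from itertools import product


def sublattice_matrices(index):
    """Yield every 3x3 Hermite normal form matrix with determinant `index`.

    The columns of each matrix are the basis vectors of one sub-lattice,
    expressed in the basis of the original lattice. The matrices are lower
    triangular; each entry below the diagonal is reduced modulo the diagonal
    entry of its row.
    """
    if index < 1:
        raise ValueError("index must be a positive integer")
    for a in range(1, index + 1):
        if index % a:
            continue
        rest = index // a
        for c in range(1, rest + 1):
            if rest % c:
                continue
            f = rest // c
            # b is reduced modulo c; d and e are reduced modulo f
            for b, d, e in product(range(c), range(f), range(f)):
                yield ((a, 0, 0), (b, c, 0), (d, e, f))


def count_sublattices(index):
    """Number of distinct sub-lattices whose cell is `index` times larger."""
    return sum(1 for _ in sublattice_matrices(index))
```

For small N the list stays short enough to test each candidate cell against a second crystal form directly.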
Exploring metric symmetry
• Examples
Native : P212121 61.8 97.7 148.9 90 90 90
SeMet1 : P21 115.5 149.0 115.6 90 115 90
SeMet2 : C2221 123.6 195.4 148.9 90 90 90
Poulsen, et al, (2001). Acta Cryst. D57, 1251-1259.
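A quick first check on whether such cells might be related is to compare their volumes. The sketch below uses only the general triclinic volume formula and the three cells listed above; the function and variable names are illustrative.

```python
import math


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a unit cell from its lengths and its angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


# Unit cells as quoted on the slide (Poulsen et al., 2001)
cells = {
    "Native": (61.8, 97.7, 148.9, 90, 90, 90),
    "SeMet1": (115.5, 149.0, 115.6, 90, 115, 90),
    "SeMet2": (123.6, 195.4, 148.9, 90, 90, 90),
}

native_volume = unit_cell_volume(*cells["Native"])
ratios = {name: unit_cell_volume(*cell) / native_volume for name, cell in cells.items()}

for name, ratio in ratios.items():
    print(f"{name}: volume ratio to native = {ratio:.2f}")
```

The SeMet1 cell comes out at roughly twice the native volume and the SeMet2 cell at four times. Since C2221 is C-centred, its primitive cell is half the conventional one, so both SeMet forms correspond to an approximate doubling of the native cell.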
Exploring metric symmetry
• Future
  – Provide reindexing methods between related unit cells.
    • Would make molecular replacement of related structures easier
    • Useful for multi crystal averaging
  – Obtain non-merohedral twin laws from this analysis
Outlier detection
• Outliers can have a detrimental effect on the progress of structure solution and refinement
  – Read, Acta Cryst. (1999). D55, 1759-1764
• The detection of outliers should be performed on the basis of all information available.
  – Use model info if you can
• One would like to have the flexibility of correcting for mistakes made earlier
  – Those reflections with E-values larger than 5 could have been valid observations!
Outlier detection
• What is an outlier?
  – A data point that does not fit a model because of an abnormal situation such as an erroneous measurement.
• How to spot them?
  – If Fobs is not reconcilable with Fcalc, Fobs might be an outlier
    • Reconcilable?
      – Fobs should be explainable from Fcalc and the current quality of the model (σA)
Outlier detection
• Model based outlier detection is done in a similar way to the method described by Read (Acta Cryst. (1999). D55, 1759-1764)
  – Fobs and Fcalc are normalized to get Eobs & Ecalc
  – σA is estimated for each reflection
    • Combining standard likelihood techniques with kernel methods to obtain smoothly varying estimates
  – Find: $E_{obs}^{max} = \arg\max_{E_{obs}} \mathrm{Log}[P(E_{obs} \mid E_{calc}, \sigma_A)]$
  – Compute: $Q = 2\,(\mathrm{Log}[P(E_{obs}^{max})] - \mathrm{Log}[P(E_{obs})])$
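The two steps above can be sketched for a single centric reflection. The slides do not give the form of P, so this sketch assumes the commonly used σA-based conditional distribution for centric reflections (a folded normal with mean σA·Ecalc and variance 1 − σA²). The function names and the grid-search maximisation are illustrative, not the mmtbx code.

```python
import math


def log_p_centric(e_obs, e_calc, sigma_a):
    """Log of the assumed P(E_obs | E_calc, sigma_A) for a centric reflection."""
    if not 0.0 <= sigma_a < 1.0:
        raise ValueError("sigma_a must lie in [0, 1)")
    var = 1.0 - sigma_a ** 2
    x = abs(sigma_a * e_obs * e_calc / var)
    # Overflow-safe log(cosh(x))
    log_cosh = x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)
    return (0.5 * math.log(2.0 / (math.pi * var))
            - (e_obs ** 2 + (sigma_a * e_calc) ** 2) / (2.0 * var)
            + log_cosh)


def q_value(e_obs, e_calc, sigma_a, e_limit=10.0, step=0.001):
    """Q = 2 (Log P at the most likely E_obs minus Log P at the actual E_obs)."""
    n_steps = int(round(e_limit / step))
    # Grid search for the most likely E_obs; the observed value is included
    # as a candidate so that Q can never be negative.
    candidates = [i * step for i in range(n_steps + 1)] + [e_obs]
    best = max(log_p_centric(e, e_calc, sigma_a) for e in candidates)
    return 2.0 * (best - log_p_centric(e_obs, e_calc, sigma_a))


# A strong observation is unremarkable when the model also predicts a strong
# reflection, and very surprising when the model predicts a weak one.
q_supported = q_value(e_obs=3.9, e_calc=3.8, sigma_a=0.9)
q_unsupported = q_value(e_obs=3.9, e_calc=0.5, sigma_a=0.9)
```

An acentric reflection would need the corresponding Rice-type distribution, which involves a modified Bessel function and is left out here.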
Outlier detection
• Q is approximately χ² distributed
• Acceptable values of Q are determined by the size of the dataset
  – If the dataset is large, large deviations are expected
• A p-value is computed for each reflection
  – The p-value is the probability that, if this particular Q-value was the largest in the dataset, a Q-value of equal or larger value is observed by chance.
• Observations for which the p-value is smaller than 5% are considered outliers.
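The dataset-size dependence of this p-value can be illustrated with a short extreme-value calculation. The sketch assumes that Q follows a χ² distribution with one degree of freedom and that reflections are independent; neither assumption is stated in the slides, and the function name is hypothetical.

```python
import math


def extreme_value_p(q, n_reflections):
    """Probability that the largest of n chi-square(1) values is at least q."""
    if q < 0.0 or n_reflections < 1:
        raise ValueError("q must be >= 0 and n_reflections >= 1")
    # Upper tail of a chi-square distribution with one degree of freedom
    tail = math.erfc(math.sqrt(q / 2.0))
    if tail >= 1.0:
        return 1.0
    # 1 - (1 - tail)**n, evaluated without loss of precision for tiny tails
    return -math.expm1(n_reflections * math.log1p(-tail))


def is_outlier(q, n_reflections, level=0.05):
    """Flag a reflection when its extreme-value p-value falls below `level`."""
    return extreme_value_p(q, n_reflections) < level
```

Under these assumptions the same Q is judged more leniently in a larger dataset: Q = 15 is flagged among 100 reflections but not among 100,000.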
Outlier detection
• Example: 1ty3
• Wilson statistics indicate 1 outlier
  – (25,6,-43): Eobs = 3.938; centric = True; p-wilson = 1.83E-07; p-extreme = 9.0E-03
• Model based outlier detection indicates that (25,6,-43) is a valid observation
[Figure: P(Q) versus Q, comparing the theoretical chi-square distribution with the observed distribution of Q]
Outlier Detection
• The outlier detection algorithm is embedded in a class that caches the original observed data.
• This will allow one to perform outlier detection during different macro-cycles/rebuilding states and update
• Will be incorporated in phenix.refine at the appropriate juncture
  – Command line tool available
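The caching design can be illustrated with a toy class. This is a sketch of the idea only, not the mmtbx class: the class name, method names and rejection rules are hypothetical. Apart from the (25,6,-43) value quoted earlier, the reflection data are invented for the example.

```python
class CachedOutlierFilter:
    """Keep the original observations so every pass starts from the full data."""

    def __init__(self, observations):
        # Private copy of the original data; it is never modified afterwards
        self._original = dict(observations)

    def select(self, is_outlier):
        """Return the observations that `is_outlier(hkl, value)` does not reject."""
        return {hkl: value for hkl, value in self._original.items()
                if not is_outlier(hkl, value)}


# Invented E-values, except (25, 6, -43), which is the value from the 1ty3 example
data = {(1, 0, 0): 0.8, (2, 3, 1): 1.4, (25, 6, -43): 3.938}
cache = CachedOutlierFilter(data)

# Early pass: no model yet, so a simple (hypothetical) cutoff on E is used
early = cache.select(lambda hkl, e: e > 3.5)

# Later pass: a model-aware rule (here a stand-in that rejects nothing)
# is applied to the original data, so the reflection is reinstated
later = cache.select(lambda hkl, e: False)
```

Because each pass filters the cached original rather than the output of the previous pass, a reflection rejected in an early macro-cycle is reconsidered in every later one.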
Twinning progress report
• Routines available
  – Least squares target functions
    • Both intensity and amplitude
    • Target values and first derivatives
  – Detwinning
    • Standard and a la Sheldrick
  – R-values
  – Map coefficients
    • 2mFo-DFc & gradient maps
  – Bulk solvent scaling
    • Estimation of twin fraction, ksol, Bsol, U* and overall scale on twinned data
      – Using global optimizer (differential evolution) for the moment
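For reference, standard algebraic detwinning of a hemihedral twin can be sketched as below. This is the textbook relation between twin-related intensities, not the mmtbx routine, and the function names are illustrative. The Sheldrick-style variant, which uses model information, is not shown.

```python
def twin(i_1, i_2, alpha):
    """Mix two twin-related intensities with twin fraction alpha."""
    return ((1.0 - alpha) * i_1 + alpha * i_2,
            alpha * i_1 + (1.0 - alpha) * i_2)


def detwin(i_obs_1, i_obs_2, alpha):
    """Invert the mixing above; only possible for alpha below 0.5."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("algebraic detwinning requires 0 <= alpha < 0.5")
    denom = 1.0 - 2.0 * alpha
    return (((1.0 - alpha) * i_obs_1 - alpha * i_obs_2) / denom,
            ((1.0 - alpha) * i_obs_2 - alpha * i_obs_1) / denom)


# Round trip with a twin fraction close to one half
observed = twin(100.0, 20.0, 0.47)
recovered = detwin(observed[0], observed[1], 0.47)
```

The division by 1 − 2α amplifies measurement errors as the twin fraction approaches 0.5, which is one reason purely algebraic detwinning becomes unreliable for nearly perfect twins.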
Twinning progress report
• Bulk solvent scaling and detwinned map generation available as a command line tool mmtbx.twin_map_utils
• Results similar to CNS
• mmtbx.twin_map_utils should be seen as the first step to full integration of twin utilities in phenix.refine
Twinning progress report
[Figure: comparison of maps from mmtbx.twin_map_utils and CNS]
Twinning progress report
Twinning not taken into account
1eyx: twin fraction = 0.47; difference maps at 2.5 sigma. Ligands and waters deleted (10% of total model)
Twinning taken into account
Twinning progress report
Twinning not taken into account
Difference in 2mFo-DFc density is less striking
Twinning taken into account
Twinning progress report
• Future plans
  – Likelihood based map coefficients
    • in collaboration with Randy Read
  – Incorporation of least squares targets in phenix.refine
  – Likelihood based targets
    • in collaboration with Randy Read
Acknowledgements
Paul Adams
Ralf Grosse-Kunstleve
Pavel Afonine
Nigel Moriarty
Nick Sauter
Michael Hohn
Cambridge
Randy Read
Airlie McCoy
Los Alamos
Tom Terwilliger
Li Wei Hung
Texas A&M University
Jim Sacchettini
Tom Ioerger
Eric McKee
Duke University
Jane Richardson
David Richardson
PHENIX Industrial Consortium
Robert Nolte
Eric Vogan
Funding:
– LBNL (DE-AC03-76SF00098)
– NIH/NIGMS (P01GM063210)
– PHENIX Industrial Consortium
Kernel methods
• Discrete binning of X-ray data introduces discontinuous jumps in properties that vary continuously
  – Mean intensity (normalisation)
  – The estimation of σA
• Possible remedies:
  – Spline functions
    • Used extensively by K. Cowtan
  – Kernel methods
Kernel methods
• Discrete binning assumes a constant value in a certain range
Kernel methods
• With kernel methods, the estimate at each position is based on the full dataset.
  – The amount that each datum contributes is determined by a weighting function (usually depending on the squared distance)
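The contrast between binning and kernel weighting can be sketched on synthetic data. The Gaussian weighting function, the bandwidth and the bin width below are assumptions chosen for illustration; the slides do not specify which kernel the cctbx routines use.

```python
import math


def binned_estimate(x0, xs, ys, bin_width):
    """Mean of all ys whose x falls in the same bin as x0."""
    target = int(x0 // bin_width)
    members = [y for x, y in zip(xs, ys) if int(x // bin_width) == target]
    return sum(members) / len(members)


def kernel_estimate(x0, xs, ys, bandwidth):
    """Weighted mean of all ys, with Gaussian weights in the squared distance."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Synthetic, smoothly varying property sampled on a regular grid
xs = [i / 100.0 for i in range(101)]
ys = [2.0 * x + 1.0 for x in xs]

# Evaluate just below and just above a bin edge at x = 0.5
binned_jump = abs(binned_estimate(0.51, xs, ys, 0.5) - binned_estimate(0.49, xs, ys, 0.5))
kernel_jump = abs(kernel_estimate(0.51, xs, ys, 0.1) - kernel_estimate(0.49, xs, ys, 0.1))
```

Crossing the bin edge changes the binned estimate abruptly, while the kernel estimate changes only slightly because every datum contributes with a smoothly varying weight.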
Kernel methods
• Kernel method available for normalisation
  – Used by xtriage in intensity statistics
• Kernel method available for σA estimation
  – Used in the outlier detection
Kernel methods
• Determining alpha from σA estimated with kernel methods gives values similar to those obtained in phenix.refine
• Similar results for beta
[Figure: mean alpha (sigmaA) versus mean alpha (phenix.refine); linear fit y = 0.9969x - 0.0177, R² = 0.964]