Reusing phenix.refine for powder data?
Ralf W. Grosse-Kunstleve
Computational Crystallography Initiative, Lawrence Berkeley National Laboratory
Workshop on developments and directions of powder diffraction on proteins, June 22/23, 2007
My two lives
• Life 1 (PhD project):
  – Zeolite structure determination from powder data using extracted intensities
• Life 2:
  – Contributions to Xplor/CNS
    • Single-crystal protein crystallography
    • About 80% of all PDB entries refined with Xplor/CNS
  – Phenix project
    • Fresh start after losing a legal battle
Phenix Collaboration
Funding: NIH Program Project (NIGMS, PSI); Director: Paul Adams
• Computational Crystallography Initiative (LBNL): Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nicholas Sauter, Peter Zwart (CCI apps)
• Los Alamos National Lab (LANL): Tom Terwilliger, Li-Wei Hung (SOLVE / RESOLVE)
• Cambridge University: Randy Read, Airlie McCoy (PHASER)
• Texas A&M University: Tom Ioerger, Jim Sacchettini, Erik McKee (TEXTAL)
• Duke University: Jane Richardson, David Richardson, Ian Davis (MolProbity / REDUCE)
Spectrum of phenix components
• Automated analysis of data quality: phenix.xtriage
• Rapid substructure determination: phenix.hyss
• Phasing: Maximum likelihood – SOLVE, PHASER for SAD
• Density modification: Statistical density modification (RESOLVE)
• Automated model building: pattern matching methods (RESOLVE or TEXTAL)
• Structure refinement: phenix.refine (likelihood, annealing, TLS)
• Advanced automation: AutoSol – hkl to map
• Ligand building and fitting: eLBOW, AutoLigand
• Validation and Hydrogens: MolProbity + Reduce
phenix.refine
- Group ADP refinement
- Rigid body refinement
- Restrained refinement (xyz, iso/aniso ADP)
- Automatic water picking
- Bond density
- Unrestrained refinement
- FFT or direct summation
- Hydrogens
- Automatic NCS restraints
- Simulated Annealing
- Occupancies (individual, group)
- TLS refinement
- Twinned data
- X-ray, Neutron, joint X-ray + Neutron refinement
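"FFT or direct summation" refers to the two ways of computing structure factors from the model. A minimal sketch of the direct-summation route follows; it is not phenix.refine's implementation, and it assumes isotropic atoms with a constant scattering factor f_j, an orthorhombic cell, and no space-group symmetry. It evaluates F(hkl) = Σ_j occ_j f_j exp(-B_j s²/4) exp(2πi h·x_j) with s² = 1/d². The `Atom` class and function names are illustrative only.

```python
import cmath
import math
from dataclasses import dataclass


@dataclass
class Atom:
    xyz: tuple[float, float, float]  # fractional coordinates
    f: float                         # scattering factor, held constant (simplification)
    b_iso: float                     # isotropic displacement parameter, in A^2
    occupancy: float = 1.0


def d_star_sq_orthorhombic(hkl, cell):
    """1/d^2 for an orthorhombic cell (a, b, c in A); general cells need the full metric tensor."""
    h, k, l = hkl
    a, b, c = cell
    return (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2


def structure_factor(hkl, atoms, cell):
    """Direct summation over all atoms for one reflection."""
    s_sq = d_star_sq_orthorhombic(hkl, cell)
    total = 0j
    for atom in atoms:
        phase = 2.0 * math.pi * sum(i * x for i, x in zip(hkl, atom.xyz))
        debye_waller = math.exp(-atom.b_iso * s_sq / 4.0)
        total += atom.occupancy * atom.f * debye_waller * cmath.exp(1j * phase)
    return total


# Two equal atoms half a cell apart along a: they cancel for (1,0,0) and add up for (2,0,0).
atoms = [Atom((0.0, 0.0, 0.0), 6.0, 0.0), Atom((0.5, 0.0, 0.0), 6.0, 0.0)]
f_100 = structure_factor((1, 0, 0), atoms, (10.0, 10.0, 10.0))  # |F| close to 0
f_200 = structure_factor((2, 0, 0), atoms, (10.0, 10.0, 10.0))  # |F| close to 12
```

The cost grows as N_atoms × N_reflections, which is why FFT-based calculation is the practical choice for large structures such as proteins.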
Refinement flowchart
• Input: PDB model, data in any format (CNS, Shelx, MTZ, …)
• Input data and model processing
• Refinement strategy selection
• Repeated several times:
  – Bulk-solvent, anisotropic scaling, and twinning parameters refinement
  – Ordered solvent (add / remove)
  – Target weights calculation
  – Coordinate refinement (rigid body, individual; minimization or simulated annealing)
  – ADP refinement (TLS, group, individual iso / aniso)
  – Occupancy refinement (individual, group)
• Output: refined model, various maps, structure factors, complete statistics; files for COOT, O, PyMol
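The control flow of the flowchart can be sketched as a loop over macro-cycles. This is a hypothetical skeleton, not phenix.refine code: each step only records its name, and the rule that scaling and weight calculation always run while the other steps are switched on by the refinement strategy is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Set

State = Dict[str, List[str]]
Step = Callable[[State], State]


def make_step(name: str) -> Step:
    """Hypothetical placeholder: a real step would update coordinates, ADPs, scales, etc."""
    def step(state: State) -> State:
        state["log"].append(name)
        return state
    return step


# One macro-cycle in flowchart order; the first element of each pair is a strategy key.
MACRO_CYCLE = [
    ("scaling", make_step("bulk solvent / scaling / twinning")),
    ("ordered_solvent", make_step("ordered solvent")),
    ("weights", make_step("target weights")),
    ("xyz", make_step("coordinates")),
    ("adp", make_step("ADP")),
    ("occupancies", make_step("occupancies")),
]
ALWAYS_RUN = {"scaling", "weights"}  # assumption made for this sketch


def refine(strategy: Set[str], n_macro_cycles: int = 3) -> State:
    state: State = {"log": ["process inputs", "select strategy"]}
    for _ in range(n_macro_cycles):
        for key, step in MACRO_CYCLE:
            if key in ALWAYS_RUN or key in strategy:
                state = step(state)
    state["log"].append("write model, maps, statistics")
    return state
```

For example, `refine({"xyz", "adp"}, 2)` runs four steps per cycle; adding `"ordered_solvent"` to the strategy inserts water picking into every cycle, analogous to `ordered_solvent=true` on the command line.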
Designed to be very easy to use
Refinement of individual coordinates and B-factors:
% phenix.refine model.pdb data.hkl
Same as above plus water picking:
% phenix.refine model.pdb data.hkl ordered_solvent=true
Run with parameter file:
% phenix.refine model.pdb data.hkl parameter_file
where parameter_file contains:
    refinement.main {
      high_resolution = 2.0
      simulated_annealing = True
      ordered_solvent = True
      number_of_macro_cycles = 5
    }
    refinement.refine.adp {
      tls = chain A
      tls = chain B
    }
How to best make ends meet?
• GSAS & proteins
  – Extending a small-molecule powder program to deal with proteins
  – Advantage: program designed for the field
    • Community used to inputs, outputs, idiosyncrasies
  – Disadvantage: some approaches suitable for small molecules don’t scale
    • Direct-summation structure factor calculation
    • Neighborhood calculations (nonbonded interactions, a.k.a. anti-bumping restraints)
• phenix.refine
  – Extending a single-crystal protein program to deal with powders
  – Advantage: program designed to deal with large structures
    • Protein, RNA/DNA restraint libraries, optimized algorithms
  – Disadvantage: new data formats, differences in terminology
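Why neighborhood calculations do not scale when done naively: testing every atom pair costs O(N²). A common remedy, sketched here in Cartesian space and ignoring periodicity and crystal symmetry (which real crystallographic code must handle), is to sort atoms into cubic bins whose edge equals the cutoff, so each atom only needs to be compared with atoms in its own and the 26 adjacent bins. Function names are hypothetical; this is not taken from GSAS or cctbx.

```python
import itertools
import math
from collections import defaultdict


def pairs_naive(points, cutoff):
    """Every pair closer than cutoff, by testing all N*(N-1)/2 pairs."""
    return {
        (i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) < cutoff
    }


def pairs_binned(points, cutoff):
    """Same pairs, but atoms are first sorted into cubic bins with edge length = cutoff."""
    bins = defaultdict(list)
    for index, point in enumerate(points):
        bins[tuple(math.floor(c / cutoff) for c in point)].append(index)
    result = set()
    for (bx, by, bz), members in bins.items():
        # A partner within the cutoff can only sit in this bin or one of its 26 neighbors.
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            for j in bins.get((bx + dx, by + dy, bz + dz), ()):
                for i in members:
                    if i < j and math.dist(points[i], points[j]) < cutoff:
                        result.add((i, j))
    return result
```

For roughly uniform atom density, as in a protein, the binned version does work proportional to N rather than N², while returning exactly the same pair set.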
Two main challenges
• Challenge 1:
  – Input/output of powder-specific formats
    • Fundamentally trivial but potentially tedious
  – New command?
    • No interference with existing, non-trivial algorithms for automatic recognition, processing, and consolidation of already very heterogeneous inputs
  – Extend the existing input algorithms?
    • Nicer, but requires a higher degree of collaboration
• Challenge 2:
  – Development of a powder-specific target function
    • Based on extracted intensities, or on the primary pattern plus pre-fitted profile parameters?
    • Maximum likelihood with or without cross-validation?
    • Will probably require some refactoring of the refinement engine
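To make the "extracted intensities" option concrete, here is one possible target, sketched under stated assumptions rather than proposed as the design. It assumes reflections whose d-spacings agree within a relative tolerance overlap, so only their summed intensity is observed; it compares each observed group intensity with k·Σ m_h|F_calc(h)|² in weighted least squares (weights 1/σ²), with the overall scale k chosen optimally. It uses neither maximum likelihood nor cross-validation. All names and the overlap criterion are hypothetical. The returned gradients with respect to |F_calc|² are what a refinement engine would chain through to coordinates and ADPs.

```python
from dataclasses import dataclass


@dataclass
class Reflection:
    hkl: tuple[int, int, int]
    d_spacing: float  # Angstrom
    multiplicity: int


def overlap_groups(reflections, rel_tol=1e-3):
    """Group reflections whose d-spacings agree within rel_tol (an assumed overlap criterion)."""
    order = sorted(range(len(reflections)), key=lambda i: reflections[i].d_spacing)
    groups = []
    for i in order:
        d = reflections[i].d_spacing
        if groups and abs(d - reflections[groups[-1][-1]].d_spacing) <= rel_tol * d:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def powder_ls_target(groups, reflections, f_calc_sq, i_obs, sigma):
    """Weighted least squares on overlap-group intensities with an optimal overall scale k.

    i_obs[g] and sigma[g] belong to groups[g]; f_calc_sq[i] is |F_calc|^2 of reflection i.
    Returns (target, k, gradients), with gradients[i] = dT/d|F_calc(i)|^2.
    """
    i_calc = [sum(reflections[i].multiplicity * f_calc_sq[i] for i in g) for g in groups]
    weights = [1.0 / s ** 2 for s in sigma]
    k = (sum(w * io * ic for w, io, ic in zip(weights, i_obs, i_calc))
         / sum(w * ic * ic for w, ic in zip(weights, i_calc)))
    residuals = [io - k * ic for io, ic in zip(i_obs, i_calc)]
    target = sum(w * r * r for w, r in zip(weights, residuals))
    gradients = [0.0] * len(reflections)
    for g, w, r in zip(groups, weights, residuals):
        for i in g:
            # dT/dk = 0 at the optimal k, so k can be treated as a constant here.
            gradients[i] = -2.0 * w * k * r * reflections[i].multiplicity
    return target, k, gradients
```

In a cubic cell, for example, (2,2,1) and (3,0,0) have identical d-spacings, so `overlap_groups` puts them in one group and only their combined intensity enters the target.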
Modular design
• Application level
  – phenix wizards (data in, structure out)
  – phenix.refine
  – phenix.hyss (hybrid substructure search)
  – Visible source
• Library level
  – cctbx project, organized in modules
    • libtbx, scitbx, cctbx, iotbx, mmtbx
  – cctbx is intended to cover small-molecule work
    • But nothing yet specific to powders
  – Unrestricted open source
Existing target functions
• Least-squares (variety)
• Maximum likelihood on amplitudes
• Maximum likelihood with experimental phases
• Least-squares twin target
• SAD-specific maximum likelihood target implemented in Phaser
  – Reusing a target from an external application!
• Dirty laundry
  – Severe code duplication in the implementation of the twin target
    • Needs to be consolidated
  – Some friction integrating the Phaser ML-SAD target
    • Phaser target relatively slow: we need better bookkeeping to avoid repeated calculations with exactly the same input
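One simple form of the bookkeeping mentioned above is to cache target results keyed on the exact input, so that a minimizer asking first for the value and then for the gradients at the same point triggers only one evaluation. The sketch below uses Python's `functools.lru_cache` around a normalized least-squares amplitude residual, Σw(|Fo| - k|Fc|)² / Σw|Fo|² with an optimal scale k; |F_calc| values are passed in directly to keep it small. It is a hypothetical illustration, not how phenix.refine or Phaser manage their calculations.

```python
from functools import lru_cache


def make_ls_target(f_obs, weights):
    """Return (cached target function, call counter) for a least-squares amplitude residual.

    target(f_calc) -> (value, gradients); f_calc must be a tuple so it can be hashed.
    Only bit-identical inputs hit the cache.
    """
    norm = sum(w * fo * fo for w, fo in zip(weights, f_obs))
    calls = {"n": 0}  # counts real evaluations, to show what the cache saves

    @lru_cache(maxsize=16)
    def target(f_calc):
        calls["n"] += 1
        k = (sum(w * fo * fc for w, fo, fc in zip(weights, f_obs, f_calc))
             / sum(w * fc * fc for w, fc in zip(weights, f_calc)))
        diffs = [fo - k * fc for fo, fc in zip(f_obs, f_calc)]
        value = sum(w * d * d for w, d in zip(weights, diffs)) / norm
        # k is optimal, so dT/dk = 0 and k can be treated as a constant in the gradients.
        gradients = tuple(-2.0 * w * k * d / norm for w, d in zip(weights, diffs))
        return value, gradients

    return target, calls


target, calls = make_ls_target([10.0, 20.0, 5.0], [1.0, 1.0, 1.0])
point = (1.0, 2.1, 0.4)
value = target(point)[0]      # a minimizer first asks for the value ...
gradients = target(point)[1]  # ... then for the gradients at exactly the same point
print(calls["n"])             # 1
```

Returning tuples rather than lists keeps cached results immutable, so one caller cannot corrupt what another receives.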
Precedents for reusing cctbx?
• cctbx used heavily by all phenix collaborators
• Phaser uses cctbx, so cctbx is supported by CCP4 6.0 and up
• smtbx: small-molecule toolbox
  – Group at Durham University, U.K., collaborating with David Watkin at Oxford University, U.K.
  – Long-term goal: highly integrated single-crystal structure determination (direct methods), automatic model building and refinement
  – Initial focus: iterative model building and refinement
  – Initial approach: reuse and adjust cctbx core libraries directly, combined with copying sub-modules to smtbx where they are modified
  – Long term: consolidate duplications as much as possible
    • Half the code = half the bugs, reuse of optimizations
Summary of ideas
• Implement powder-specific target function(s) that plug into the refinement engine in the open-source cctbx libraries
  – Can be done stand-alone using ad-hoc input/output methods
  – Collaborate in making the necessary adjustments to the existing libraries
• Figure out the best way to handle input/output at the application level
  – Learn and re-evaluate as we go
• If the powder field joins in, there will be the potential for direct cross-fertilization between three specializations in crystallography:
  – Single-crystal protein
  – Single-crystal small-molecule
  – Powder diffraction protein
  – More? (powder diffraction small-molecule)
• cctbx libraries are very general
• Ever-increasing integration is the secret behind the stunning successes in the development of computing technology
  – Can we make this idea work in crystallography?
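What "plug into the refinement engine" could mean in practice: if the engine only relies on a small interface, say a method returning the target value and its gradients, then a new powder target can be added without touching the engine. The sketch below is hypothetical throughout (the `Target` protocol, `WeightedSum`, the toy `Harmonic` term, and the plain steepest-descent `minimize`); it does not reflect the actual cctbx or phenix.refine interfaces.

```python
from typing import List, Protocol, Sequence, Tuple


class Target(Protocol):
    """Hypothetical interface that any target (X-ray, neutron, powder, restraints) would implement."""
    def value_and_gradients(self, params: Sequence[float]) -> Tuple[float, List[float]]: ...


class Harmonic:
    """Toy target: sum of (p - ideal)^2, standing in for a real data or restraint term."""
    def __init__(self, ideal: Sequence[float]):
        self.ideal = list(ideal)

    def value_and_gradients(self, params):
        diffs = [p - q for p, q in zip(params, self.ideal)]
        return sum(d * d for d in diffs), [2.0 * d for d in diffs]


class WeightedSum:
    """Combines targets as sum(weight * term), e.g. a data term plus weighted restraints."""
    def __init__(self, terms: Sequence[Tuple[float, Target]]):
        self.terms = list(terms)

    def value_and_gradients(self, params):
        total, grads = 0.0, [0.0] * len(params)
        for weight, term in self.terms:
            value, term_grads = term.value_and_gradients(params)
            total += weight * value
            grads = [g + weight * tg for g, tg in zip(grads, term_grads)]
        return total, grads


def minimize(target: Target, params: Sequence[float],
             step: float = 0.1, n_steps: int = 200) -> List[float]:
    """Toy steepest-descent 'engine': it never needs to know which kind of target it gets."""
    params = list(params)
    for _ in range(n_steps):
        _, grads = target.value_and_gradients(params)
        params = [p - step * g for p, g in zip(params, grads)]
    return params


# result[0] ends up close to 3.0, the weight-averaged ideal value (1*0 + 3*4) / 4.
result = minimize(WeightedSum([(1.0, Harmonic([0.0])), (3.0, Harmonic([4.0]))]), [0.0])
```

Under this design, a powder target such as a least-squares residual on extracted intensities would only need its own `value_and_gradients` method to be combined with existing restraint terms.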
Availability
• Phenix incl. Graphical User Interface
  – http://www.phenix-online.org/
  – Freely available to academic (non-profit) groups
• Core libraries (cctbx)
  – http://cctbx.sourceforge.net/
  – Freely available to all
Acknowledgments
• Phenix developers
  – P.D. Adams, P. Afonine, T.R. Ioerger, A.J. McCoy, E.W. McKee, N.W. Moriarty, R.J. Read, N.K. Sauter, J.N. Smith, L.C. Storoni, T.C. Terwilliger, P.H. Zwart
• Funding:
  – LBNL (DE-AC03-76SF00098)
  – NIH/NIGMS (1P01GM063210)
  – PHENIX Industrial Consortium