![Page 1: Reusing phenix.refine for powder data? Ralf W. Grosse-Kunstleve Computational Crystallography Initiative Lawrence Berkeley National Laboratory Workshop](https://reader036.vdocuments.site/reader036/viewer/2022062714/56649d045503460f949d754f/html5/thumbnails/1.jpg)
Reusing phenix.refine for powder data?
Ralf W. Grosse-Kunstleve
Computational Crystallography Initiative
Lawrence Berkeley National Laboratory
Workshop on developments and directions of powder diffraction on proteins, June 22/23, 2007
My two lives
• Life 1 (PhD project):
  – Zeolite structure determination from powder data using extracted intensities
• Life 2:
  – Contributions to Xplor/CNS
    • Single-crystal protein crystallography
    • About 80% of all PDB entries refined with Xplor/CNS
  – Phenix project
    • Fresh start after losing a legal battle
Funding: NIH Program Project (NIGMS, PSI), Director - Paul Adams
CCI APPS
SOLVE / RESOLVE
PHASER
TEXTAL
MolProbity / REDUCE
Computational Crystallography Initiative (LBNL): Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nicholas Sauter, Peter Zwart
Los Alamos National Lab (LANL): Tom Terwilliger, Li-Wei Hung
Cambridge University: Randy Read, Airlie McCoy
Texas A&M University: Tom Ioerger, Jim Sacchettini, Erik McKee
Duke University: Jane Richardson, David Richardson, Ian Davis
Phenix Collaboration
Spectrum of phenix components
• Automated analysis of data quality: phenix.xtriage
• Rapid substructure determination: phenix.hyss
• Phasing: Maximum likelihood – SOLVE, PHASER for SAD
• Density modification: Statistical density modification (RESOLVE)
• Automated model building:
  – Pattern matching methods (RESOLVE or TEXTAL)
• Structure refinement: phenix.refine (likelihood, annealing, TLS)
• Advanced automation: AutoSol – hkl to map
• Ligand building and fitting: eLBOW, AutoLigand
• Validation and Hydrogens: MolProbity + Reduce
phenix.refine
- Group ADP refinement
- Rigid body refinement
- Restrained refinement (xyz, iso/aniso ADP)
- Automatic water picking
- Bond density
- Unrestrained refinement
- FFT or direct summation
- Hydrogens
- Automatic NCS restraints
- Simulated Annealing
- Occupancies (individual, group)
- TLS refinement
- Twinned data
- X-ray, Neutron, joint X-ray + Neutron refinement
Refinement flowchart
Input: PDB model, any data format (CNS, Shelx, MTZ, …)
Input data and model processing
Refinement strategy selection
Bulk-solvent, anisotropic scaling, twinning parameters refinement
Ordered solvent (add / remove)
Target weights calculation
Coordinate refinement (rigid body, individual; minimization or Simulated Annealing)
ADP refinement (TLS, group, individual iso / aniso)
Occupancy refinement (individual, group)
Output: Refined model, various maps, structure factors, complete statistics; files for COOT, O, PyMol
Repeated several times
Designed to be very easy to use
Refinement of individual coordinates and B-factors:
% phenix.refine model.pdb data.hkl
Same as above plus water picking:
% phenix.refine model.pdb data.hkl ordered_solvent=true
Run with parameter file:
% phenix.refine model.pdb data.hkl parameter_file
refinement.main {
  high_resolution = 2.0
  simulated_annealing = True
  ordered_solvent = True
  number_of_macro_cycles = 5
}
refinement.refine.adp {
  tls = chain A
  tls = chain B
}
How to best make ends meet?
• GSAS & proteins
  – Extending a small-molecule powder program to deal with proteins
  – Advantage: program designed for the field
    • Community used to inputs, outputs, idiosyncrasies
  – Disadvantage: some approaches suitable for small molecules don’t scale
    • Direct-summation structure factor calculation
    • Neighborhood calculations (nonbonded interactions, a.k.a. anti-bumping restraints)
• phenix.refine
  – Extending a single-crystal protein program to deal with powders
  – Advantage: program designed to deal with large structures
    • Protein, RNA/DNA restraint libraries, optimized algorithms
  – Disadvantage: new data formats, differences in terminology
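To illustrate why direct summation does not scale, here is a minimal sketch in plain Python (not taken from GSAS, phenix.refine, or cctbx). It assumes space group P1, constant scattering factors, and no ADPs; the function names and the two-atom test structure are hypothetical. The point is only that the cost grows as (number of reflections) × (number of atoms), which is why an FFT-based calculation is attractive for proteins.

```python
import cmath
import math


def structure_factor_direct(hkl, atoms):
    """Direct summation of F(hkl) over all atoms in the unit cell.

    atoms: list of (scattering_factor, (x, y, z)) with fractional coordinates.
    Simplifying assumptions: space group P1, constant scattering factors,
    no atomic displacement parameters.
    """
    h, k, l = hkl
    total = 0j
    for f, (x, y, z) in atoms:
        total += f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return total


def all_structure_factors(miller_indices, atoms):
    # One inner loop over all atoms per reflection:
    # cost is len(miller_indices) * len(atoms).
    return [structure_factor_direct(hkl, atoms) for hkl in miller_indices]


# Hypothetical two-atom structure, for illustration only.
atoms = [(6.0, (0.0, 0.0, 0.0)), (8.0, (0.5, 0.5, 0.5))]
f_calc = all_structure_factors([(0, 0, 0), (1, 0, 0), (1, 1, 0)], atoms)
```

For a small-molecule structure both factors are small; for a protein with thousands of atoms and tens of thousands of reflections the product becomes the bottleneck in every refinement cycle.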
Two main challenges
• Challenge 1:
  – Input/output of powder-specific format
    • Fundamentally trivial but potentially tedious
    • New command?
      – No interference with existing, non-trivial algorithms for automatic recognition, processing, and consolidation of already very heterogeneous inputs
    • Extend the existing input algorithms?
      – Nicer, but requires higher degree of collaboration
• Challenge 2:
  – Development of a powder-specific target function
    • Based on extracted intensities or primary pattern + pre-fitted profile parameters?
    • Maximum likelihood with or without cross-validation?
    • Will probably require some refactoring of the refinement engine
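As one possible reading of the "extracted intensities" option, the sketch below shows a simple least-squares target in which overlapping reflections are compared only through their summed intensity. This is an assumption made for illustration, not the target proposed in the talk or anything implemented in phenix.refine; the function name, the data layout, and all numbers are hypothetical.

```python
def overlap_group_target(groups, f_calc_sq, scale=1.0):
    """Least-squares residual on summed intensities of overlapping reflections.

    groups: list of (i_obs, sigma, [(miller_index, multiplicity), ...]),
            one entry per resolved peak or cluster of overlapping peaks.
    f_calc_sq: dict mapping miller_index -> |F_calc|^2.
    """
    target = 0.0
    for i_obs, sigma, members in groups:
        # Only the sum over the overlap group is compared with the data.
        i_calc = scale * sum(m * f_calc_sq[hkl] for hkl, m in members)
        target += ((i_obs - i_calc) / sigma) ** 2
    return target


# Made-up values, for illustration only.
f_calc_sq = {(1, 0, 0): 4.0, (0, 1, 0): 9.0, (1, 1, 0): 16.0}
groups = [
    (70.0, 2.0, [((1, 0, 0), 2), ((0, 1, 0), 6)]),  # overlapping pair
    (64.0, 1.0, [((1, 1, 0), 4)]),                  # resolved reflection
]
value = overlap_group_target(groups, f_calc_sq)
```

A maximum-likelihood variant, or a target on the primary pattern with pre-fitted profile parameters, would replace the residual inside the loop while keeping the same plug-in shape.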
Modular design
• Application level
  – phenix wizards (data in, structure out)
  – phenix.refine
  – phenix.hyss (hybrid substructure search)
  – Visible source
• Library level
  – cctbx project, organized in modules
    • libtbx, scitbx, cctbx, iotbx, mmtbx
  – cctbx is intended to cover small-molecule work
    • But nothing yet specific to powders
  – Unrestricted open source
Existing target functions
• Least-squares (variety)
• Maximum likelihood on amplitudes
• Maximum likelihood with experimental phases
• Least-squares twin target
• SAD-specific maximum likelihood target implemented in Phaser
  – Reusing target from external application!
• Dirty laundry
  – Severe code duplication in implementation of twin target
    • Needs to be consolidated
  – Some friction integrating the Phaser ML-SAD target
    • Phaser target relatively slow: we need better bookkeeping to avoid repeated calculations with exactly the same input
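The "better bookkeeping" point can be sketched as a thin wrapper that remembers the last input and skips the expensive call when the same input arrives again. This is a generic illustration in plain Python, not the mechanism used in Phenix or Phaser; `CachedTarget` and `slow_target` are hypothetical names.

```python
class CachedTarget:
    """Wrap an expensive target function and reuse the last result
    when it is called again with exactly the same parameters."""

    def __init__(self, expensive_target):
        self._target = expensive_target
        self._last_key = None
        self._last_value = None
        self.n_evaluations = 0  # counts real evaluations only

    def __call__(self, parameters):
        key = tuple(parameters)
        if key != self._last_key:
            self._last_value = self._target(parameters)
            self._last_key = key
            self.n_evaluations += 1
        return self._last_value


def slow_target(parameters):
    # Stand-in for an expensive calculation such as an ML-SAD target.
    return sum(p * p for p in parameters)


cached = CachedTarget(slow_target)
first = cached([1.0, 2.0])
second = cached([1.0, 2.0])  # same input: no recalculation
third = cached([1.0, 3.0])   # new input: recalculated
```

Comparing on exact equality matches the situation described above, where the repeated calls use exactly the same input.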
Precedents for reusing cctbx?
• cctbx used heavily by all phenix collaborators
• Phaser uses cctbx -> cctbx supported by CCP4 6.0 and up
• smtbx: small-molecule toolbox
  – Group at Durham University, U.K. collaborating with David Watkin at Oxford University, U.K.
  – Long-term goal: highly integrated single-crystal structure determination (direct methods), automatic model building and refinement
  – Initial focus: iterative model building and refinement
  – Initial approach: reuse + adjust cctbx core libraries directly combined with copying sub-modules to smtbx where they are modified
  – Long term: consolidate duplications as much as possible
    • half the code = half the bugs, reuse of optimizations
Summary of ideas
• Implement powder-specific target function(s) that plug into the refinement engine in the open source cctbx libraries
  – Can be done stand-alone using ad-hoc input/output methods
  – Collaborate in making the necessary adjustments to the existing libraries
• Figure out the best way to handle input/output at the application level
  – Learn and re-evaluate as we go
• If the powder field joins in there will be the potential for direct cross-fertilization between three specializations in crystallography
  – Single-crystal protein
  – Single-crystal small-molecule
  – Powder diffraction protein
  – More? (powder diffraction small-molecule)
• cctbx libraries are very general
• Ever increasing integration is the secret behind the stunning successes in the development of computing technology
  – Can we make this idea work in crystallography?
Availability
• Phenix incl. Graphical User Interface
  – http://www.phenix-online.org/
  – Freely available to academic (non-profit) groups
• Core libraries (cctbx)
  – http://cctbx.sourceforge.net/
  – Freely available to all
Acknowledgments
• Phenix developers
  – P.D. Adams
  – P. Afonine
  – T.R. Ioerger
  – A.J. McCoy
  – E.W. McKee
  – N.W. Moriarty
  – R.J. Read
  – N.K. Sauter
  – J.N. Smith
  – L.C. Storoni
  – T.C. Terwilliger
  – P.H. Zwart
• Funding:
  – LBNL (DE-AC03-76SF00098)
  – NIH/NIGMS (1P01GM063210)
  – PHENIX Industrial Consortium