High Data Rate Macromolecular Crystallography at NSLS-II
Martin Fuchs (BNL Photon Sciences, NSLS-II)
Workshop on real-time data analysis challenges
2019-01-24
• FMX and AMX beamlines
• Micro introduction: Macromolecular crystallography (MX)
• Raster scanning and serial crystallography: High data rate MX
• Data challenges
  • Rate
  • Volume
  • Analysis
  • Storage
Overview
Soft X-Ray Scattering & Spectroscopy
• 23-ID-1: Coherent Soft X-ray Scattering (2015)
• 23-ID-2: Soft X-ray Spectro & Polarization (2015)
• 21-ID: Photoemission-Microscopy Facility (2016)
• 2-ID: Soft Inelastic X-ray Scattering (2017)
• 22-BM: Magneto, Ellips, High-P Infrared (2018)
Complex Scattering
• 10-ID: Inelastic X-ray Scattering (2015)
• 11-ID: Coherent Hard X-ray Scattering (2015)
• 11-BM: Complex Materials Scattering (2016)
• 12-ID: Soft Matter Interfaces (2016)
Diffraction & In Situ Scattering
• 28-ID-1: X-ray Powder Diffraction (2015)
• 28-ID-2: X-ray Powder Diffraction (2017)
• 4-ID: In-Situ & Resonant X-Ray Studies (2016)
• 27-ID: High Energy X-ray Diffraction (2020)
Hard X-Ray Spectroscopy
• 8-ID: Inner Shell Spectroscopy (2016)
• 7-BM: Quick X-ray Absorption and Scat (2017)
• 8-BM: Tender X-ray Absorption Spectros (2016)
• 7-ID-1: Spectroscopy Soft and Tender (2017)
• 7-ID-2: Spectroscopy Soft and Tender (2017)
• 6-BM: Beamline for Mater. Measurement (2017)
Imaging & Microscopy
• 3-ID: Hard X-ray Nanoprobe (2015)
• 5-ID: Sub-micron Resolution X-ray Spectro (2015)
• 4-BM: X-ray Fluorescence Microscopy (2017)
• 18-ID: Full-Field X-ray Imaging (2018)
Structural Biology
• 17-ID-1: Frontier Macromolec Cryst (2016)
• 17-ID-2: Automated Macromolec Cryst (2016)
• 16-ID: X-ray Scattering for Biology (2016)
• 17-BM: X-ray Footprinting (2016)
• 19-ID: Microdiffraction Beamline (2017)
• 19 Operating/Commissioning
• 10 Under Development
http://www.bnl.gov/ps/nsls2/beamlines/map.php
NSLS-II Suite of Beamlines
Experiment description: Protein Crystallography
From protein to structure
• Protein expression, purification
• Crystallization (cubic insulin)
• Diffraction experiment
• Phasing
• Electron density map
• Model building, refinement
• Structure (insulin)
[Figure: gel image with lanes M, CL, FT, W1, W2, W3, W4, E1, E2]
R. Bingel-Erlenmeyer
FMX and AMX
Two MX beamlines with overlapping and complementary capabilities
[Figure: AMX and FMX operating ranges; energy axis 5 to 30 keV; beam sizes 1 µm, 5 µm, 10 µm, 50 µm]
Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography, AMX: New opportunities for advanced data collection.” AIP Conf. Proc. SRI2015, 2016, 1741, 030006
Specifications               FMX (Microfocus)    AMX (Automation)
Energy range                 5 – 30 keV          5 – 18 keV
Wavelength range             0.4 – 2.5 Å         0.7 – 2.5 Å
Flux at focus at 12.7 keV    3.5 × 10¹² ph/s     5 × 10¹² ph/s
Focal spot min (H×V)         1 × 1.5 µm²         7 × 5 µm²
Focal spot range             1 – 20 µm           5 – 50 µm
Detector                     Eiger 16M           Eiger 9M
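The energy and wavelength rows of the specifications are two views of the same quantity, linked by λ = hc/E. A minimal sketch of the conversion, assuming the rounded constant hc ≈ 12.398 keV·Å; the function name is illustrative and not taken from any beamline software:

```python
# Photon energy <-> wavelength: lambda [Å] = hc / E, with hc ≈ 12.398 keV·Å (rounded).
HC_KEV_ANGSTROM = 12.398


def wavelength_angstrom(energy_kev: float) -> float:
    """Return the X-ray wavelength in Å for a photon energy given in keV."""
    return HC_KEV_ANGSTROM / energy_kev


# Energy ranges as quoted in the specifications (keV).
for beamline, e_min, e_max in [("FMX", 5.0, 30.0), ("AMX", 5.0, 18.0)]:
    shortest = wavelength_angstrom(e_max)  # highest energy gives the shortest wavelength
    longest = wavelength_angstrom(e_min)
    print(f"{beamline}: {shortest:.2f} - {longest:.2f} Å")
```

Rounded to one decimal, the results agree with the quoted wavelength ranges of 0.4 – 2.5 Å (FMX) and 0.7 – 2.5 Å (AMX).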
MX Beamlines – Beam Size vs Flux Density
▪ Low synchrotron emittance translates into bright beamlines, high dose rate
▪ FMX current specs: At 12.7 keV full beam, the time to Garman limit is 30 ms!
[Figure: beam size vs flux density for MX beamlines; annotations: "Planned upgrades", "Spot size ~ flux", "smaller"]
NSLS-II horizontal emittance: < 1 nm rad (Wang et al., 2016)
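The 30 ms figure on this slide can be turned into an implied dose rate. The sketch below assumes the commonly quoted Garman limit of 30 MGy for cryo-cooled crystals; the slide does not state the dose rate itself, and the attenuated example is hypothetical:

```python
# Assumption: "Garman limit" taken as 30 MGy absorbed dose (cryo-cooled crystals).
GARMAN_LIMIT_GY = 30e6


def implied_dose_rate(time_to_limit_s: float, limit_gy: float = GARMAN_LIMIT_GY) -> float:
    """Dose rate in Gy/s that reaches `limit_gy` after `time_to_limit_s` seconds."""
    return limit_gy / time_to_limit_s


def time_to_limit(dose_rate_gy_per_s: float, limit_gy: float = GARMAN_LIMIT_GY) -> float:
    """Exposure time in seconds before `limit_gy` is reached at a constant dose rate."""
    return limit_gy / dose_rate_gy_per_s


full_beam_rate = implied_dose_rate(0.030)  # 30 ms quoted for full beam at 12.7 keV
print(f"Implied full-beam dose rate: {full_beam_rate:.1e} Gy/s")

# Hypothetical example: attenuating the beam to 1% stretches the time budget 100-fold.
print(f"Time to limit at 1% transmission: {time_to_limit(0.01 * full_beam_rate):.1f} s")
```

Under these assumptions the full beam delivers on the order of 1 GGy/s, which is why exposure times and attenuation have to be chosen carefully.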
Multilayer upgrade for 100× higher flux
▪ 1% bandpass Horizontal-bounce Double Multilayer Monochromator for 100× flux increase, alternative to current Si111 HDCM
▪ Why? - Scanning & jet serial crystallography are still flux limited
▪ Integrate HDMM upstream of HDCM (small modifications to HDCM tank)
➢ Scanning & jet serial crystallography at max detector & sample delivery speeds
➢ The increased flux will support measurements on the µs timescale (now ms)
FMX Experimental Station
Main goniometer
Sample microscope
Sample mounting robot
Sample position
Secondary goniometer
Eiger 16M detector
Detector Systems – single photon counting
Eiger 16M detector at FMX (hybrid pixel array detector)
▪ Pixel size 75 μm
▪ 18 M pixels
▪ 311 × 328 mm² area
▪ Frame rate
o 133 Hz (16M px)
o 750 Hz (4M px ROI)
▪ Continuous readout, 3 µs dead time
Eiger 16M: 32 modules, 18 M pixels, 133 Hz
Eiger 4M ROI: 8 modules, 4.5 M pixels, 750 Hz
Eiger 9M at AMX
▪ Frame rate
o 238 Hz (9M px)
o 750 Hz (4M px ROI)
Data reduction & ligand binding pipelines at AMX/FMX
Software Tuning
• New, fast, more reliable screening mode for DOZOR spot finder
• aimless and pointless in fast_dp split off to designated workstation
• Hooks provided to allow XDS to use eiger2cbf instead of dectris-neggia
• New fast and faster modes provided for dimple
Scalable Storage and Processing Nodes: sustaining growth
Rapid Data Analysis and Reduction on fast buffer (200 cores / BL).
Data are then written on disks (GPFS flexible policy).
Spot finders: DOZOR, DIALS spotfinder.
Data reduction: XDS, dials, fast_dp, fast_dp_NSLS-II
Pipelines: fast_ep, fast_ep_NSLS-II, dimple, dimple_NSLS-II
We are also investigating dynamics using MX.
NSLS-II Computing Facility (May 2018)
• x40 Gb/s
• 21 compute nodes, ~750 cores
• GPFS storage / SSD buffer
• 860 TB, 180 HDs
• 20 TB/AMX, 20 TB/FMX
AMX / FMX
• EIGER 9/16M (750 Hz 4M ROI)
• Fast buffer (NFS), 1 day
Jakoncic, Bernstein et al., in preparation
Fast & automated to accelerate rate of drug discovery
Fast rastering
• From 5 ms to 50 ms per frame
• Up to 1 row / sec (accel./decel. time)
• Dedicated nodes (DIALS find spot server/client | DOZOR)
• Distributed fast_dp for data reduction …
• Origin: Storage ring room temperature MX
• Renaissance: FEL diffraction before destruction
• Storage ring sample delivery
  • Jets, fixed targets, membranes, conveyor belts, …
• RT vs cryo-cooling, rotation vs stills
• Acronyms: SFX, SMX, IMISX, SSX, MSS, …
Serial crystallography
Yamamoto 2017 IUCrJ
Max. scanning frequency – Piezo: > 100 Hz
Max speed – Piezo: > 15 mm/s
Max speed – Coarse: 0.8 mm/s
Resolution – Piezo: 10 nm
Repeatability – Piezo: 25 nm
Travel range – Piezo: (200 µm)³
Travel range – Coarse: 4 × 4 mm²
Mirror assembly mounted at sample location for metrology test
Cylinder for runout measurement
Laser interferometer from
• Scan frequency ~100× higher than standard goniometer
• 2× higher precision
Customized ultrahigh-speed piezo scanner
Yuan Gao and Weihe Xu in Nanometrology lab
2 µm step raster scan with 2.5 ms exposure (5 ~ 10 µm Proteinase K crystals)
Raster crystal location
Videos of 2 Hz and 20 Hz raster over a 180 × 180 µm² area (taken with a calibration tungsten needle)
2 Hz
20 Hz
1 µm step raster scans with 1.3 to 5 ms exposure (~5 µm Proteinase K crystals)
Raster data collection
Raster for location or for collection
High speed rastering
30 rasters for 1 dataset: Shutter-open time 18 s!
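The raster numbers on these slides follow from simple bookkeeping: exposure per frame is the inverse of the detector frame rate, and shutter-open time is frames times exposure. A minimal sketch follows. Combining the 180 × 180 µm² area with the 2 µm step and 2.5 ms exposure is an illustrative assumption (those figures come from different slides), and acceleration and deceleration overhead per row is ignored:

```python
import math


def exposure_from_frame_rate(frame_rate_hz: float) -> float:
    """Exposure time per frame in seconds for continuous readout at a given frame rate."""
    return 1.0 / frame_rate_hz


def raster_grid(width_um: float, height_um: float, step_um: float) -> tuple[int, int]:
    """Return (rows, cols) of a raster covering the area with the given step size."""
    return math.ceil(height_um / step_um), math.ceil(width_um / step_um)


def shutter_open_time(rows: int, cols: int, exposure_s: float) -> float:
    """Pure exposure time in seconds, without stage turnaround overhead."""
    return rows * cols * exposure_s


rows, cols = raster_grid(180, 180, 2)  # illustrative combination, see lead-in
print(rows * cols, "frames,", shutter_open_time(rows, cols, 2.5e-3), "s shutter-open")

# 750 Hz corresponds to about 1.3 ms per frame, 200 Hz to 5 ms.
print(f"{exposure_from_frame_rate(750) * 1e3:.2f} ms at 750 Hz")

# The quoted 18 s over 30 rasters averages 0.6 s of shutter-open time per raster.
print(18 / 30, "s per raster")
```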
Hierarchical cluster-analysis
Data processing
Proteinase K structure refined to 2.0 Å resolution
Rwork = 16.7% and Rfree = 21.3% (500 Hz dataset)
XDS: Unit Cell Orientation
XDS: Unit Cell Size
Automated clustering using Machine Learning algorithm
• ~200 partial datasets for structure solution
• Equally high data quality for detector frame rates from 200 – 750 Hz
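To illustrate what hierarchical cluster analysis on unit cell size can look like, here is a generic single-linkage sketch in plain Python. It is not the clustering method used for the slide: the linkage choice, the distance metric, the 1 Å cutoff, and the unit-cell values are all assumptions made up for the example.

```python
import math
from itertools import combinations


def single_linkage(cells, cutoff):
    """Agglomerative single-linkage clustering of (a, b, c) cell lengths in Å.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `cutoff`. Returns lists of dataset indices.
    """
    clusters = [[i] for i in range(len(cells))]
    while len(clusters) > 1:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            # Single linkage: cluster distance is the closest member pair.
            d = min(math.dist(cells[p], cells[q]) for p in ci for q in cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return [sorted(c) for c in clusters]


# Hypothetical unit cells from five partial datasets (values invented for illustration).
cells = [
    (68.0, 68.0, 102.5),
    (68.1, 68.1, 102.4),
    (67.9, 68.0, 102.6),
    (68.3, 68.3, 107.9),
    (68.2, 68.3, 108.1),
]
print(single_linkage(cells, cutoff=1.0))
```

With these invented values the partial datasets fall into two groups, and only datasets within one group would be merged together.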
Jet Serial Crystallography
Partner User: Spence Lab (ASU)
▪ First successful run in 2018-01
▪ 500 Hz data collection from Proteinase K microcrystals
▪ Flow speed = 300 µm/s: Experiment flux limited!
➢ Optimal sample delivery for LCP crystallization
➢ New jet media and temperature control
➢ N. Zatsepin, U. Weierstall, J. Martin-Garcia, E. Chun, S. Zaare, H. Hu, J. Spence
Analysis challenges shared for Scanning and Jet SMX
• Large data volumes: The Dectris Eiger 16M detector operates either in full frame mode with frame rates up to 133 Hz, or in the 4M mode up to the frame rate the samples’ diffraction power allows
• Signal to noise: For the smallest crystals, the S/N will be extremely low. Below a certain threshold, indexing will be impossible and the data unusable for merging
Average Experiment Duration
Both methods continue to collect data until the completeness goal is met under given boundary conditions such as resolution.
Serial crystallography data analysis challenges
Average data acquisition rate
• 16M frames with sustained rate of 70 Hz: 120 MB/s – 2.1 GB/s
  • Max file size 30 MB (depends on compression development, still improving; 20 MB realistic)
  • Min file size 1.5 MB (depends on signal)
• 16M frames with burst rate of 133 Hz for max 30 s
  • Data volume scales with frame rate
• 4M frames with sustained rate of ~400 Hz: 250 MB/s – 3.2 GB/s
  • Max file size 8 MB (depends on compression development, still improving; 5 MB realistic)
  • Min file size 0.5 – 1 MB (depends on signal)
• 4M frames with burst rate of 750 Hz for max 30 s
  • Data volume scales with frame rate
Max data rates
• Current: 5 GB/s
• Future: 5k frame/s detector, 100× flux increase – 50 GB/s
Data Rate
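The sustained rates quoted on this slide are, to a good approximation, frame rate times compressed file size. A small sketch of that arithmetic, using only the figures from the slide and decimal units (1 GB = 1000 MB, an assumption):

```python
def data_rate_mb_per_s(frame_rate_hz: float, file_size_mb: float) -> float:
    """Sustained data rate in MB/s for a given frame rate and per-frame file size."""
    return frame_rate_hz * file_size_mb


# (mode, sustained frame rate in Hz, min / realistic / max file size in MB) from the slide.
modes = [
    ("16M", 70, (1.5, 20, 30)),
    ("4M", 400, (0.5, 5, 8)),
]
for mode, rate_hz, sizes_mb in modes:
    low, realistic, high = (data_rate_mb_per_s(rate_hz, s) for s in sizes_mb)
    print(f"{mode} @ {rate_hz} Hz: {low:.0f} MB/s min, "
          f"{realistic / 1000:.1f} GB/s realistic, {high / 1000:.1f} GB/s max")
```

The upper bounds reproduce the quoted 2.1 GB/s and 3.2 GB/s; the lower bounds come out in the same range as the quoted 120 MB/s and 250 MB/s.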
Individual event
• One image, streamed at up to 750 images/s
• Can be processed independently
Total data volume / day
• ~100 mounts of 300 s data collection per day
• Each mount = ~100 GB: ~10 TB per day
• Assume small frame size of ~300 MB due to low signal levels
• Wide variation of volumes and throughput possible
• Current, standard MX: 10 TB
• Future throughput, serial: 100 TB (1 PB with new detector?)
Total data volume
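The daily volume estimate is a product of mounts per day and volume per mount. The sketch below reproduces the 10 TB figure. The second call is a consistency check that reads the "~300 MB" figure as a per-second rate; that reading is an assumption, since the slide calls it a frame size. Decimal units are assumed throughout.

```python
def mount_volume_gb(duration_s: float, rate_mb_per_s: float) -> float:
    """Data volume of one mount in GB for a given collection time and data rate."""
    return duration_s * rate_mb_per_s / 1000


def daily_volume_tb(mounts_per_day: int, gb_per_mount: float) -> float:
    """Total data volume per day in TB."""
    return mounts_per_day * gb_per_mount / 1000


print(daily_volume_tb(100, 100))  # 100 mounts of ~100 GB each
print(mount_volume_gb(300, 300))  # 300 s at an assumed ~300 MB/s, close to ~100 GB
```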
Time Limitations for Analysis Results
• Processing should provide information about data quality (resolution and completeness) within a few minutes to enable optimized strategies for a limited number of mounts.
Current Analysis Pipeline Requirements
• Software development strongly in flux
• Spot finding: Discard empty frames
• Indexing: Statistics on crystal morphology, cluster analysis
• Integration, scaling, merging
  • Rotation data collection: XDS, DIALS
  • Still frame data collection: CrystFEL, DIALS
• Structure solution pipelines follow standard MX procedures
High Data Rate Analysis Pipeline
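The "discard empty frames" step can be pictured with a deliberately naive hit finder: count isolated bright pixels and keep a frame only if enough are found. This is a toy sketch and not the algorithm used by DOZOR or the DIALS spot finder; the threshold, the minimum spot count, and the synthetic frames are assumptions chosen for illustration.

```python
def count_spots(frame, threshold):
    """Count pixels at or above `threshold` that exceed all of their 3x3 neighbours."""
    rows, cols = len(frame), len(frame[0])
    spots = 0
    for r in range(rows):
        for c in range(cols):
            value = frame[r][c]
            if value < threshold:
                continue
            neighbours = [
                frame[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                spots += 1
    return spots


def is_hit(frame, threshold=20, min_spots=3):
    """Keep a frame only if it contains at least `min_spots` candidate spots."""
    return count_spots(frame, threshold) >= min_spots


def make_frame(size, spots):
    """Build a synthetic size x size frame of zeros with (row, col, counts) spots."""
    frame = [[0] * size for _ in range(size)]
    for r, c, counts in spots:
        frame[r][c] = counts
    return frame


empty = make_frame(16, [])
hit = make_frame(16, [(2, 3, 50), (8, 8, 120), (12, 5, 35), (5, 13, 60)])
frames = [empty, hit, empty]
kept = [f for f in frames if is_hit(f)]
print(f"kept {len(kept)} of {len(frames)} frames")
```

Because each frame is judged on its own, a filter of this kind can run in parallel across frames. A fixed threshold like this one would also reject weak but usable sparse frames.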
• In serial crystallography, a large fraction of the images can be empty: 10-99% without data
• There are methods to extract data from sparse data frames, i.e. simple spot finding will not suffice in some cases to reject frames.
Lan et al. “Solving protein structure from sparse serial microcrystal diffraction data at a storage-ring synchrotron source” IUCrJ, 2018, Vol. 5(5), pp. 548-558. DOI: 10.1107/S205225251800903X
• Short term storage: Fast storage required for parallel data processing
• Long term storage: Main concern is storage space and data transfer to users.
Storage, retention considerations
• Diffraction simulations using MLFSOM and fastBragg (J. Holton, 2015)
• Background vs 1 µm size crystal
• All files are 16 Mpx, or 64 MB in 32 bit, 32 MB in 16 bit, and 16 MB in byte-offset
Better compression: Background subtraction?
Image size
                      bzip2    bslz4    NibOff
No water              0.009    0.4      4
With 20 µm water      11.4     13.7     12.4
Background (diff)     11.4     13.7     12.4
Background average    2        5.9      4
<BG> subtracted ?     9.7      16.9     15.7
BG subtracted ?       7.7      10       11.7
[Figure panels: No water; With BG; BG only; Avg BG subtracted]
➢ J. Jakoncic, H. Bernstein (2016)
CatB at FMX
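The effect behind the image size comparison, that a noisy background limits how well a frame compresses, can be reproduced qualitatively with the Python standard library. The sketch below does not reproduce the MLFSOM/fastBragg simulations or the bslz4 and NibOff codecs: it uses small synthetic 16-bit frames, a simple Poisson sampler, and bz2 and zlib as stand-in compressors, all of which are assumptions.

```python
import bz2
import math
import random
import zlib
from array import array


def poisson(rng, mean):
    """Draw one Poisson-distributed integer (Knuth's method, fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def synthetic_frame(n_pixels, n_spots, background_mean, seed=0):
    """Return raw 16-bit frame bytes: optional Poisson background plus bright spots."""
    rng = random.Random(seed)
    if background_mean > 0:
        pixels = [poisson(rng, background_mean) for _ in range(n_pixels)]
    else:
        pixels = [0] * n_pixels
    for _ in range(n_spots):
        pixels[rng.randrange(n_pixels)] += 200  # hypothetical Bragg spot intensity
    return array("H", pixels).tobytes()


def compressed_sizes(raw):
    """Compressed size in bytes for each stand-in codec, plus the raw size."""
    return {"raw": len(raw), "bz2": len(bz2.compress(raw)), "zlib": len(zlib.compress(raw))}


N_PIXELS = 128 * 128  # tiny frame so the example runs quickly
no_background = compressed_sizes(synthetic_frame(N_PIXELS, 20, 0.0))
with_background = compressed_sizes(synthetic_frame(N_PIXELS, 20, 5.0))
print("no background:  ", no_background)
print("with background:", with_background)
```

The frame without background shrinks to a small fraction of its raw size, while the frame with Poisson background stays much larger, which is the motivation for asking whether background subtraction could improve compression.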
• Storage ring experiments can reach
  • Data rates up to 50 GB/s
  • Data volumes up to 100 TB/day
• The MX experiment is not event triggered
• Data culling currently happens in post processing
• Real time challenge
  • Determine data completeness and quality
  • Real time means seconds and minutes
• Spot finding
  • Speed-up is challenging
  • Bottleneck for culling
Summary