Snapshot of DAQ challenges for Diamond Martin Walsh

Upload: garey-moore

Post on 18-Jan-2016

TRANSCRIPT

Page 1: Snapshot of DAQ challenges for Diamond Martin Walsh

Snapshot of DAQ challenges for Diamond

Martin Walsh

Page 2: Snapshot of DAQ challenges for Diamond Martin Walsh

Your role in all this...

- SAC is the advisory body for Diamond: how do we make the most of your collective knowledge, experience and wisdom?

- Providing the information you need

- Organisation and content of meetings

- The most effective forum for discussion

- Efficient transmission of your advice to us

- Informing you how Diamond has acted on your advice – and what the result has been

Harwell Campus

ISIS

CLF

RAL Space

Mary Lyon centre mouse functional genomics

 International Space Innovation Centre (ISIC) 

MRC Harwell

RAL: Rutherford Appleton Laboratory

Public Health England

The European Centre for Space Applications and Telecommunications (ECSAT) 

Research Complex

Page 3: Snapshot of DAQ challenges for Diamond Martin Walsh

Beamlines by Village

Macromolecular Crystallography

Soft Condensed Matter

Spectroscopy

Materials

Engineering and Environment

Surfaces and Interfaces

eBIC

Page 4: Snapshot of DAQ challenges for Diamond Martin Walsh

Per Beamline Data Rates

Legend: < 100 GB/day; < 1 TB/day; > 1 TB/day

Tomography beamlines have collected nearly 2 PB of data, more than the rest of the Diamond beamlines put together.

New arrivals:

EM: 2 microscopes @ 5 TB/day

XFEL from 2017

Page 5: Snapshot of DAQ challenges for Diamond Martin Walsh

Some Numbers for 2014-15

• Total number of user proposals: 642; delivered shifts: 7,964 (1 shift = 8 hours)

• Total number of users: 7,696 (4,988 on site + 2,708 remote)
  – MX remote use now exceeds 50% of use

• Total number of unique PhDs: 857

• Total journal papers published: 3,883 (677 published in 2014)

Page 6: Snapshot of DAQ challenges for Diamond Martin Walsh

Increasing resolution

CryoET

Single particle cryo EM

X-ray crystallography

B21 X-ray Solution Scattering

Cryo-electron tomography

One major player – integrated structural biology

Increasing biological complexity and integrity

Fluorescence microscopy (B24/CLF)

B24 X-ray microscopy

Cellular cryo-electron tomography

B22 - Infrared microspectroscopy

Page 7: Snapshot of DAQ challenges for Diamond Martin Walsh

Cell/tissue

Solution

Crystalline

Electron Microscopy

XFEL

Life Science & DLS
• B22 Infrared
• B24 Cryo X-ray microscopy
• I18, I20, B18, I14 X-ray spectroscopy
• I22/B21 SAXS
• B23 CD
• Spectroscopy

MX village
• (I02, I03, I04)
• (I24, I04-1)
• (I23, VMX)

National facility for EM in life & physical sciences

UK Hub for XFEL sample and software developments

• I08 X-ray STXM
• I13 X-ray tomography & coherent diffraction

Page 8: Snapshot of DAQ challenges for Diamond Martin Walsh

Cell Biology

OPPF-UK

MPL

RC@H

Diamond beamlines: Macromolecular Crystallography, Scattering, X-ray spectroscopy

ISIS beamlines: SANS, Neutron Reflection (NR)

Computational environment / CCP4, CCP-EM

HPC

Synchrotron Imaging

UK XFEL Hub@Diamond

Cryo-EM/ET: Electron Bio-Imaging Centre (eBIC)

An integrated Approach to Structural Biology

Fluorescence microscopy (CLF (STFC) & DLS)

Page 9: Snapshot of DAQ challenges for Diamond Martin Walsh

Diamond Data Rates/Volumes History

• Early 2007:
  – Diamond first user.
  – No detector faster than ~10 MB/s.

• Early 2009:
  – First Lustre system (DDN S2A9900).
  – First Pilatus 6M system @ 60 MB/s.

• Early 2011:
  – Second Lustre system (DDN SFA10K).
  – First 25 Hz Pilatus 6M system @ 150 MB/s.

• Early 2013:
  – First GPFS system (DDN SFA12K).
  – First 100 Hz Pilatus 6M system @ 600 MB/s.
  – ~10 beamlines with 10 GbE detectors (mainly Pilatus and PCO Edge).

• Early 2016:
  – Delivery of Eiger 16M for MX (initially at 6.75 GB/s, potential 13.5 GB/s).

[Figure: Peak Detector Performance (MB/s), log scale from 10 to 10000, plotted against year, 2007 to 2015. Doubling time = 7.5 months.]
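As a rough illustration of what a doubling time means for planning, the sketch below assumes pure exponential growth, rate(t) = r0 · 2^(t/T). The function names are hypothetical and not taken from the talk. An estimate made from just two milestones depends heavily on which two are chosen, so it will generally differ from the fitted 7.5-month figure quoted on the plot; the fit itself is not reproduced here.

```python
from math import log2


def doubling_time_months(months_elapsed: float, rate_start: float, rate_end: float) -> float:
    """Doubling time implied by two rate measurements, assuming exponential growth."""
    return months_elapsed / log2(rate_end / rate_start)


def projected_rate(rate_now: float, months_ahead: float, doubling_months: float) -> float:
    """Rate expected after months_ahead if the rate doubles every doubling_months."""
    return rate_now * 2 ** (months_ahead / doubling_months)


# Early 2011 (150 MB/s) to early 2013 (600 MB/s): 24 months, two doublings.
print(doubling_time_months(24, 150.0, 600.0))  # 12.0

# Projecting 600 MB/s forward 15 months at the quoted 7.5-month doubling time.
print(projected_rate(600.0, 15, 7.5))  # 2400.0
```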

Page 10: Snapshot of DAQ challenges for Diamond Martin Walsh

Challenges

• Hardware life cycles are fast, and hardware problems can be solved with sufficient money.
  – So detector data rates are not the problem.

• Software life cycles are slow: our analysis routines have a clear lineage, often dating back 40 years.
  – Software is a problem.

• Synchrotrons have to support a diverse range of techniques.
  – Systems and skills developed for one beamline are not appropriate for all beamlines.
  – Need to be able to attract talented software scientists AND software engineers.

• Remote access to large scale facilities such as synchrotrons, XFELs and national facilities (e.g. electron microscopy, HPC).
  – Need for dedicated light paths between these facilities to deal with the data volumes generated.

Page 11: Snapshot of DAQ challenges for Diamond Martin Walsh

Use case: Structural Biology

Numbers:

– Raw data, macromolecular crystallography
  • Currently 0.5 - 1 TB/day/beamline @ Diamond (3-6 TB/day)
  • 2016: detector technology will easily enable a 10× increase in data. Upgrades to beamlines will enable better exploitation (hardware can currently produce >100 TB/day if samples are available).
  • Near future: expect to produce 5 PB of MX data/year, and this is at DIAMOND ALONE!
  • Including SR MX beamlines across Europe, expect to reach or exceed 25 PB/year
  • European XFEL: SFX beamline at full operations has the potential to generate 300 TB/day

– Raw data, electron microscopy
  • Electron microscopy currently at 1 TB/day/microscope; high resolution experiments to start in November '16, which will produce >5 TB/day/microscope
  • High resolution EM work from Jan 2016 will generate >10 TB/day of data

– Data reduction and analysis
  • Requirement for light paths to be established between large scale and national facilities: SR, cryo-EM, data centres etc.
  • UK will have a dedicated light path from the European XFEL SFX beamline to Diamond; plans needed to extend ...
  • Currently a large investment in software for data analysis is required to exploit developments in parallelized systems / new HPC storage
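To put the daily volumes above in network terms, the following sketch converts TB/day into the average link rate needed to move that data within the same day. It assumes decimal units (1 TB = 10^12 bytes) and transfer spread evenly over 24 hours; real links would need headroom for bursts. The function names and the `operating_days` parameter are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def sustained_gbit_per_s(tb_per_day: float) -> float:
    """Average rate in Gbit/s needed to move tb_per_day within 24 hours (1 TB = 1e12 bytes)."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


def pb_per_year(tb_per_day: float, operating_days: int = 365) -> float:
    """Annual volume in PB; operating_days is an assumption, not a figure from the slides."""
    return tb_per_day * operating_days / 1000


# 300 TB/day (European XFEL SFX at full operations) needs roughly 27.8 Gbit/s sustained.
print(round(sustained_gbit_per_s(300), 1))  # 27.8

# 5 TB/day from one microscope needs roughly 0.46 Gbit/s sustained.
print(round(sustained_gbit_per_s(5), 2))  # 0.46
```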

Page 12: Snapshot of DAQ challenges for Diamond Martin Walsh

The future

• A lot of software will need to be redeveloped:
  – Incorporate modern paradigms like map-reduce.
  – May include middle-layer processing that runs close to distributed data chunks.
  – Intermediate data will be cached between processing steps.

• Synchrotrons/structural biology infrastructure will become turnkey sites.
  – Users may not come to the site/facility.
  – Results will be in the form of processed, not raw, data.
  – There must be trust between the site and the user, backed up by data provenance and full metadata.
  – High speed light links between centres required.
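The map-reduce idea mentioned above can be illustrated with a toy example: each chunk of frames is reduced to a small partial result where it is stored, the partials are cached, and only those partials are combined centrally. This is a single-process sketch with hypothetical names and made-up data, not a description of Diamond's software; in a real system the map step would run on the nodes holding each chunk.

```python
from functools import reduce


def map_chunk(frames):
    """Map step: summarise one chunk of frames as (total counts, number of pixels)."""
    total = sum(sum(frame) for frame in frames)
    pixels = sum(len(frame) for frame in frames)
    return total, pixels


def combine(a, b):
    """Reduce step: merge two partial results."""
    return a[0] + b[0], a[1] + b[1]


def mean_intensity(chunks, cache):
    """Mean pixel value over all chunks, reusing cached partial results."""
    partials = []
    for chunk_id, frames in chunks.items():
        if chunk_id not in cache:  # intermediate data cached between runs
            cache[chunk_id] = map_chunk(frames)
        partials.append(cache[chunk_id])
    total, pixels = reduce(combine, partials, (0, 0))
    return total / pixels if pixels else 0.0


# Two tiny "chunks" of frames; each frame is a list of pixel counts.
chunks = {"chunk-a": [[1, 2], [3, 4]], "chunk-b": [[10, 20]]}
cache = {}
print(mean_intensity(chunks, cache))
print(cache)  # {'chunk-a': (10, 4), 'chunk-b': (30, 2)}
```

Because only the small `(total, pixels)` tuples leave each chunk, the raw frames never have to cross the network, which is the point of running the processing close to the data.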

Page 13: Snapshot of DAQ challenges for Diamond Martin Walsh

Overview

- First Impressions

- Science highlights

- Technical developments

- Industrial engagement

- Plans for the future

- Finance

Thanks for your attention

Page 14: Snapshot of DAQ challenges for Diamond Martin Walsh

Example data access rates

Example of 12GB/s…typically at 3-4GB/s

Page 15: Snapshot of DAQ challenges for Diamond Martin Walsh
Page 16: Snapshot of DAQ challenges for Diamond Martin Walsh

Tomography rates

Page 17: Snapshot of DAQ challenges for Diamond Martin Walsh

MX/EM data storage 2015