it for photon science...it for analytical facilities photon science is a competitive environment...
TRANSCRIPT
Page 2 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
IT for Photon Science
In the frame of the
European Road Map for Accelerator
Based X-Ray Sources
LEAPS
Rudolf Dimper – [email protected]
IT for analytical facilities
Photon Science is a competitive environment
Large, mobile, and dynamic user community
IT as a common denominator for PaNs; working together as a
community since 2007
Initially collaboration driven by funding opportunities but also technical
needs
Now collaboration driven by technical needs but also funding
opportunities
Page 3 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
RIs
Technical
Needs
Brussels'
Expectations
IT for
LEAPS
My Brussels Meter
PaNdata User Survey (1/4)
Counting Users of 17 RIs via the User Offices:
http://pan-data.eu/Users2014-Results
August 2012 to July 2014 - 2 years
10 Photon Sources – ALBA, PETRA-III+FLASH, DLS, ELETTRA, ESRF, SLS,
SOLEIL, BESSY-II, LCLS
7 Neutron Sources – FRM-II, ILL LLB, ISIS, SINQ, BER-II, SNS
41 665 unique users from 95 countries (USA – 4 840)
Page 4 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
(Courtesy: F. Schlünzen, DESY)
PaNdata User Survey (2/4)
Page 5 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
Number of return visits to the Facilities
1/3 are 1st time users!
(Courtesy: F. Schlünzen, DESY)
PaNdata User Survey (3/4)
Page 6 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
(Courtesy: F. Schlünzen, DESY)
PaNdata User Survey (4/4)
Chord chart on common users:
• Length of the facility arc
represents the number of users of
that facility
• Thickness of the arc between
facilities relates to the number of
common users
Conclusion
Users of Analytical Facilities are
young and mobile
(Courtesy: F. Schlünzen, DESY)
Page 7 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
PaN RIs working together on IT since 2006
FP7+H2020 project/proposal
VEDAC
PaNData-Europe
PaNData-ODI
IRVUX
CRISP
EUCALL
CALIPSO+
NFFA
PaNDaaS
Page 8 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
Ouput
User Statistics
Data Policy
Software Catalogue
ICAT
NeXus/HDF5 consensus
Umbrella FIM
...
IT – for Photon Science
Page 9 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
1. Share Knowledge and Best Practices (1/3)
IT evolves quickly and much faster than our staff turns over
New techniques and practices must be mastered – programming, system
administration, network
Learning from each other is not an option, it’s a MUST
Example:
EIROforum IT-TWG – meets twice per year – 4-6h knowledge exchange
October meeting: IT security, Disaster Recover & Business Continuity, EU
General Data Protection Regulation (GDPR)
Page 10 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
1. Share Knowledge and Best Practices (2/3)
Technologies the ESRF IT-Sysadmins have to master:
Page 11 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
Fiber optic, Ethernet,
Infiniband, Gigabit,
TCP/IP, Internet
Renater, Tigre
Extreme Networks,
Firewall, TIS FWTK, DMZ, MTA
Checkpoint, EPNshaper
WiFi, Radius
WCS, Prime, Cisco
Nagios, Icinga, Observium, OMD
MRTG/RRD, Cacti
NAC, SSL
IPM, Efficient IP,
DNS, DHCP,
NIC France, ARIN
VPN, SSL
Microsens, APC
Windows, Office
Groupshare
Active Directory
GPO
Dell
K1000
SEP
Thunderbird
Visio
Teamviewer
Ghost
Ricoh, HP
Scopia
Skype
Jira
LTO, TiNa, Atempo
Mysql, MariaDB
Apache, Wiki
Squid, Squidguard
Sendmail, Brightmail, ClamAV
Sympa, IMAP, Thunderbird
Roundcube, Dovecot, Samba, CIFS
syslog-ng
KVM
GPFS, NFS, NAS, NetApp, DDN
Openstack, Docker, GlusterFS
Oracle/StorageTek
SSH, NX, Nomachine
OpenLDAP
CentOS, RedHat, Debian
OAR, Dell blades
eGroupWare, EPL, Stylite
DHCP, PXE
DNS, NIS, CAS
1. Share Knowledge and Best Practices (3/3)
Proposal #1
Organise regular thematic “LEAPS IT-workshops”:
Cloud technology
Virtualisation techniques
IT-security
Disaster recovery and Business Continuity
User Portals
FIM
Data Policy implementation
Resource allocation mechanisms
Batch schedulers
…
Create a “LEAPS IT-Leadership forum”
Page 12 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
RI -My Brussels Meter- EU
2. Implement an Open Data Policy (1/6)
Page 13 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
2. Implement an Open Data Policy (2/6)
ESRF is custodian of data and metadata
ESRF to collect high quality metadata to facilitate the use of data
ESRF will keep metadata forever
ESRF will keep raw (or reduced) data for 10 years
Data will be registered in a data catalogue
Data will be published with DOIs
The experimental team has exclusive access to data during the
embargo period (3 years which can be extended on request)
Data will be made public after the embargo period under license
CC-BY 4.0
The data policy will be implemented on all beamlines by 2020
…but the implementation is challenging:
Page 14 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
2. Implement an Open Data Policy (3/6)
• 2011: PaNData Europe FP7 work package 2 delivered a
generic data policy
• 2011: Proposed the ESRF management to adopt it but failed to
convince
• 2011 to 2015: rise of Open Science→ Data Management Plans
→ Open Data
• 2015: ESRF Data Policy become necessary + feasible, adopted
by Council 1/12/2015
• 2020: ESRF Data Policy implemented on all beamlines
« be patient, do not give up, now is the time!
Page 15 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
2. Implement an Open Data Policy (4/6)
Page 16 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
The landscape is changing!
Data Management will be
mandatory!
2. Implement an Open Data Policy (5/6)
Herculean tasks to implement a data policy:
1. Get a data policy approved
2. Define and implement meaningful metadata
3. Chose and implement a metadata catalogue
4. Define and implement a data format
5. Select and implement an electronic log-book
6. Make data findable
7. Archive data
8. Implement persistent Identifiers for all users
9. Define and implement the tools to access the data
Page 17 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
2. Implement an Open Data Policy (6/6)
Page 18 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Proposal #2
Common data policy implementation roadmap
Metadata ontology
Cross-facility searchable metadata catalogue
Photon Science e-logbook
Develop common tools to interact with archived data
Link to public/commercial clouds (EOSC, AWS, …)
RI -My Brussels Meter- EU
Make data Findable, Accessible, Interoperable and
Reusable (FAIR)
http://ec.europa.eu/research/openscience
3. Harness the data (1/7)
Page 19 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
The ESRF beamlines produce currently 4 PB/year
The data rate will increase until the long shutdown in 2019
In 2021 (first full year of operation with the new storage ring) we expect at
least 10-15 PB/year
The number of files per experiment is dramatically increasing
The data rate drives the ESRF IT infrastructure
3. Harness the data (2/7)
Limit for long-term archiving:
1000 files / 8hours / experiment
~20 000 files / experiment / week HDF5 on all beamlines
Page 20 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
3. Harness the data (3/7) – X-ray difficration tomography
Page 21 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
3. Harness the data (4/7) – X-ray difficration tomography
Page 22 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Data rage = 250Hz
Experiment duration = 1 week
Sample volume = 10003
Data volume = 10PB
To have reduced data on-line
= <10TB
Use PyFAI accelerated on
GPU to reduce data in real-
time by a factor of 1000
3. Harness the data (5/7)
Data volume is growing each year
Next generation experiments can produce PBs/week
Multiple data bottlenecks:
1. Acquiring data from detectors to storage (Gigabytes/s)
2. → Reducing data online fast enough (Gigabytes/min)
3. → Analysing data fast enough (Terabytes/day)
4. → Archiving data efficiently (Tera to Petabytes/day)
5. Getting results to users efficiently (Giga to Terabytes)
6. Helping Scientists analysing their data on-site (and off-site) (DaaS)
Page 23 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
3. Harness the data (6/7) – The PaNDaaS proposal
Match data processing capabilities with advancements in detectors and
sources
Join forces to do this collaboratively in Europe
Share best practices and solutions
Share the workload
Create a homogeneous and compatible data reduction/analysis
environment for our Users Community
Make the environment sustainable
PaNDaaS H2020-Infradev-4 proposal (not approved)
“Big Data” private cloud infrastructure
Cloud federation
21 Science use cases
3.5 years
21 partners (5 ESFRI projects)
1553 PMs effort
13.6 M€ (for human resources only)
Page 24 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
3. Harness the data (7/7) On-line data reduction
Page 25 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Proposal #3
Work on common on-line data reduction algorithms
Data streaming, data format, compression algorithms
Work hand-in-hand with main detector manufacturers
Integrate algorithms into the lab specific framework
RI -My Brussels Meter- EU
4. Data reduction/analysis (1/3)
Kmap – strain scanning
Experiment = 10 minutes
Sample size = 100x100 mm
Resolution = 100 nm
Data reduction = 10 hrs
Data analysis = bottleneck for
users i.e. needs specialist
Solution: hire data scientist for
18 months to rewrite scientists
application
Page 26 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
4. Data reduction/analysis (2/3)
Financing = IRT-Nanoelec
File format = HDF5
Optimise = legacy code
Execution = 2 minutes
Speedup = 300 times!
Next step = make available
remotely as a service
Long term = new program is
based on SILX and therefore
easier to maintain
Conclusion: hire more data
scientists to rewrite scientific
application
Page 27 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Scientist:
“Amazing - I have never
seen my data like this!”
4. Data reduction/analysis
Page 28 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Proposal #4
Share data analysis code developments
Hire data scientists
Each lab to champion the development/maintenance of a
small set of analysis programs
Make sure all programs are remote access compatible
Make sure all programs are documented
Organise code camps, eg. on parallel programming
Set-up tutoring services for data analysis
RI -My Brussels Meter- EU
5. Archiving raw data
Page 29 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Proposal #5
Develop a sustainable business model for an open data
archive
Transfer data sets to the public domain
Make data sets accessible (similar to Wikimedia Commons)
RI -My Brussels Meter- EU
6. Create a common user environment
Page 30 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Proposal #6
Develop a common user environment
Single entry point proposal system
Proposal round calendar
Reviewer’s hub
Coaching for proposal submission
Federated identity system – persistent user identifiers
An EU version of lightsources.org?
RI -My Brussels Meter- EU
7. Get ready for the Cloud (1/8)
Cloud computing is here to stay $70 billion in 2015, $140 billion in
2019
Private investment in cloud infrastructure will drive the IT market. The
big players are all American: AWS, Microsoft, Google
Why?
Cost for Cloud Services will rival with in-house computing
Studies show that Cloud Services are still more expensive, but cost is
decreasing
Unusual business model for RIs – Paying for IT not well established nor
accepted
Many technological issues to be addressed
Page 31 IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016
7. Get ready for the Cloud (1/8)
IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016Page 32
Amazon alone has spent $12.5 billion in 2015 on cloud infrastructure (IDC)
7. Get ready for the Cloud (2/8)
The Google Cloud data centres
Page 33 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
7. Get ready for the Cloud (3/8)
The Microsoft Azure Cloud
Page 34 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
7. Get ready for the Cloud (4/8)
Page 35 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Price of one core-year on
Commercial Clouds
Cost Comparison done by FermilabAverage cost per core-hour:
In-house resource: 0.9 cts (assumes 100% utilisation)
Off-site at AWS: 1.4 – 3.0 cts in function of scale of procurement
Cloud costs larger but approaching equivalence
7. Get ready for the Cloud (5/8)
Performance results of tests at ESRF:
Page 36 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Software Intel Cpu ESRF clock, #cores
Intel Cpu AWS clock, #cores
elapsed time ESRF
elapsed time AWS
time ratio ESRF / AWS
theory speed AWS / ESRF
cost AWS
FDMNES E5-2680 2.7 GHz, 16
E5-2666 v3 2.9 GHz, 8
17 h 14 min (1034 min)
24 h 57 min (1497 min)
0.69 0.70 (0.5 * 1.07 * 1.3)
26.73 $ (25.0 * 1.069)
FDMNES as above E5-2666 v3 2.9 GHz, 10
as above 20 h 17 min (1217 min)
0.85 0.87
(0.625 * 1.07 * 1.3)
43.40 $
(40.6 * 1.069)
Geant4 E5-2680 2.7 GHz, 16
E6-2670 v2 2.5 GHz, 8
2 h 05 min (125 min)
3 h 42 min (222 min)
0.56 0.60 (0.5 * 0.93 * 1.3)
5.92 $ (3.7 * 1.60)
Geant4 as above E6-2670 v2 2.5 GHz, 16
as above 1h 54 min (114 min)
1.10 1.21
(1.0 * 0.93 * 1.3)
6.08 $
(1.9 * 3.201)
Quanty E5-2680 2.7 GHz, 4
E5-2666 v3 2.9 GHz, 2
5 h 40 min (340 min)
7 h 38 min (458 min)
0.74 0.70 (0.5 * 1.07 * 1.3)
2.03 $ (7.6 * 0.267)
Quanty E5-2680 2.7 GHz, 8
E5-2666 v3 2.9 GHz, 4
4 h 33 min (273 min)
5 h 32 min (332 min)
0.82 0.70
(as above)
2.94 $
(5.5 * 0.534)
Quanty E5-2680 2.7 GHz, 16
E5-2666 v3 2.9 GHz, 8
5 h 09 min (309 min)
5 h 33 min (333 min)
0.93 0.70 (as above)
5.99 $ (5.6 * 1.069)
Quanty as above E5-2666 v3 2.9 GHz, 18
as above 6 h 26 min (386 min)
0.80 1.60 (1.125 * 1.07 * 1.3)
13.68 $ (6.4 * 2.138)
Remarks: 1) the ESRF computer have 1 thread / core, the AWS comuters have 2 threads / core (i.e. hyperthreading) 2) theory speed: cores(AWS / ESRF) * clockrate(AWS / ESRF) * (estimated speedup hyperthreading) 3) the second FDMNES run on AWS used 10 cores of 2 nodes with 8 cores each 4) the nodes used in all Quanty runs at the ESRF have 16 cores, but only part of them were used 5) AWS prices are "on demand prices" for the region "EU Frankfurt", excluding taxes
7. Get ready for the Cloud (6/8)
Amazon provides HPC resources → why not using them?
Check out https://aws.amazon.com/hpc/
AWS cfncluster script allows us to create an HPC cluster in 5 to 10 minutes with pre-
installed fast interconnect, MPI, batch scheduler, ...
Conclusions from AWS cloud tests:
Cloud is adapted for HPC applications
AWS HPC offer is easy to use and adapted to our needs
Could be used to absorb peak demands for HPC
Issues still to solve:
How to optimise best price
How to manage access + budget / user
How to transfer large volumes of data to cloud
How to make it even easier for our non-cloud savvy users, e.g. portal
How to guarantee IT security in a cloud environment
Hype about cloud is not just hype …
Viewpoint in CERN Courier “End of steam age of computing” by Eckhard Elsen
CERN director for research and computing,
http://cerncourier.com/cws/article/cern/65819
Page 37 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
7. Get ready for the Cloud (7/8) – data transport
Page 38 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
7. Get ready for the cloud (8/8)
Page 39 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Proposal #7
Develop a common cloud strategy for hybrid clouds
Share knowledge and best practices
Setup private cloud instances in all participating RIs
Test cloud federations between RIs and between RIs and
public cloud(s)
Port data analysis code to the cloud
Develop cloud business model(s) compatible with RI practices
including cost, resource allocation, IT-security
RI -My Brussels Meter- EU
8. Be Open
Page 40 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Proposal #8
Participate in key EU activities
To allow cross-disciplinary activities
Stay tuned with RDA
Closely follow up activities of the EOSC
Make the Photon Science Community visible in IT (and
beyond)
RI -My Brussels Meter- EU
9. What is the EOSC?
The European Open Science Cloud
• A trusted, open environment for storing, sharing and re-using scientific
data and results,
• A European Data Infrastructure: a world-class digital infrastructure to
securely access, move, share and process data in Europe
It is claimed that many of the services needed for the EOSC already exists,
but that technical and policy barriers remain to provide the eight element
for success:
• Open in design, participation and use
• Publicly funded & governed
• Research centric
• Comprehensive in terms of universality and inclusiveness of all disciplines
• Diverse & distributed empowering network effects
• Interoperable with common standards
• Service oriented
• Connecting diverse communities
Page 41 IT for Photon Science| Rudolf Dimper | Diamond 15-November-2016
Thank you for your attention!
IT for Photon Science | Rudolf Dimper | Diamond 15-November-2016Page 42
Questions?