distributed computing r&d -...

26
Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu, ZHANG Xiaomei, CHEN Yaodong, YAN Tian, SHI Jingyan, DU Ran, ZENG Shan, WANG Cong CC-IN2P3: Ghita RAHAL, Vanessa HAMAR, Laurent CAILLAT-VALLET, Frédéric SUTER, Nicolas FOURNIALS CPPM: Andreï TSAREGORODTSEV 10th FCPPL Workshop — Beijing, March 30th, 2017 Distributed Computing R&D progress report

Upload: duongcong

Post on 08-Mar-2018

221 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

Fabio HERNANDEZ on behalf of

IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu, ZHANG Xiaomei, CHEN Yaodong, YAN Tian, SHI Jingyan, DU Ran, ZENG Shan, WANG Cong CC-IN2P3: Ghita RAHAL, Vanessa HAMAR, Laurent CAILLAT-VALLET, Frédéric SUTER, Nicolas FOURNIALS

CPPM: Andreï TSAREGORODTSEV

10thFCPPLWorkshop—Beijing,March30th,2017

DistributedComputingR&Dprogressreport

Page 2: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Guiding principle to explore technologies of potential interest for the data processing needs of HEP experiments

◦ Partners CPPM, IN2P3 computing center, IHEP computing center

◦ Funding IHEP and IN2P3 through FCPPL 2016 call

CNRS-NSFC joint program for international collaboration

2

Background

Page 3: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Topics of interest DIRAC-based computing platform for IHEP experiments

High-Performance computing platforms

Building blocks for inter-site bulk data transfer

Understanding I/O behaviour of applications

Experimentation with software-defined networking

3

Topics

Page 4: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

DIRAC-basedplatform

Page 5: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ DIRAC instance at IHEP in production since several years currently being used by BES III, CEPC and JUNO

single instance for various experiments: makes easier to add other experiments to the distributed infrastructure, lowers maintenance effort

◦ Single, multi-experiment storage element Lustre-managed local storage also exposed via grid protocols by StoRM

access control and quotas based on grid certificates and VOMS attributes

◦ Example of recent usage CEPC simulation: 9.6 M events, 170 TB of data

5

DIRAC-basedplatform

Page 6: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Redesign and reimplementation of support of virtual machines by DIRAC: VMDIRAC 2.0 contribution of IHEP team, now merged into official DIRAC source code

◦ Benefits of VMDIRAC 2.0 scheduling of tasks for execution on (OpenStack) virtual machines: more reliable scheduling policy

virtual machine started and stopped as a function of workload

transparent to end-user

greatly simplifies the integration of new cloud sites

support of execution of multi-core applications in virtual machines

◦ In production at IHEP to be recommissioned at CC-IN2P3 after changes to OpenStack interfaces

6

DIRAC-basedplatform(cont.)

Page 7: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

High-performanceplatforms

Page 8: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Both IHEP and CC-IN2P3 deployed pilot high-performance computing platforms IHEP (target): 1000 CPU cores, 150 GPUs, 50 Intel Xeon PHIs, Infiniband interconnection

CC-IN2P3: 160 CPU cores and 40 NVIDIA TESLA K80 GPUs in 10 hosts, Infiniband interconnection

programming environment: CUDA, OpenCL, OpenMP, MPI

◦ Early adopters getting familiar with those platforms the platforms are integrated to the workload management systems of both sites

◦ Sites understanding how to operate them and attracting users to these new facilities experience to be shared among sites operators

8

High-performancecomputingplatforms

Page 9: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

Filemetadatastorage

Page 10: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Exploratory work being conducted at IHEP

◦ Goal: to explore suitable alternative storage platforms for handling file metadata motivating use case: number of objects managed by file systems rapidly increasing — metadata management becoming bottleneck in some cases

◦ RAMCloud (Stanford) aggregation of RAM of several machines in a cluster with fast interconnection

key-value programming model on top of which indices, tables and transactions are built

strong consistency guarantees, low latency, persistence

10

Filemetadatastorage

Page 11: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Testbed deployed at IHEP 3 servers, one client

both Ethernet and Infiniband interconnection

◦ Performance measurements using TCP over both Ethernet and Infiniband

using proprietary infrc protocol over Infiniband

◦ Next steps to adapt metadata management component of IndexFS to exploit RAMCloud for storage

11

Filemetadatastorage(cont.)

Page 12: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

12

Filemetadatastorage(cont.)

Source: WANG Lu

10 µsec

179 µsec

Page 13: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

13

Filemetadatastorage(cont.)

5.2 GB/sec

0.091 GB/sec

Source: WANG Lu

Page 14: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

Software-definednetworking

Page 15: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Continuation of work presented last year

◦ Advanced experimentation with SDN technologies for 3 use cases performed by IHEP experts

◦ Use-case #1 based on observed status of network links of 4 sites, dynamically reprogram the network for choosing the best path (IPv4 or IPv6 or aggregation of both)

4 participating sites: HEP, Shanghai Jiaotong Univ., Shandong Univ., China Central Normal Univ.

15

Software-definednetworking

EG1( 202. 122. 39. 190)

���RG-ONC( 202. 122. 39. 35)

������

I PV4

202. 194. 3. 135

I PV6

202. 120. 47. 162

CCNUI Pv4��

202. 122. 32. 176

2001: da8: 7000: 64: : 300

2001: da8: 8000:6210: 64: c5da: b

f f 4: acd9

CCNUi pv6��

2401: de00: : 300

I HEP����

EG2000XE

EG2000D

EG2000D

SDU����

SJTU����

EG2000D

I HEP

SDU

SJTU

Source: QI Fazhi

Page 16: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Use case #2 to provide flexibility in the network configuration of OpenStack virtual machines

tenant can specify the bandwidth required on a per-VM basis

underlying network is programmed according to available physical bandwidth and VM requirements

experimentation ongoing with IHEP OpenStack installation

16

Software-definednetworking(cont.)

Source: QI Fazhi

Page 17: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Use case #3 to dynamically program the the control path of network routers (via OpenFlow) based on observed behaviour capturing samples of packets (via SFlow)

goal is to detect abnormal behaviour (e.g. network attacks) and react promptly by reprogramming the border routers

◦ Fazhi QI visited CC-IN2P3 in April 2016 and shared IHEP experience using SDN technology https://indico.in2p3.fr/event/13089

17

Software-definednetworking(cont.)

Source: QI Fazhi

Page 18: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

Intersitedatatransfer

Page 19: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Goal: is HTTP suitable for bulk data transfer over high latency network? optimise for throughput, not latency

◦ Why HTTP? standard, programmability of both client and servers in any relevant programming language, future-proof, ubiquitous, customisable semantics, …

confidentiality and data integrity ensured using standard TLS

◦ Developed building blocks for measuring memory-to-memory performance of data transport over HTTP over transatlantic network, 10 Gbps, 110 ms RTT

19

Intersitedatatransfer

Page 20: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

20

Intersitedatatransfer(cont.)

Page 21: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

AnalysisofI/Opatterns

Page 22: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Work performed by WANG Cong during her stay at CC-IN2P3

◦ Goal: understand the I/O behaviour of applications with the goal of simulate it

◦ Very preliminary results presented at CHEP 2016 (poster) http://indico.cern.ch/event/505613/contributions/2230984

◦ Additional areas of work identified, but this subject is on stand by because lack of manpower

22

AnalysisofI/Opatterns

Page 23: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

People

Page 24: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Fabio HERNANDEZ visited IHEP, January 2016

◦ Visit of QI Fazhi, LI Haibo and SUN Zhuhui to CC-IN2P3, April 2016 http://indico.in2p3.fr/e/sdn

◦ ZHANG Xiaomei attended 6th DIRAC Users Workshop in France and visited CC-IN2P3, May 2016 https://indico.cern.ch/event/477578/

◦ Fabio HERNANDEZ gave an invited talk at the 3rd Data Science conference in Shanghai, August 2016 http://dc2016.codata.cn/

◦ CHEN Gang and LI Jianhui (CNIC) visited CC-IN2P3, March 2017 https://indico.in2p3.fr/event/14339/

◦ Andrei TSAREGORODTSEV to visit IHEP, March 2017

24

People

Page 25: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

Perspectives

Page 26: Distributed Computing R&D - IHEPindico.ihep.ac.cn/event/6651/session/10/contribution/31/material/... · Fabio HERNANDEZ on behalf of IHEP: CHEN Gang, LI Weidong, QI Fazhi, WANG Lu,

[email protected] CNRS/IN2P3compuEngcentre

◦ Project submitted to FCPPL 2017 call

◦ Topics DIRAC: improved integration of cloud and high performance computing resources

hybrid computing platforms (CPU, GPU, accelerators)

inter-site bulk data transfers

RAM-based data stores for metadata management

indexation of contents of ROOT files in external database for rapid location of events

cyber security federation for HEP

26

Perspectives