
Data, storage & information management

collaboration at the frontiers of science and technology

Jakub T. Mościcki, CERN IT

eResearch 2019, Oct 22, Brisbane

Large Hadron Collider

27 km tunnel, 40-100 m underground

60+ year lifespan

Accelerator

1232 dipole magnets at 1.9 K (-271.3 °C), each 15 m long and 30 t

2 beams at 0.99999 c, 362 MJ per beam
(energy comparisons on the slide: … at 5 knots, … at 2200 km/h)

Experiments & Detectors

ATLAS, CMS, ALICE, LHCb

detector scale: 46 m long, 25 m high, 14,000 t (= 1.5× …)

11,000 beam revolutions per second, 25 MHz collision rate

aligned with sub-mm precision!

Why?

Standard Model, Strings, QCD

mass

symmetry? early universe

Origins

Education

8 Middle East countries together

Model

1954 Science for Peace

Collaboration, Fundamental Research

2019: 23 Member States, 70 Countries, 120 Nationalities, 600 Universities

1959: PS, 628 m
1976: SPS, 7 km
1989: LEP (electrons)
2008: LHC (protons), 27 km

1984 Workshop: LHC in the LEP tunnel?

Generations of accelerators

[Diagram: circular orbit, v = const, force F]

Zoom into LHC Computing

millions of read-out channels
hundreds of head-on collisions, thousands of particles interacting
detector alignment

25M events/s collision rate; trigger selects 1 in 25,000 events

Illustration of Run 3 tracking problem: 2 ms timeframe shown (10% of total)

R. Jones, European Strategy for Particle Physics, Granada, May 2019
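As a quick sanity check on the trigger numbers above, a short Python sketch reproduces the implied output rate (the 25 MHz collision rate and the 1-in-25,000 selection are from the slide; the ~1 kHz result simply follows):

    # Back-of-the-envelope trigger arithmetic using the rates quoted on the slide.
    collision_rate_hz = 25e6        # 25M events/s at the detector
    trigger_fraction = 1 / 25_000   # trigger keeps roughly 1 in 25,000 events

    recorded_rate_hz = collision_rate_hz * trigger_fraction
    print(f"events written out: ~{recorded_rate_hz:.0f} per second")  # ~1000 events/s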

Data recording & processing at CERN

RAW → calibration + reconstruction (a lot of CPU) → RECO
(Source: Vakho Tsulaia, LBNL, June 2019)

peak 60 GB/s
140/300 PB on 60,000 disks
~10^5 cores
peak 100 GB/s
330 PB on tape, 60 GB/s
10 GB/s to the WLCG Grid

2018 data taking: 88 PB (ATLAS: 24.7, CMS: 43.6, LHCb: 7.3, ALICE: 12.4)

Worldwide LHC Computing Grid

global data distribution system: ~1 EB of storage, 1 billion files; transfer rates of 60 GB/s

data processing: ~170 sites, 1 million cores; jobs are sent to the data

File Transfer Service at CERN

dedicated network: LHCOPN

Tier-0: data recording, long-term archival, distribution (RAW)
Tier-1: reprocessing, storage, analysis (RAW)
Tier-2: simulation, end-user analysis

tiered model rationale: access, efficiency, funding model
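In practice these transfers run over protocols such as XRootD or HTTP; FTS schedules and retries them at scale. As a minimal stand-in (not the FTS API itself), here is a sketch that copies one file from a remote storage endpoint with the XRootD xrdcp client; the source path is hypothetical:

    import subprocess

    # Hypothetical remote file on an XRootD/EOS endpoint and a local destination.
    source = "root://eospublic.cern.ch//eos/opendata/some/dataset/file.root"
    destination = "/tmp/file.root"

    # xrdcp copies a single file; check=True raises if the transfer fails.
    subprocess.run(["xrdcp", source, destination], check=True)
    print("copied", source, "->", destination)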

Tape Storage
• Archival and backup
• Cheap media, expensive infrastructure
• Energy efficient
• Access (toy cost model below)
  • sequential streaming ++
  • random access --
• Rewrite ("repack") is O(n²)

user access: cache? buffer? IOPS? HSM? → CTA (CERN Tape Archive)
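A toy cost model makes the sequential-vs-random point concrete. The mount/seek and streaming-rate numbers below are illustrative assumptions, not CERN figures; the shape of the result is what matters:

    # Toy tape-access cost model with assumed, illustrative numbers.
    MOUNT_AND_SEEK_S = 90.0      # assumed average mount + positioning time per access
    STREAM_RATE_MBPS = 300.0     # assumed sustained streaming rate of one drive

    def random_access_time(n_files: int, file_size_mb: float) -> float:
        """Each file read pays a full mount/seek penalty."""
        return n_files * (MOUNT_AND_SEEK_S + file_size_mb / STREAM_RATE_MBPS)

    def sequential_access_time(n_files: int, file_size_mb: float) -> float:
        """All files are streamed in one pass after a single mount/seek."""
        return MOUNT_AND_SEEK_S + n_files * file_size_mb / STREAM_RATE_MBPS

    n, size_mb = 1000, 500
    print(f"random:     {random_access_time(n, size_mb) / 3600:.1f} h")
    print(f"sequential: {sequential_access_time(n, size_mb) / 3600:.1f} h")

With these assumptions, reading 1000 files at random costs tens of hours, while streaming the same volume sequentially takes well under one hour, which is why organized, streaming-style access is favoured on tape.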

Disk Storage
• Reprocessing and Analysis
• Cost control
  • use cheap hardware
  • compensate in software

Open Source Storage: cern.ch/eos

architecture: 100s of nodes with 24-196 disks each, accessed directly by users

• Tunable redundancy layouts (RAIN)
  • replication, erasure coding (parity sketch below)
• Storage hierarchy
  • SSD for metadata
  • HDD for data
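A minimal sketch of the idea behind erasure coding, using a single XOR parity block over a 4-block stripe (real RAIN layouts in systems like EOS use more general codes; this is only the simplest illustration): any one lost block is rebuilt from the survivors, for 25% overhead instead of the 100%+ of full replication.

    from functools import reduce

    def xor_blocks(blocks):
        """Byte-wise XOR of equally sized blocks."""
        return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks))

    # Split some data into 4 equal blocks and add 1 parity block (a 4+1 layout).
    data = bytes(range(64)) * 4
    k = 4
    block_size = len(data) // k
    blocks = [data[i * block_size:(i + 1) * block_size] for i in range(k)]
    parity = xor_blocks(blocks)

    # Simulate losing one data block and rebuild it from the survivors + parity.
    lost_index = 2
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    rebuilt = xor_blocks(survivors + [parity])
    assert rebuilt == blocks[lost_index]

    # Overhead: 25% extra storage here vs. 100%+ for full replication.
    print("rebuilt OK, storage overhead =", 1 / k)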

Data Formats

unstructured RAW data: independent events compressed into big blocks
(events 1..8 per block, File1 … FileN)

structured analysis data: optimized for volume and access
(event attributes x1..x8, y1..y8, z1..z8, v1..v8, … stored column-wise and compressed per column: C(x), C(y), C(z), C(v), …)

A. Peters / CERN
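A small, self-contained sketch of why the column-wise analysis layout typically compresses (and reads) better than interleaving all attributes event by event. The data is synthetic and the exact ratio is irrelevant; grouping similar values together is what helps the compressor:

    import random, struct, zlib

    random.seed(42)
    n = 10_000

    # Synthetic per-event attributes, each with its own narrow range of values.
    x = [random.randint(0, 15) for _ in range(n)]
    y = [random.randint(100_000, 100_063) for _ in range(n)]
    z = [random.randint(0, 3) for _ in range(n)]

    # Row-wise layout: the attributes of each event are interleaved (RAW-block style).
    row_wise = b"".join(struct.pack("<III", xi, yi, zi) for xi, yi, zi in zip(x, y, z))

    # Column-wise layout: each attribute stored contiguously, then compressed per column.
    columns = [struct.pack(f"<{n}I", *col) for col in (x, y, z)]

    print("row-wise, one compressed block    :", len(zlib.compress(row_wise)))
    print("column-wise, sum of C(x),C(y),C(z):", sum(len(zlib.compress(c)) for c in columns))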

Data Access Patterns

Credit: A.Peters, J.Blomer / CERN

Data formats support:
• sparse access: optimized LAN & WAN protocol
• latency compensation: predictable async multi-byte-range requests
• remote access: sequential forward-seeking IO
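Sparse remote access of this kind maps naturally onto byte-range reads. A minimal sketch over plain HTTP with the standard library (the URL and offsets are hypothetical; XRootD vector reads and async multi-range requests play the same role in production):

    import urllib.request

    URL = "https://example.org/datasets/file.root"  # hypothetical remote file

    def read_range(url, start, end):
        """Fetch only bytes [start, end] of a remote file via an HTTP Range request."""
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    header = read_range(URL, 0, 1023)         # e.g. the file header
    block = read_range(URL, 50_000, 51_023)   # e.g. one compressed data block
    print(len(header), len(block))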

Preservation

Bit preservation is not the problem; the hard part is how to read old formats.

LEP data: ~100 TB

Metadata
• The digital memory of the organization is easily lost

Data lifetime >> experiment lifetime

Years…

• Data is expensive to produce

High Luminosity LHC

[Projected data volumes: 50 PB/y → 150 PB/y → 650 PB/y]

High Luminosity LHC

[Plot: storage growth from 330 PB ("we are here") towards exabytes]

Computing and Storage Evolution

Source: B.Panzer / CERN

1969 Apollo Guidance Computer: 2K RAM, 1.024 MHz clock

[Plot: CPU and disk capacity growth; disk scale ~100 PB]

Challenges for HL-LHC

• Cannot rely on pure scaling of technology
  • technology scaling is slowing down
• Cannot simply purchase more
  • budgets are flat

Clever solutions needed
• Don't store raw data: keep pre-processed events only

• Re-organize workflows

• organized staging from cheap storage reduces the amount of expensive storage needed (Tape Carousel)

• Smarter global caching (Data Lakes); a minimal caching sketch follows this list

• Tune redundancy: erasure coding, 2nd global replica (?)

• …
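To make the caching idea concrete, here is a minimal sketch of the kind of policy a site cache in a data-lake setup might apply: a least-recently-used (LRU) cache keyed by dataset name. Everything here (capacity, dataset names, sizes) is hypothetical; it illustrates the mechanism, not the actual Data Lakes design.

    from collections import OrderedDict

    class LRUDatasetCache:
        """Keep the most recently accessed datasets, evicting the least recently used."""

        def __init__(self, capacity_tb: float):
            self.capacity_tb = capacity_tb
            self.used_tb = 0.0
            self._entries = OrderedDict()  # dataset name -> size in TB

        def access(self, name: str, size_tb: float) -> str:
            if name in self._entries:
                self._entries.move_to_end(name)  # refresh recency on a hit
                return "hit"
            # Miss: evict least recently used datasets until the new one fits.
            while self._entries and self.used_tb + size_tb > self.capacity_tb:
                _, evicted_size = self._entries.popitem(last=False)
                self.used_tb -= evicted_size
            self._entries[name] = size_tb
            self.used_tb += size_tb
            return "miss"

    cache = LRUDatasetCache(capacity_tb=100)
    for ds in ["run2018A", "run2018B", "run2018A", "mc_signal", "run2018B"]:
        print(ds, cache.access(ds, size_tb=40))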

Information Technology for Collaboration

Large collaborations
• Global community: 40,000 email boxes, 600 universities

• 5,000-7,000 users at CERN at any time

• ATLAS: 3,000 physicists from 180 institutions, 1,500 PhD students, 900 published papers (so far)

• long tail of small and medium experiments

• Hierarchical, distributed organization but…

• people communicate and share information across formal structures

Back in 1989: "Many of the discussions of the future at CERN and the LHC era end with the question: Yes, but how will we ever keep track of such a large project?"

Hierarchical information does not model the real world

• Your filesystem
• Photo collections
• Email folders
• Dynamic work environments

Centralized system is impractical

• continuous & distributed information update

https://www.w3.org/History/1989/proposal.html

The problems of information loss may be particularly acute at CERN (…) CERN is a model in miniature of the rest of world in a few years time. CERN meets now some problems which the rest of the world will have to face soon. https://www.w3.org/History/1989/proposal.html

http://cds.cern.ch/record/1164396

WWW

Web principles…
• Remote access from any type of device

• Decentralised update and modification

• Integrating existing information sources (data)

• Layered: separate the information storage from display

• Well defined interface

…still true in the cloud services era

• Access from mobile phones, laptops, …

• Collaborative and distributed update

• A connected mesh of services integrating seamlessly into everyday workflows

• ….

Integrated Services for Data Science at CERN

Users collaborate on data using an increasing number of applications.

Data available on all devices: mobile, laptops, desktops

Data easily sharable with individuals and groups

Concurrent editing

Web-based Analysis

Ready-to-go environment “one click away”

Integrated with entire data repository

Future Federated Analysis Platform

One-click to create user groups, share projects and data

Domestic and remote users in the same collaborative workflow.

Advancing the state of the art

Application & data workflow.

Grid: integrate distributed computing capabilities with data sharing systems.

Full metadata awareness in the research workflow.

[Research lifecycle: PROPOSAL → DATA COLLECTION → DATA ANALYSIS → PUBLISH]

Mesh

Commitment to Open Source

http://cds.cern.ch/record/1164399

The document that officially put the World Wide Web into the public domain on 30 April 1993.

Many exciting challenges ahead for the LHC computing and distributed collaboration

Pushing frontiers of science and technology together with a worldwide community

Australia in Federated Analysis Platform

Community Service


NRENs, HEP & Physics, Universities, Companies

cs3community.org

Storage Operations at CERN
❖ Physics Online Storage: EOS — 250 PB, 70 GB/s RW

❖ User Storage: AFS, CERNBox — 5B files

❖ Infrastructure Storage: CEPH, S3, Filers — 30 KHz ops

❖ Software Distribution Storage: CVMFS — 900 repositories

❖ Archival Storage: CASTOR — 330 PB


… small / medium experiments