The Worldwide Distributed Computing Architecture of the LHC Data Grid
TRANSCRIPT
Distributed Data Management for LHC
Dirk Duellmann, CERN, Geneva
Accelerating Science and Innovation
The Status of the Higgs Search, J. Incandela for the CMS Collaboration, July 4th 2012
H → γγ candidate
S/B Weighted Mass Distribution: sum of mass distributions for each event class, weighted by S/B; B is the integral of the background model over a constant signal fraction interval.
ATLAS: Status of SM Higgs searches, 4/7/2012
Evolution of the excess with time
Energy-scale systematics not included
Founded in 1954: “Science for Peace”
Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom
Candidate for Accession: Romania
Associate Members in the Pre-Stage to Membership: Israel, Serbia
Applicant States: Cyprus, Slovenia, Turkey
Observers to Council: India, Japan, the Russian Federation, the United States of America, Turkey, the European Commission and UNESCO
~2300 staff, ~1050 other paid personnel, ~11000 users; budget (2012): ~1000 MCHF
CERN: 20 member states
Global Science: 11000 scientists
Stars and planets account for only a small percentage of the universe!
• 27 kilometre circle
• proton collisions at 7+7 TeV
• 10,000 magnets
• 8000 km of super-conducting cable
• 120 t of liquid helium
The Large Hadron Collider
The largest super-conducting installation in the world
Precision! The 27 km long ring is sensitive to <1 mm changes, caused by tides, stray currents and rainfall.
• 140,000 m³ of rock removed
• 53,000 m³ of concrete
• 6,000 tons of steel reinforcement
• 55 meters long
• 30 meters wide
• 53 meters high (a 10-storey building)
The ATLAS Cavern
A collision at LHC
The Data Acquisition for one Experiment
Tier 0 at CERN: Acquisition, First reconstruction, Storage & Distribution
1.25 GB/sec (ions)
2011: 400-500 MB/sec
2011: 4-6 GB/sec
The LHC Computing Challenge
• Signal/Noise: 10⁻¹³ (10⁻⁹ offline)
• Data volume
– High rate × large number of channels × 4 experiments
→ ~15 PetaBytes of new data each year (→ ~30 PB in 2012; see the sketch below)
• Compute power
– Event complexity × number of events × thousands of users
→ ~200k CPUs (→ ~300k in 2012)
→ 45 PB of disk storage (→ 170 PB in 2012)
• Worldwide analysis & funding
– Computing funding locally in major regions & countries
– Efficient analysis everywhere → GRID technology
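To make the "high rate × channels × experiments" arithmetic concrete, here is a minimal back-of-envelope sketch; the event rate and event size are illustrative assumptions, not the experiments' actual trigger parameters:

```python
# Back-of-envelope estimate of the annual LHC data volume.
# EVENT_RATE_HZ and EVENT_SIZE_MB are illustrative assumptions only;
# the real trigger rates and event sizes differ per experiment.

LIVE_SECONDS_PER_YEAR = 1e7   # typical accelerator live time per year
EVENT_RATE_HZ = 300           # events written to storage per second (assumed)
EVENT_SIZE_MB = 1.5           # average raw event size in MB (assumed)
EXPERIMENTS = 4               # ALICE, ATLAS, CMS, LHCb

volume_pb = (EVENT_RATE_HZ * EVENT_SIZE_MB
             * LIVE_SECONDS_PER_YEAR * EXPERIMENTS) / 1e9  # MB -> PB
print(f"~{volume_pb:.0f} PB of new data per year")  # ~18 PB, the order quoted above
```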
CERN Computer Centre
• Built in the 70s on the CERN site
• ~3000 m² (in three machine rooms)
• 3.5 MW for equipment

A recent extension:
• Located at Wigner (Budapest, Hungary)
• ~1000 m²
• 2.7 MW for equipment
• Connected to CERN with 2×100 Gb links
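As a sense check of those 2×100 Gb links, a worked example; the figure is idealised, assuming the full nominal capacity with no protocol or contention overheads:

```python
# Time to move one petabyte between CERN and Wigner over 2 x 100 Gb/s,
# assuming the full nominal capacity is usable (an idealised figure).

link_gbit_per_s = 2 * 100                       # two 100 Gb/s links
throughput_gb_per_s = link_gbit_per_s / 8       # 25 GB/s
petabyte_gb = 1_000_000
hours = petabyte_gb / throughput_gb_per_s / 3600
print(f"~{hours:.1f} hours per PB")             # ~11.1 hours
```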
• A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments
• Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
• The resources are distributed – for funding and sociological reasons
• Our task was to make use of the resources available to us – no matter where they are located
World Wide Grid – what and why?
Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution

Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis

Tier-2 (~130 centres):
• Simulation
• End-user analysis
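That division of labour can be read as a simple placement table. A minimal sketch follows; the TIERS mapping and the tiers_for helper are hypothetical illustrations, not WLCG software:

```python
# Minimal sketch of the WLCG tier model; role strings follow the list above,
# the data structure and helper are hypothetical illustrations.

TIERS = {
    "Tier-0": {"sites": 1,   "roles": {"recording", "reconstruction", "distribution"}},
    "Tier-1": {"sites": 11,  "roles": {"permanent storage", "re-processing", "analysis"}},
    "Tier-2": {"sites": 130, "roles": {"simulation", "end-user analysis"}},
}

def tiers_for(role: str) -> list[str]:
    """Return the tiers responsible for a given role."""
    return [name for name, tier in TIERS.items() if role in tier["roles"]]

print(tiers_for("simulation"))     # ['Tier-2']
print(tiers_for("re-processing"))  # ['Tier-1']
```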
• The grid really works
• All sites, large and small, can contribute – and their contributions are needed!
CPU – around the Tiers
[Pie chart: CPU delivered – January 2011, by site: CERN, BNL, CNAF, KIT, NL LHC/Tier-1, RAL, FNAL, CC-IN2P3, ASGC, PIC, NDGF, TRIUMF, plus the Tier-2s combined]
[Pie chart: Tier-2 CPU delivered by country – January 2011: USA, UK, France, Germany, Italy, Russian Federation, Spain, Canada, Poland, Switzerland, Slovenia, Czech Republic, China, Portugal, Japan, Sweden, Israel, Romania, Belgium, Austria, Hungary, Taipei, Australia, Republic of Korea, Norway, Turkey, Ukraine, Finland, India, Pakistan, Estonia, Brazil, Greece]
Evolution of capacity: CERN & WLCG
0"
200000"
400000"
600000"
800000"
1000000"
1200000"
1400000"
1600000"
1800000"
2000000"
2008" 2009" 2010" 2011" 2012" 2013"
WLCG%CPU%Growth%
Tier2%
Tier1%
CERN%
0"
20"
40"
60"
80"
100"
120"
140"
160"
180"
200"
2008" 2009" 2010" 2011" 2012" 2013"
WLCG%Disk%Growth%
Tier2%
Tier1%
CERN%
0"
100000"
200000"
300000"
400000"
500000"
600000"
2005" 2006" 2007" 2008" 2009" 2010" 2011" 2012" 2013"
CERN%Compu*ng%Capacity%
CERN"
2013/14: modest increases to process "parked data"
2015 → budget limited?
– experiments will push trigger rates
– flat budgets give ~20%/year growth
What we thought was needed at LHC start
What we actually used at LHC start!
• Relies on:
– OPN, GEANT, US-LHCNet
– NRENs & other national & international providers
LHC Networking
Computing model evolution
Hierarchy → Mesh
Physics Storage @ CERN: CASTOR and EOS
CASTOR and EOS use the same commodity disk servers:
• RAID-1 for CASTOR
– 2 copies in the mirror
• JBOD with RAIN for EOS
– Replicas spread over different disk servers
– Tunable redundancy (see the sketch below)
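The practical difference between the two schemes is usable capacity per raw disk. A minimal sketch, assuming a generic n+k erasure layout for RAIN; the 10+2 split is an example, not CERN's configuration:

```python
# Usable-capacity comparison: RAID-1 mirroring vs RAIN erasure coding.
# The 10+2 stripe layout is an assumed example, not CERN's actual setting.

def usable_mirror(copies: int = 2) -> float:
    """Mirroring keeps `copies` full replicas: usable fraction is 1/copies."""
    return 1 / copies

def usable_rain(data_stripes: int, parity_stripes: int) -> float:
    """Erasure coding spreads data+parity stripes over distinct servers."""
    return data_stripes / (data_stripes + parity_stripes)

print(f"RAID-1, 2 copies: {usable_mirror(2):.0%} usable")   # 50% usable
print(f"RAIN, 10+2:       {usable_rain(10, 2):.0%} usable") # ~83% usable
```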
Storage Systems developed at CERN
CERN Disk/Tape Storage Management @ storage-day.ch
CASTOR – Physics Data Archive
Data:
• ~90 PB of data on tape; 250 M files
• Up to 4.5 PB of new data per month
• Over 10 GB/s (R+W) peaks

Infrastructure:
• ~52K tapes (1 TB, 4 TB, 5 TB)
• 9 robotic libraries (IBM and Oracle)
• 80 production + 30 legacy tape drives
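A quick consistency check on those tape numbers; the cartridge mix below is an assumption chosen only to show that ~52K mixed-capacity tapes comfortably cover the ~90 PB stored:

```python
# Rough capacity check for the tape archive: cartridge counts per capacity
# are assumed for illustration; only the ~52K total matches the slide.

tape_mix_tb = {1: 20_000, 4: 20_000, 5: 12_000}  # capacity (TB) -> count (assumed)
tapes = sum(tape_mix_tb.values())
capacity_pb = sum(cap * n for cap, n in tape_mix_tb.items()) / 1000
print(f"{tapes:,} tapes -> ~{capacity_pb:.0f} PB raw capacity")  # 52,000 -> ~160 PB
# Well above the ~90 PB currently stored, leaving headroom for 4.5 PB/month.
```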
EOS Usage at CERN Today: 44.8 PB, 136 (279) million, 20.7k, 32.1 PB
Availability and Performance
Archival & Data Distribution, User Analysis
Usage peaks: pp 2012, pA 2013
CERN openlab in a nutshell
• A science-industry partnership to drive R&D and innovation, with over a decade of success
• Evaluate state-of-the-art technologies in a challenging environment and improve them
• Test in a research environment today what will be used in many business sectors tomorrow
• Train the next generation of engineers/employees
• Disseminate results and reach out to new audiences
Ongoing R&D: e.g. Cloud Storage
• CERN openlab joint project since Jan '12
– Testing scaling and TCO gains with prototype applications (see the sketch below)
• Huawei S3 storage appliance (0.8 PB)
– logical replication
– fail-in-place
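A minimal sketch of the kind of prototype test involved, using the standard S3 API via boto3 against a generic S3-compatible endpoint; the endpoint, bucket and credentials are placeholders, and this is not the actual openlab test harness:

```python
# Illustrative S3 throughput probe against an S3-compatible appliance.
# Endpoint, bucket name and credentials are placeholders (assumptions).
import time
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://s3-appliance.example.cern.ch",  # placeholder endpoint
    aws_access_key_id="TEST_KEY",
    aws_secret_access_key="TEST_SECRET",
)

payload = b"x" * (64 * 1024 * 1024)  # 64 MiB test object
start = time.time()
s3.put_object(Bucket="openlab-test", Key="probe/object-0", Body=payload)
elapsed = time.time() - start
print(f"wrote {len(payload) / 1e6:.0f} MB in {elapsed:.2f}s "
      f"({len(payload) / 1e6 / elapsed:.0f} MB/s)")
```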
Thanks for your attention!
More at http://cern.ch
Accelerating Science and Innovation