TRANSCRIPT
Ian Bird
LCG Deployment Manager
EGEE Operations Manager
LCG - The Worldwide LHC Computing Grid
Building a Service for LHC Data Analysis
22 September 2006
October 7, 2005
The LHC Accelerator
The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments' detectors.

LHC Data
This is reduced by online computers that filter out a few hundred "good" events per second, which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec – about 15 PetaBytes per year for all four experiments.
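The figures above can be sanity-checked with some back-of-the-envelope arithmetic. A minimal sketch, assuming an average event size of ~1.5 MB and ~10^7 seconds of accelerator running per year – both illustrative assumptions, not figures stated in the talk:

```python
# Rough cross-check of the quoted LHC data rates.
# ASSUMED (not from the talk): ~1.5 MB per recorded event,
# ~1e7 seconds of accelerator running per year.
EVENTS_PER_SEC_RAW = 40_000_000   # collisions/sec at each detector
EVENTS_PER_SEC_KEPT = 300         # "a few hundred" good events/sec
EVENT_SIZE_MB = 1.5               # assumed average event size
SECONDS_PER_YEAR = 1e7            # assumed operating year

rejection = EVENTS_PER_SEC_RAW / EVENTS_PER_SEC_KEPT
rate_mb_s = EVENTS_PER_SEC_KEPT * EVENT_SIZE_MB
volume_pb = rate_mb_s * SECONDS_PER_YEAR / 1e9    # MB -> PB

print(f"online filter keeps ~1 event in {rejection:,.0f}")
print(f"recording rate ~{rate_mb_s:.0f} MB/s")
print(f"~{volume_pb:.1f} PB/year per experiment")
```

With these assumptions the recording rate lands at ~450 MB/s, inside the quoted 100-1,000 MB/s band, and four experiments at a few PB each give the ~15 PB/year total.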
The Worldwide LHC Computing Grid
Purpose:
Develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments
Ensure the computing service … and common application libraries and tools

Phase I (2002-2005) – Development & planning
Phase II (2006-2008) – Deployment & commissioning of the initial services
WLCG Collaboration
The Collaboration – still growing:
~130 computing centres
12 large centres (Tier-0, Tier-1)
40-50 federations of smaller "Tier-2" centres
29 countries

Memorandum of Understanding – agreed in October 2005, now being signed:
Focuses on the needs of the four LHC experiments
Commits resources – each October for the coming year, with a 5-year forward look
Agrees on standards and procedures
LCG Service Hierarchy
Tier-0 – the accelerator centre:
Data acquisition & initial processing
Long-term data curation
Distribution of data to the Tier-1 centres

Tier-1 centres:
Canada – TRIUMF (Vancouver)
France – IN2P3 (Lyon)
Germany – Forschungszentrum Karlsruhe
Italy – CNAF (Bologna)
Netherlands – NIKHEF/SARA (Amsterdam)
Nordic countries – distributed Tier-1
Spain – PIC (Barcelona)
Taiwan – Academia Sinica (Taipei)
UK – CLRC (Oxford)
US – FermiLab (Illinois), Brookhaven (NY)

Tier-1 – "online" to the data acquisition process; high availability:
Managed mass storage – grid-enabled data service
Data-heavy analysis
National and regional support

Tier-2 – ~120 centres in ~29 countries:
Simulation
End-user analysis – batch and interactive
LHC / EGEE Grid – High Energy Physics and a new computing infrastructure for science

1999 – MONARC project: early discussions on how to organise distributed computing for LHC
2000 – growing interest in grid technology; the HEP community was the driver in launching the DataGrid project
2001-2004 – EU DataGrid project: middleware & testbed for an operational grid
2002-2005 – LHC Computing Grid (LCG): deploying the results of DataGrid to provide a production facility for LHC experiments
2004-2006 – EU EGEE project phase 1: starts from the LCG grid; a shared production infrastructure expanding to other communities and sciences
LCG depends on two major science grid infrastructures:
EGEE – Enabling Grids for E-sciencE
OSG – US Open Science Grid
Production Grids for LHC
EGEE Grid: ~50K jobs/day, with ~14K simultaneous jobs during prolonged periods.

[Chart: Jobs/day on the EGEE Grid, June 2005 to August 2006, in thousands of jobs per day, broken down by VO: ALICE, ATLAS, CMS, LHCb, GEANT4, dteam and non-LHC work. A second chart shows running jobs across the whole Grid over the last month for LHCb, CMS, ATLAS and ALICE, peaking around 14K.]
OSG Production for LHC
OSG: ~15K jobs/day. The three big users are ATLAS, CDF and CMS. ~3K simultaneous jobs; at the moment usage is quite spiky.

[Charts: OSG-CMS data distribution over the past 3 months; OSG-ATLAS running jobs (jobs/day on the OSG Grid) over the past 3 months.]
Data Distribution

In the pre-SC4 April tests, CERN to Tier-1s, the SC4 target of 1.6 GB/s was reached – but only for one day.
But experiment-driven transfers (ATLAS and CMS) sustained 50% of the target under much more realistic conditions:
CMS transferred a steady 1 PByte/month between Tier-1s & Tier-2s during a 90-day period.
ATLAS distributed 1.25 PBytes from CERN during a 6-week period.

[Chart: transfer rates, with reference lines at 1.6 GBytes/sec and 0.8 GBytes/sec.]
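The petabyte totals quoted above can be converted into sustained rates for comparison with the 1.6 GB/s target. A minimal unit-conversion sketch; the 30-day month and decimal units (1 PB = 10^9 MB) are my assumptions, not conventions stated in the talk:

```python
# Convert the sustained-transfer totals quoted above into MB/s.
# ASSUMED: 30-day month, decimal units (1 PB = 1e9 MB).
SECONDS_PER_DAY = 86_400
PB_IN_MB = 1e9

cms_rate = 1.0 * PB_IN_MB / (30 * SECONDS_PER_DAY)    # 1 PB/month
atlas_rate = 1.25 * PB_IN_MB / (42 * SECONDS_PER_DAY) # 1.25 PB in 6 weeks

print(f"CMS steady Tier-1/Tier-2 rate ~{cms_rate:.0f} MB/s")
print(f"ATLAS CERN export rate        ~{atlas_rate:.0f} MB/s")
print("SC4 target: 1600 MB/s (reached for one day only)")
```

Both experiment-driven transfers work out to a few hundred MB/s sustained, the same order as the 0.8 GB/s (half-target) line shown on the chart.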
Interoperation between Grid Infrastructures
Good progress on EGEE-OSG interoperability:
Cross job submission – in use by CMS
Integrating basic operation – series of workshops
Early technical studies on integration with the Nordic countries and with NAREGI in Japan
Enabling Grids for E-sciencE
EGEE-II INFSO-RI-031688
Collaborating Infrastructures
Potential for linking ~80 countries by 2008
[Map: collaborating infrastructures, including KnowARC, DEISA and TeraGrid]
Applications on EGEE
• More than 25 applications from an increasing number of domains
– Astrophysics
– Computational Chemistry
– Earth Sciences
– Financial Simulation
– Fusion
– Geophysics
– High Energy Physics
– Life Sciences
– Multimedia
– Material Sciences
– …
Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf
Example: EGEE Attacks Avian Flu
• EGEE used to analyse 300,000 possible potential drug compounds against bird flu virus, H5N1.
• 2000 computers at 60 computer centres in Europe, Russia, Asia and Middle East ran during four weeks in April - the equivalent of 100 years on a single computer.
• Potential drug compounds now being identified and ranked.
Neuraminidase, one of the two major surface proteins of influenza viruses, facilitating the release of virions from infected cells. Image Courtesy Ying-Ta Wu, AcademiaSinica.
ITU – International Telecommunication Union
– ITU/BR: the Radiocommunication Sector manages the radio-frequency spectrum and satellite orbits for fixed, mobile, broadcasting and other communication services
• RRC-06 (15 May – 16 June 2006)
– 120 countries negotiated the new frequency plan
– introduction of digital broadcasting in UHF (470-862 MHz) & VHF (174-230 MHz)
– a demanding computing problem with short deadlines
– using the EGEE grid, a full planning cycle could be completed in less than 1 hour
Grid management: structure
• Operations Coordination Centre (OCC)
– management and oversight of all operational and support activities
• Regional Operations Centres (ROC)
– provide the core of the support infrastructure, each supporting a number of resource centres within its region
– Grid Operator on Duty (COD)
• Resource centres
– provide resources (computing, storage, network, etc.)
• Grid User Support (GGUS)
– at FZK; coordination and management of user support, single point of contact for users
Security & Policy
Collaborative policy development – many policy aspects are collaborative works, e.g.:
• Joint Security Policy Group
• Certification Authorities – EUGridPMA, IGTF, etc.
• Grid Acceptable Use Policy (AUP)
– a common, general and simple AUP
– for all VO members using many Grid infrastructures: EGEE, OSG, SEE-GRID, DEISA, national Grids…
• Incident Handling and Response
– defines basic communication paths
– defines requirements (MUSTs) for incident response
– not intended to replace or interfere with local response plans

[Diagram: the family of policy documents – Security & Availability Policy, Usage Rules, Certification Authorities, Audit Requirements, Incident Response, User Registration & VO Management, Application Development & Network Admin Guide, VO Security]
Sustainability: Beyond EGEE-II
• Need to prepare for a permanent Grid infrastructure
– Maintain Europe's leading position in global science Grids
– Ensure reliable and adaptive support for all sciences
– Independent of short project funding cycles
– Modelled on the success of GÉANT
– Infrastructure managed in collaboration with national grid initiatives
Conclusions
LCG will depend on:
~130 computer centres
two major science grid infrastructures – EGEE and OSG
excellent global research networking

Grids are now operational:
>200 sites between EGEE and OSG
Grid operations centres have been running for well over a year
>40K jobs per day; 20K simultaneous jobs with the right load and job mix
Demonstrated target data distribution rates from CERN to the Tier-1s

EGEE is a large multi-disciplinary grid. Although HEP is a driving force, it must remain broader to ensure the long term. Planning for a long-term sustainable infrastructure is under way now.