Ian Bird
LCG Deployment Manager
EGEE Operations Manager
LCG - The Worldwide LHC Computing Grid
Building a Service for LHC Data Analysis
22 September 2006
The LHC Accelerator
The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments’ detectors.
LHC DATA
This is reduced by online computers that filter out a few hundred “good” events per second, which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec – about 15 PetaBytes per year for all four experiments.
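As a back-of-the-envelope check (my addition, not from the talk), a few lines of Python relate these rates, assuming the standard rule of thumb of roughly 10^7 live seconds of LHC data taking per year:

```python
# Back-of-the-envelope check of the quoted LHC data rates.
# Assumption (not in the slides): ~1e7 live seconds of data taking per year,
# a standard rule of thumb for LHC running.

LIVE_SECONDS_PER_YEAR = 1e7
EXPERIMENTS = 4

# Quoted recording rate per experiment: 100-1,000 MB/s; take a mid-range value.
rate_mb_per_s = 375  # MB/s, illustrative mid-range value

total_pb_per_year = rate_mb_per_s * LIVE_SECONDS_PER_YEAR * EXPERIMENTS / 1e9
print(f"~{total_pb_per_year:.0f} PB/year across all four experiments")
# -> ~15 PB/year, consistent with the figure quoted on the slide
```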
The Worldwide LHC Computing Grid
Purpose
– Develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments
– Ensure the computing service … and common application libraries and tools

Phase I – 2002-05 – Development & planning
Phase II – 2006-2008 – Deployment & commissioning of the initial services
WLCG Collaboration
The Collaboration – still growing
– ~130 computing centres
– 12 large centres (Tier-0, Tier-1)
– 40-50 federations of smaller “Tier-2” centres
– 29 countries

Memorandum of Understanding
– Agreed in October 2005, now being signed

Purpose
– Focuses on the needs of the four LHC experiments
– Commits resources – each October for the coming year, with a 5-year forward look
– Agrees on standards and procedures
LCG Service Hierarchy
Tier-0 – the accelerator centre
– Data acquisition & initial processing
– Long-term data curation
– Distribution of data to Tier-1 centres

Tier-1 – “online” to the data acquisition process; high availability
– Managed Mass Storage – grid-enabled data service
– Data-heavy analysis
– National, regional support

Tier-1 centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY)

Tier-2 – ~120 centres in ~29 countries
– Simulation
– End-user analysis – batch and interactive
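Purely as an illustration of the hierarchy just described (a hypothetical sketch, not an official WLCG data format), the tier model maps naturally onto a small Python structure:

```python
# Illustrative sketch of the LCG tier model described above; the role
# descriptions follow the slide, the structure itself is hypothetical.

TIER_MODEL = {
    "Tier-0": {
        "sites": ["CERN"],
        "roles": ["data acquisition & initial processing",
                  "long-term data curation",
                  "distribution of data to Tier-1 centres"],
    },
    "Tier-1": {
        "sites": ["TRIUMF", "IN2P3", "FZK", "CNAF", "NIKHEF/SARA",
                  "Nordic distributed Tier-1", "PIC", "Academia Sinica",
                  "CLRC", "FermiLab", "Brookhaven"],
        "roles": ["managed mass storage (grid-enabled data service)",
                  "data-heavy analysis",
                  "national and regional support"],
    },
    "Tier-2": {
        "sites": ["~120 centres in ~29 countries"],
        "roles": ["simulation", "end-user analysis (batch and interactive)"],
    },
}

for tier, info in TIER_MODEL.items():
    print(tier, "->", "; ".join(info["roles"]))
```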
LHC → EGEE Grid: High Energy Physics → a new computing infrastructure for science

1999 – MONARC project: early discussions on how to organise distributed computing for LHC
2000 – growing interest in grid technology; the HEP community was the driver in launching the DataGrid project
2001-2004 – EU DataGrid project: middleware & testbed for an operational grid
2002-2005 – LHC Computing Grid (LCG): deploying the results of DataGrid to provide a production facility for the LHC experiments
2004-2006 – EU EGEE project phase 1: starts from the LCG grid; a shared production infrastructure expanding to other communities and sciences
LCG depends on two major science grid infrastructures:
EGEE – Enabling Grids for E-sciencE
OSG – US Open Science Grid
Production Grids for LHC
EGEE Grid:
– ~50K jobs/day
– ~14K simultaneous jobs during prolonged periods

[Chart: “Jobs/Day – EGEE Grid”, June 2005 to August 2006, in thousands of jobs per day (axis 0–60), broken down by VO: alice, atlas, cms, lhcb, geant4, dteam, non-LHC]

[Chart: running jobs for the whole EGEE Grid over the last month, by experiment (lhcb, cms, atlas, alice), peaking around 14K]
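As an aside (my inference, not stated on the slide), Little's law connects these two EGEE figures: the throughput and concurrency together imply an average job length of roughly seven hours.

```python
# Little's law: jobs in system = throughput x mean time in system.
# Applied to the EGEE figures quoted above.
jobs_per_day = 50_000      # ~50K jobs/day throughput
concurrent_jobs = 14_000   # ~14K simultaneous jobs

mean_job_hours = concurrent_jobs / jobs_per_day * 24
print(f"implied average job length: ~{mean_job_hours:.1f} hours")  # ~6.7 h
```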
OSG Production for LHC
OSG: ~15K jobs/day; the 3 big users are ATLAS, CDF and CMS. ~3K simultaneous jobs – at the moment usage is quite spiky.

[Charts: “OSG-ATLAS Running Jobs – past 3 months” and “OSG-CMS Data Distribution – past 3 months”; jobs/day on the OSG Grid, axis ticks at 1,000, 10,000 and 20,000]
Data Distribution

Pre-SC4 tests in April, CERN → Tier-1s: the SC4 target of 1.6 GB/s was reached – but only for one day.
But experiment-driven transfers (ATLAS and CMS) sustained 50% of the target under much more realistic conditions:
– CMS transferred a steady 1 PByte/month between Tier-1s & Tier-2s during a 90-day period
– ATLAS distributed 1.25 PBytes from CERN during a 6-week period

[Chart: transfer rates, with reference lines at 1.6 GBytes/sec (target) and 0.8 GBytes/sec]
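A quick sanity check (my arithmetic, not the slide's) converts these sustained volumes into average rates for comparison with the 1.6 GB/s target:

```python
# Convert the quoted transfer volumes into average rates for comparison
# with the 1.6 GB/s SC4 target (all figures from the slide above).

SECONDS_PER_DAY = 86_400

# CMS: a steady 1 PB/month between Tier-1s and Tier-2s (taking 30 days/month)
cms_rate = 1e6 / (30 * SECONDS_PER_DAY)          # GB/s, since 1 PB = 1e6 GB
# ATLAS: 1.25 PB distributed from CERN over 6 weeks
atlas_rate = 1.25e6 / (42 * SECONDS_PER_DAY)     # GB/s

print(f"CMS   T1<->T2: ~{cms_rate:.2f} GB/s sustained")    # ~0.39 GB/s
print(f"ATLAS ex-CERN: ~{atlas_rate:.2f} GB/s sustained")  # ~0.34 GB/s
# The 0.8 GB/s marked on the chart is 50% of the 1.6 GB/s CERN->T1 target.
```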
Interoperation between Grid Infrastructures
Good progress on EGEE-OSG interoperability
– cross job submission – in use by CMS
– integrating basic operation – series of workshops
Early technical studies on integration with the Nordic countries and with NAREGI in Japan
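To give a flavour of what cross job submission involved, here is a minimal, hypothetical sketch of a gLite-era submission wrapped in Python. The JDL attributes are the standard ones, but the script name and CPU-time requirement are illustrative assumptions, and the exact CLI name varied across LCG/gLite releases (edg-job-submit on LCG-2, glite-wms-job-submit on gLite 3.x):

```python
# Hypothetical sketch: submit a job to the EGEE workload management system.
# JDL (Job Description Language) attributes shown are the standard ones;
# "analysis.sh" and the CPU-time requirement are illustrative assumptions.
import subprocess
import tempfile

JDL = """\
Executable    = "/bin/sh";
Arguments     = "analysis.sh";
StdOutput     = "job.out";
StdError      = "job.err";
InputSandbox  = {"analysis.sh"};
OutputSandbox = {"job.out", "job.err"};
Requirements  = other.GlueCEPolicyMaxCPUTime > 720;
"""

with tempfile.NamedTemporaryFile("w", suffix=".jdl", delete=False) as f:
    f.write(JDL)
    jdl_path = f.name

# The WMS matches the Requirements expression (GLUE schema attributes,
# here: at least 720 minutes of CPU) against published site information.
subprocess.run(["glite-wms-job-submit", "-a", jdl_path], check=True)
```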
Collaborating Infrastructures
Potential for linking ~80 countries by 2008

[Map of collaborating infrastructures, including KnowARC, DEISA and TeraGrid]
Applications on EGEE
• More than 25 applications from an increasing number of domains:
– Astrophysics
– Computational Chemistry
– Earth Sciences
– Financial Simulation
– Fusion
– Geophysics
– High Energy Physics
– Life Sciences
– Multimedia
– Material Sciences
– …
Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf
Example: EGEE Attacks Avian Flu
• EGEE was used to analyse 300,000 potential drug compounds against the bird flu virus, H5N1.
• 2000 computers at 60 computer centres in Europe, Russia, Asia and the Middle East ran for four weeks in April – the equivalent of 100 years on a single computer.
• Potential drug compounds are now being identified and ranked.
Neuraminidase, one of the two major surface proteins of influenza viruses, facilitating the release of virions from infected cells. Image courtesy Ying-Ta Wu, Academia Sinica.
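Taking the slide's own figures at face value, a short Python calculation (my addition) puts the scale in perspective:

```python
# Scale of the avian-flu screening run, derived from the figures quoted above.
compounds = 300_000     # candidate drug compounds screened
cpu_years = 100         # quoted single-computer equivalent
wall_weeks = 4          # actual wall-clock duration on the grid

cpu_hours_per_compound = cpu_years * 365 * 24 / compounds
speedup = cpu_years * 52 / wall_weeks  # wall-clock speedup vs one machine

print(f"~{cpu_hours_per_compound:.1f} CPU-hours per compound")  # ~2.9
print(f"~{speedup:.0f}x faster than a single computer")         # ~1300x
```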
ITU
• International Telecommunication Union
– ITU/BR: Radiocommunication Sector – management of the radio-frequency spectrum and satellite orbits for fixed, mobile, broadcasting and other communication services
• RRC-06 (15 May–16 June 2006)
– 120 countries negotiated the new frequency plan
– introduction of digital broadcasting: UHF (470-862 MHz) & VHF (174-230 MHz)
– a demanding computing problem with short deadlines
– using the EGEE grid, they were able to complete a planning cycle in less than 1 hour
Grid management: structure
• Operations Coordination Centre (OCC)
– management and oversight of all operational and support activities
• Regional Operations Centres (ROC)
– provide the core of the support infrastructure, each supporting a number of resource centres within its region
– Grid Operator on Duty (COD)
• Resource centres
– provide resources (computing, storage, network, etc.)
• Grid User Support (GGUS)
– at FZK; coordination and management of user support, single point of contact for users
Security & Policy
Collaborative policy development
– many policy aspects are collaborative works, e.g.:
• Joint Security Policy Group
• Certification Authorities – EUGridPMA, IGTF, etc.
• Grid Acceptable Use Policy (AUP)
– a common, general and simple AUP
– for all VO members using many Grid infrastructures: EGEE, OSG, SEE-GRID, DEISA, national Grids…
• Incident Handling and Response
– defines basic communication paths
– defines requirements (MUSTs) for incident response
– not intended to replace or interfere with local response plans
[Diagram: the policy document set – Security & Availability Policy, Usage Rules, Certification Authorities, Audit Requirements, Incident Response, User Registration & VO Management, Application Development & Network Admin Guide, VO Security]
Sustainability: Beyond EGEE-II
• Need to prepare for a permanent Grid infrastructure
– maintain Europe’s leading position in global science Grids
– ensure reliable and adaptive support for all sciences
– independent of short project funding cycles
– modelled on the success of GÉANT: infrastructure managed in collaboration with national grid initiatives
Conclusions
LCG will depend on
– ~130 computer centres
– two major science grid infrastructures – EGEE and OSG
– excellent global research networking

Grids are now operational
– >200 sites between EGEE and OSG
– grid operations centres running for well over a year
– >40K jobs per day, 20K simultaneous jobs with the right load and job mix
– demonstrated target data distribution rates from CERN to Tier-1s

EGEE is a large multi-disciplinary grid
– although HEP is a driving force, it must remain broader to ensure the long term
– planning for a long-term sustainable infrastructure is underway now