TRANSCRIPT
Status of the LHC Computing Project
DESY, 26.1.2004
Matthias Kasemann / DESY
Seminar "Datenverarbeitung in der Hochenergiephysik" (Data Processing in High Energy Physics)
Outline
– Introduction
– LCG overview, resources
– The CERN LCG computing fabric
– Grid R&D and deployment: deploying the LHC computing environment; new projects – EGEE
– The LCG Applications: status of sub-projects
– And Germany…
The Large Hadron Collider Project
4 detectors: ALICE, ATLAS, CMS, LHCb

Requirements for world-wide data analysis:
– Storage: raw recording rate 0.1 – 1 GByte/sec, accumulating at 5-8 PetaBytes/year; 10 PetaBytes of disk
– Processing: 100,000 of today's fastest PCs
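As a sanity check on these numbers, the recording rate can be converted to a yearly volume. The 10^7 seconds of effective data taking per year used below is a commonly assumed figure, not a number from this slide:

    # Back-of-the-envelope check of the storage figures quoted above.
    # Assumption (not from the slide): ~1e7 seconds of effective data taking per year.

    EFFECTIVE_SECONDS_PER_YEAR = 1e7            # assumed typical running time
    recording_rates_gb_per_s = [0.1, 1.0]       # raw recording rate range from the slide

    for rate in recording_rates_gb_per_s:
        petabytes_per_year = rate * EFFECTIVE_SECONDS_PER_YEAR / 1e6   # 1 PB = 1e6 GB
        print(f"{rate:.1f} GB/s -> ~{petabytes_per_year:.0f} PB/year")

    # Gives ~1 to ~10 PB/year, bracketing the 5-8 PB/year accumulation quoted above.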
LHC Computing Hierarchy
Emerging vision: a richly structured, global dynamic system

[Diagram of the tiered computing model. The experiment delivers ~PByte/sec to the online system, which records ~100-1500 MBytes/sec into the Tier 0+1 CERN Centre (PBs of disk, tape robot). 2.5-10 Gbps links connect CERN to the Tier 1 centres (FNAL, IN2P3, INFN, RAL); further ~2.5-10 Gbps links fan out to Tier 2 centres, and 0.1 to 10 Gbps links serve institutes (Tier 3) and workstations (Tier 4) with physics data caches. Tens of Petabytes by 2007-8; an Exabyte ~5-7 years later. CERN/outside resource ratio ~1:2; Tier0 : (sum of Tier1) : (sum of Tier2) ~ 1:1:1.]
LCG – Goals
The goal of the LCG project is to prototype and deploy the computing environment for the LHC experiments.

Two phases:
– Phase 1: 2002 – 2005. Build a service prototype based on existing grid middleware; gain experience in running a production grid service; produce the TDR for the final system.
– Phase 2: 2006 – 2008. Build and commission the initial LHC computing environment.

LCG is not a development project – it relies on other grid projects for grid middleware development and support.
LHC Computing Grid Project
The LCG Project is a collaboration of the LHC experiments, the Regional Computing Centres and physics institutes, working together to prepare and deploy the computing environment that will be used by the experiments to analyse the LHC data.

This includes support for applications – provision of common tools, frameworks, environment and data persistency – and the development and operation of a computing service:
– exploiting the resources available to LHC experiments in computing centres, physics institutes and universities around the world
– presenting this as a reliable, coherent environment for the experiments.

The goal is to enable the physicists to concentrate on science, unaware of the details and complexity of the environment they are exploiting.
Requirements setting: RTAG status
– RTAG1 Persistency Framework – completed
– RTAG2 Managing LCG Software – completed
– RTAG3 Math Library Review – completed
– RTAG4 GRID Use Cases – completed
– RTAG5 Mass Storage – completed
– RTAG6 Regional Centres – completed
– RTAG7 Detector geometry & materials description – completed
– RTAG8 LCG blueprint – completed
– RTAG9 Monte Carlo Generators – completed
– RTAG10 Detector Simulation – completed
– RTAG11 Architectural Roadmap towards Distributed Analysis – completed

Reports of the Grid Applications Group:
– HEPCAL I (LCG-SC2-20-2002): Common Use Cases for a Common Application Layer
– HEPCAL II (LCG-SC2-2003-032): Common Use Cases for a Common Application Layer for Analysis
LCG Project – Human Resources used to October 2003 (CERN + applications at other institutes)
Experience-weighted FTE-years:

Area                        2002   1Q03   2Q03   3Q03   Total
Applications (with ROOT)    37.7   13.3   15.6   15.7    82.3
CERN Fabric                 32.2   10.4   10.3   10.7    63.5
Grid Deployment              7.5    4.1    4.6    4.9    21.1
Project Technology           1.2    0.3    0.8    1.4     3.6
LCG Management               5.9    1.8    1.7    1.6    11.0
Total                       84.5   29.9   33.0   34.2   181.6
LCG Project – Human Resources: unweighted FTEs in October 2003
Applications area (includes resources at CERN and at other institutes):

Software Process Infrastructure       8.7
Object Persistency (POOL)            12.3
Core Libraries and Services (SEAL)    5.4
Physics Interfaces (PI)               5.1
Simulation                           16.0
GRID interfacing                      4.2
Architecture                          0.2
Management                            1.7
Total                                53.6
ROOT                                  5.7
LCG Project – Human Resources: unweighted FTEs in October 2003

CERN Fabric:
  System Management & Operations       10.4
  Development (e.g. Monitoring)         9.6
  Data Storage Management               7.5
  Grid Security                         4.0
  Grid-Fabric Interface                 8.0
  Internal Networking for Physics       1.0
  External Networking for Physics       0.6
  Management                            1.3
  Total                                42.4

Project Technology:
  Grid Technology, modelling, evaluations   5.0
  Total                                     5.0

Grid Deployment:
  Integration and Certification                       14.5
  Grid Infrastructure, Operations and User Support    10.0
  Total                                               24.5

LCG Management: 5.6

Includes all staff for LHC services at CERN; does NOT include EDG middleware.
Personnel resources
Includes all staff for LHC services at CERN (incl. networking) and staff at other institutes in the applications area; without Regional Centres and EDG middleware.

Current staff (FTEs) by project area:
  Applications 43%, CERN Fabric 31%, Grid Deployment 18%, Project Technology 4%, Project Management 4%

Current staff (FTEs) by funding source:
  LCG Special Funding 43%, CERN Support 44%, CERN – Experiments 5%, Other Institutes – Experiments 6%, EU (Fabric) 2%
Phase 1+2 – Human Resources at CERN
LCG Human Resources: Budget & Funding Estimates

[Chart: FTEs per year, 2003 – 2010 (scale 0-180 FTEs). Shown are the requirements for the computing service and for applications (estimated evolution), set against the funding sources: physics services funded at CERN (EP, IT); commitments for special funding in Phase 1 at CERN; Phase II in budget – but NOT funded; estimated net resources from EGEE; a guess about an EGEE follow-on; experiments – level of August 2003.]
LCG Regional Centre Capacity, 2Q2004

Country       CPU capacity   Disk capacity   LCG support   Online tapes   Diff. to CPU capacity
                   (kSI2k)            (TB)         (FTE)           (TB)   announced May 2003
CERN                   700             100          10.0           1000         0
Czech Rep.              30               6           1.0              2       -30
France                 150              24           1.0            160      -270
Germany                305              44           9.3             74        98
Holland                 30               1           3.0             20       -94
Hungary                                                                       -70
Italy                  895             110          13.3            100       388
Japan                  125              40                           50       -95
Poland                  48               5           1.8              5       -38
Russia                 102            12.9           4.0             26       -18
Taipei                  40             5.8           4.6             30      -180
UK                     486            55.3           3.4            100     -1170
USA¹                    80              25                          100      -721
Switzerland             18               4           1.4             20        -8
Spain                  150              30           4.0            100         0
Sweden                                                                       -179
Sum                   3159             463            57           1787     -2388

Today: ~1000 SI2k/box; in 2007: ~8000 SI2k/box.
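The per-box figures quoted with the table give a feel for what these pledges mean in machine counts. A small conversion sketch, reproducing only a few rows for illustration:

    # Convert pledged CPU capacity (kSI2k) into rough box counts, using the
    # per-box figures quoted with the table (~1 kSI2k/box today, ~8 kSI2k/box in 2007).

    pledges_ksi2k = {"CERN": 700, "Germany": 305, "Italy": 895, "UK": 486, "Sum": 3159}

    SI2K_PER_BOX_TODAY = 1000
    SI2K_PER_BOX_2007 = 8000

    for site, ksi2k in pledges_ksi2k.items():
        si2k = ksi2k * 1000
        print(f"{site:8s} {ksi2k:5d} kSI2k -> ~{si2k / SI2K_PER_BOX_TODAY:5.0f} boxes today, "
              f"~{si2k / SI2K_PER_BOX_2007:4.0f} boxes at 2007 density")

    # The full 3159 kSI2k pledge corresponds to ~3160 of today's boxes,
    # but only ~395 boxes at the expected 2007 per-box performance.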
Storage (II)

CASTOR: CERN development of a Hierarchical Storage Management (HSM) system for LHC.
Two teams are working in this area: Developer (3.8 FTE) and Support (3 FTE). Support to other institutes is currently under negotiation (LCG, HEPCCC).
Usage: 1.7 PB of data in 13 million files, a 250 TB disk layer and 10 PB of tape storage. Central Data Recording and data processing: NA48 0.5 PB, COMPASS 0.6 PB, LHC experiments 0.4 PB.
The current CASTOR implementation needs improvements – a new CASTOR stager:
– a pluggable framework for intelligent and policy-controlled file access scheduling
– an evolvable storage resource sharing facility framework rather than a total solution
A detailed workplan and architecture are available and were presented to the user community in summer.
Carefully watching tape technology developments (not really commodity) – in-depth knowledge and understanding is key.
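The "pluggable framework for policy-controlled file access scheduling" can be pictured as a scheduler that delegates request priorities to an exchangeable policy object. The sketch below only illustrates that idea; the class names and the production-first policy are invented for this example and do not describe the actual CASTOR stager design.

    # Illustrative sketch only: a pluggable, policy-controlled scheduler for
    # file-access requests, in the spirit of the new stager described above.
    # All names are invented; this is not the CASTOR implementation.

    from dataclasses import dataclass, field
    import heapq

    @dataclass(order=True)
    class FileRequest:
        priority: int
        path: str = field(compare=False)
        user: str = field(compare=False)

    class SchedulingPolicy:
        """Plugin interface: assign a priority to each incoming request."""
        def priority(self, path, user):
            raise NotImplementedError

    class ProductionFirstPolicy(SchedulingPolicy):
        """Example policy: production accounts are served before normal users."""
        def priority(self, path, user):
            return 0 if user.startswith("prod_") else 10

    class Stager:
        def __init__(self, policy):
            self.policy = policy
            self.queue = []

        def submit(self, path, user):
            heapq.heappush(self.queue, FileRequest(self.policy.priority(path, user), path, user))

        def next_request(self):
            return heapq.heappop(self.queue)

    stager = Stager(ProductionFirstPolicy())
    stager.submit("/castor/cern.ch/user/a/analysis.dat", "auser")
    stager.submit("/castor/cern.ch/cms/raw/run42.dat", "prod_cms")
    print(stager.next_request().path)   # the production request is scheduled first

Swapping in a different SchedulingPolicy changes the access order without touching the stager itself, which is the point of making the policy pluggable.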
Network

Network infrastructure based on Ethernet technology.
For 2008 we need a completely new (performance) backbone in the centre, based on 10 Gbit technology. Today very few vendors offer such multiport, non-blocking 10 Gbit routers. We already have an Enterasys product under test (openlab, prototype).

The timescale is tight:
Q1 2004  market survey
Q2 2004  install 2-3 different boxes, start thorough testing; prepare new purchasing procedures, finance committee, vendor selection, large order
Q3 2005  installation of 25% of the new backbone
Q3 2006  upgrade to 50%
Q3 2007  100% new backbone
Wide Area Network

Currently 4 lines: 21 Mbit/s, 622 Mbit/s, 2.5 Gbit/s (GEANT), and a dedicated 10 Gbit/s line (Starlight Chicago, DataTAG); next year a full 10 Gbit/s production line.
Needed for import and export of data and for the Data Challenges; today's data rate is 10 – 15 MB/s.
Tests of mass storage coupling are starting (Fermilab and CERN).
Next year more production-like tests with the LHC experiments: the CMS-IT data streaming project inside the LCG framework tests several layers – bookkeeping/production scheme, mass storage coupling, transfer protocols (GridFTP, etc.), TCP/IP optimization.
2008: multiple 10 Gbit/s lines will be available with the move to 40 Gbit/s connections. CMS and LHCb will export the second copy of the raw data to the Tier 1 centres; ALICE and ATLAS want to keep the second copy at CERN (still an ongoing discussion).
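The TCP/IP optimization mentioned above is largely a matter of matching socket buffers to the bandwidth-delay product of the link. A small worked example; the 120 ms round-trip time is an assumed typical transatlantic value, not a figure from the talk:

    # Why TCP buffer tuning matters for the wide-area transfers described above:
    # to keep a high-latency link full, the socket buffer must hold at least
    # bandwidth * round-trip-time of data. The RTT is an assumed typical
    # CERN-Chicago value, not a number from the talk.

    def required_buffer_mb(bandwidth_gbit_s, rtt_ms):
        bytes_per_s = bandwidth_gbit_s * 1e9 / 8
        return bytes_per_s * (rtt_ms / 1e3) / 1e6

    for gbit in (0.622, 2.5, 10.0):
        print(f"{gbit:6.3f} Gbit/s, 120 ms RTT -> buffer >= {required_buffer_mb(gbit, 120):.0f} MB")

    # With the default few-hundred-kB TCP windows of the time, a single stream
    # would use only a small fraction of a 10 Gbit/s line; hence the interest in
    # window scaling and in parallel streams for bulk transfers.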
Comparison

                               2008 prediction           2003 status
Hierarchical Ethernet network  280 GB/s                  2 GB/s
Mirrored disks                 ~8000 (4 PB)              2000 (0.25 PB)
Dual-CPU nodes                 ~3000 (20 MSI2000)        2000 (1.2 MSI2000)
Tape drives                    ~170 (4 GB/s)             50 (0.8 GB/s)
Tape storage                   ~25 PB                    10 PB

The CMS HLT alone will consist of about 1000 nodes with 10 million SI2000!
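A quick bit of arithmetic on the 2008 column shows the per-device rates these aggregates imply (all inputs taken from the table above):

    # Per-device rates implied by the 2008 prediction above.
    tape_total_gb_s, tape_drives = 4.0, 170
    network_gb_s, disks = 280.0, 8000
    total_msi2k, cpu_nodes = 20.0, 3000

    print(f"~{tape_total_gb_s / tape_drives * 1000:.0f} MB/s per tape drive")
    print(f"~{network_gb_s / disks * 1000:.0f} MB/s of backbone bandwidth per disk")
    print(f"~{total_msi2k * 1e6 / cpu_nodes:.0f} SI2000 per dual-CPU node")

    # Roughly 24 MB/s per tape drive, 35 MB/s per disk and ~6700 SI2000 per node,
    # broadly consistent with the ~8000 SI2k/box expected for 2007 hardware.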
External Fabric relations

– CERN IT: main fabric provider
– LCG: hardware resources, manpower resources
– Collaboration with industry (openlab): HP, INTEL, IBM, Enterasys, Oracle – 10 Gbit networking, new CPU technology, possibly new storage technology
– EDG, WP4: installation, configuration, monitoring, fault tolerance
– GRID technology and deployment: common fabric infrastructure, fabric-GRID interdependencies
– GDB working groups: site coordination, common fabric issues
– External network: DataTAG, Grande; Data Streaming project with Fermilab
– Collaboration with India: filesystems, Quality of Service
– Collaboration with CASPUR: hardware and software benchmarks and tests, storage and network
– LINUX: via HEPiX, RedHat license coordination inside HEP (SLAC, Fermilab), certification and security
– CASTOR: SRM definition and implementation (Berkeley, Fermi, etc.), mass storage coupling tests (Fermi), scheduler integration (Maui, LSF), support issues (LCG, HEPCCC)
– Online-offline boundaries: workshop and discussion with the experiments, Data Challenges
FA Timeline

[Timeline figure, 2004 – 2008. 2004: preparations, benchmarks, data challenges, architecture verification, evaluations, computing models. Milestones along the way: decision on batch scheduler; decision on storage solution; LCG Computing TDR; 25%, then 50%, then 100% of the new network backbone; power and cooling 0.8 MW, 1.6 MW, 2.5 MW; Phase 2 installations (tape, CPU, disk) 30%, then 60%; LHC start.]
Resources and Requests

[Chart: requested capacity in kSI2000 per month, January to June 2004 (scale 0 – 5000 kSI2000), stacked by experiment (ALICE, ATLAS, CMS, LHCb) and compared with the possible resources and the Tier resources.]
LHC Computing Grid Service
(Initial sites are deploying now; others will be ready in the next 6-12 months.)

Tier 0: CERN
Tier 1 centres: Brookhaven National Lab, CNAF Bologna, Fermilab, FZK Karlsruhe, IN2P3 Lyon, Rutherford Appleton Lab (UK), University of Tokyo, CERN
Other centres: Academia Sinica (Taipei), Barcelona, Caltech, GSI Darmstadt, Italian Tier 2s (Torino, Milano, Legnaro), Manno (Switzerland), Moscow State University, NIKHEF Amsterdam, Ohio Supercomputing Centre, Sweden (NorduGrid), Tata Institute (India), Triumf (Canada), UCSD, UK Tier 2s, University of Florida – Gainesville, University of Prague, …
Elements of a Production LCG Service

Middleware:
– Testing and certification
– Packaging, configuration, distribution and site validation
– Support – problem determination and resolution; feedback to middleware developers

Operations:
– Grid infrastructure services
– Site fabrics run as production services
– Operations centres – trouble and performance monitoring, problem resolution – 24x7 globally
– RAL is leading the sub-project on developing operations services
– Initial prototype: basic monitoring tools; mail lists and rapid communications/coordination for problem resolution

Support:
– Experiment integration – ensure optimal use of the system
– User support – call centres/helpdesk – global coverage; documentation; training
– FZK is leading the sub-project to develop user support services
– Initial prototype: web portal for problem reporting; the expectation is that initially the experiments will triage problems and experts will submit LCG problems to the support service
Timeline for the LCG services (2003 – 2006)

[Timeline figure. 2003: agree LCG-1 specification; LCG-1 service opens; event simulation productions. 2004: LCG-2 with upgraded middleware, management etc. – the service for the Data Challenges, batch analysis and simulation; validation of computing models; computing model TDRs. 2005: LCG-3 – full multi-tier prototype, batch + interactive service; stabilize, expand, develop; evaluation of second-generation middleware; TDR for Phase 2; acquisition, installation and testing of the Phase 2 service. 2006: Phase 2 service in production.]
Deployment Activities: Human Resources

Activity                                    CERN/LCG   External
Integration & Certification                        6   external testbed sites
Debugging/development/middleware support           3
Testing                                            3   2 + VDT testers group
Experiment Integration & Support                   5
Deployment & Infrastructure Support              5.5   RC system managers
Security/VO Management                             2   Security Task Force
Operations Centres                                      RAL + GOC Task Force
Grid User Support                                       FZK + US Task Force
Management                                         1
Totals                                          25.5

Notes:
– A team of 3 Russians has 1 person at CERN at a given time (3 months each).
– Security: refer to the Security talk; Operations Centres: refer to the Operations Centre talk.
– Operations centres and user support are run in collaboration.
– The GDA team has been very understaffed – only now has this improved, with 7 new fellows.
– There are many opportunities for more collaborative involvement in operational and infrastructure activities.
2003 Milestones

Project Level 1 deployment milestones for 2003:
– July: introduce the initial publicly available LCG-1 global grid service, with 10 Tier 1 centres on 3 continents.
– November: expanded LCG-1 service with resources and functionality sufficient for the 2004 Computing Data Challenges – additional Tier 1 centres, several Tier 2 centres, more countries; expanded resources at Tier 1s; agreed performance and reliability targets.

The idea was:
– Deploy a service in July, then take several months to gain experience (operations, experiments).
– By November: meet performance targets (30 days running) with experiment verification; expand resources – regional centres and compute resources; upgrade functionality.
Milestone Status

– The July milestone was 3 months late: late middleware, slow take-up in the regional centres.
– The November milestone will be partially met: LCG-2 will be a service for the Data Challenges; Regional Centres were added to the level anticipated (23), including several Tier 2s (Italy, Spain, UK). But: there is a lack of operational experience, and the experiments have only just begun serious testing.
– LCG-2 will be deployed in December with the functionality required for the DCs; the verification part of the milestone will be met with LCG-2 early next year. The experiments must drive the addition of resources into the service. Operational and functional issues (stability, adding resources) will be addressed in parallel with operations. This will be the service for the Data Challenges.
Sites in LCG-1 – 21 Nov

Sites: BNL, Budapest, CERN, CNAF (with Torino, Milano), FNAL, FZK (with Krakow), Moscow, Prague, RAL (with Imperial College, Cavendish), Taipei, Tokyo, PIC-Barcelona (with IFIC Valencia, Ciemat Madrid, UAM Madrid, USC Santiago de Compostela, UB Barcelona, IFCA Santander).

Sites to enter soon: CSCS Switzerland, Lyon, NIKHEF; more Tier 2 centres in Italy and the UK.
Sites preparing to join: Pakistan, Sofia.
Achievements – 2003

– Put in place the integration and certification process: essential to prepare middleware for deployment – the key tool in trying to build a stable environment. Used seriously since January for LCG-0, 1, 2; it also provided crucial input to EDG. LCG is more stable than earlier systems.
– Set up the deployment process: the tiered deployment and support system is working. Currently the support load on the small team is high and must devolve to the GOC.
– Supported experiment deployment on LCG-0, 1: the user support load is high and must move into the support infrastructure (FZK); CMS used LCG-0 in production; produced a comprehensive User Guide.
– Put in place security policies and agreements, in particular the important agreements on registration requirements.
– Basic Operations Centre and Call Centre frameworks are in place.
– Expect to be ready for the 2004 DCs: the essential infrastructures are ready but have not yet been production tested, and improvements will happen in parallel with operating the system.
Issues

– Middleware is not yet production quality: although a lot has been improved, it is still unstable and unreliable, and some essential functionality was not delivered – LCG had to address this.
– Deployment tools are not adequate for many sites: hard to integrate into existing computing infrastructures, too complex, hard to maintain and use.
– Middleware limits a site's ability to participate in multiple grids – something that is now required for many large sites supporting many experiments and other applications.
– We are only now beginning to try to run LCG as a service, and beginning to understand and address the missing tools, etc. for operation.
– Delays have meant that we could not yet address these fundamental issues as we had hoped to this year.
Expected Developments in 2004

– General: LCG-2 will be the service run in 2004 – the aim is to evolve incrementally; the goal is to run a stable service.
– Some functional improvements: extend access to MSS – tape systems and managed disk pools; distributed replica catalogs with Oracle back-ends, to avoid reliance on single service instances.
– Operational improvements: monitoring systems – move towards proactive problem finding and the ability to take sites on/offline; continual effort to improve reliability and robustness; develop accounting and reporting.
– Address integration issues: with large clusters and with storage systems; ensure that large clusters can be accessed via LCG; the issue of integrating with other experiments and applications.
Services in 2004 – EGEE

– LCG-2 will be the production service during 2004. It will also form the basis of the initial EGEE production service; EGEE will take over operations during 2004 – but with the same teams. It will be maintained as a stable service, peer with other grid infrastructures, and continue to be developed.
– In parallel, a development service is expected in Q2 2004, based on the EGEE middleware prototypes, run as a service on a subset of EGEE/LCG production sites, to demonstrate the new architecture and functionality to the applications. Additional resources to achieve this come from EGEE.
– The development service must demonstrate superiority in all aspects: functionality, operations, maintainability, etc.
– By end 2004 the hope is that the development service becomes the production service.
Deployment: Summary

– LCG-1 is deployed to 23+ sites, despite the complexities and problems.
– The LCG-2 functionality will support production activities for the Data Challenges and will allow basic tests of analysis activities; the infrastructure is there to provide all aspects of support.
– The staffing situation at CERN is now better: most of what was done so far was achieved with limited staffing. The situation in the regional centres still needs to be clarified.
– We are now in a position to address the complexities.
EGEE-LCG Relationship

Enabling Grids for e-Science in Europe – EGEE
– EU project approved to provide partial funding for the operation of a general e-Science grid in Europe, including the supply of suitable middleware.
– EGEE provides funding for 70 partners, the large majority of which have strong HEP ties.
There is agreement between the LCG and EGEE management on very close integration.

OPERATIONS
– LCG operates the EGEE infrastructure as a service to EGEE – this ensures compatibility between the LCG and EGEE grids.
– In practice the EGEE grid will grow out of LCG.
– The LCG Grid Deployment Manager (Ian Bird) serves also as the EGEE Operations Manager.
EGEE-LCG Relationship (ii)

MIDDLEWARE
– The EGEE middleware activity provides a middleware package satisfying requirements agreed with LCG (HEPCAL, ARDA, …) and equivalent requirements from other sciences.
– Middleware here means the tools that provide functions of general application – not HEP-special or experiment-special – and that we can reasonably expect to come in the long term from public or commercial sources (cf. internet protocols, unix, html).
– A very tight delivery timescale is dictated by the LCG requirements: start with the LCG-2 middleware, then rapid prototyping of a new round of middleware, with the first "production" version in service by end 2004.
– The EGEE Middleware Manager (Frédéric Hemmer) serves also as the LCG Middleware Manager.
PARTNERS
70 partners organized in nine regional federations. Coordinating and lead partner: CERN.
CENTRAL EUROPE – FRANCE – GERMANY & SWITZERLAND – ITALY – IRELAND & UK – NORTHERN EUROPE – SOUTH-EAST EUROPE – SOUTH-WEST EUROPE – RUSSIA – USA

STRATEGY
– Leverage current and planned national and regional Grid programmes
– Build on existing investments in Grid technology by the EU and US
– Exploit the international dimensions of the HEP-LCG programme
– Make the most of planned collaboration with the NSF CyberInfrastructure initiative
Goal: a seamless international Grid infrastructure to provide researchers in academia and industry with a distributed computing facility.

ACTIVITY AREAS
– SERVICES: deliver "production level" grid services (manageable, robust, resilient to failure); ensure security and scalability
– MIDDLEWARE: professional Grid middleware re-engineering activity in support of the production services
– NETWORKING: proactively market Grid services to new research communities in academia and industry; provide the necessary education
EGEE: partner federations
– Integrate regional grid efforts; represent the leading grid activities in Europe.
– 9 regional federations covering 70 partners in 26 countries.
EGEE Proposal

Proposal submitted to the EU IST 6th framework call on 6th May 2003 (executive summary: 10 pages; full proposal: 276 pages):
http://agenda.cern.ch/askArchive.php?base=agenda&categ=a03816&id=a03816s5%2Fdocuments%2FEGEE-executive-summary.pdf

Activities:
– Deployment of Grid infrastructure: provide a grid service for science research; the initial service will be based on LCG-1; aim to deploy re-engineered middleware at the end of year 1.
– Re-engineering of grid middleware: OGSA environment – well-defined services, interfaces, protocols; in collaboration with US and Asia-Pacific developments; using LCG and the HEP experiments to drive US-EU interoperability and common solutions; a common design activity should start now.
– Dissemination, training and applications: initially HEP & Bio.
EGEE: timeline

– May 2003: proposal submitted
– July 2003: positive EU reaction
– September 2003: negotiation started – approx. 32 M€ over 2 years
– December 2003: sign EU contract
– April 2004: start of the project
Group Level Analysis in an HENP experiment

The user specifies the job information, including:
– selection criteria
– the Metadata Dataset (input)
– information about software (library) and configuration versions
– the output AOD and/or TAG Dataset (typical)
– the program to be run.

The user submits the job and the program is run: the selection criteria are used for a query on the Metadata Dataset; the event IDs satisfying the selection criteria and the Logical Dataset Names of the corresponding Datasets are retrieved; the input Datasets are accessed and the events read; the algorithm (program) is applied to the events; the output Datasets are uploaded; the experiment metadata is updated; a report summarizing the output of the jobs (e.g. how many events went to which stream, …) is prepared for the group, extracting the information from the application and the Grid middleware. (A toy sketch of this flow follows after the service list below.)
HENP Distributed Analysis – grid services annotated for the steps above:
• Authentication, Authorization, Metadata catalog
• Workload management
• Metadata catalog
• Package manager, Compute element
• File catalog, Data management, Storage Element
• Metadata catalog, Job provenance
• Auditing

[The slide maps these services onto existing middleware: LCG, AliEn, EDG, VDT.]
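The flow just described can be condensed into a small runnable toy. This is a minimal sketch only: every "service" below is a trivial stand-in (a plain dictionary for the metadata catalogue, a local function for the user program), and the dataset staging and output-upload steps are compressed away.

    # Toy sketch of the group-level analysis flow described above.
    # Every "service" is a trivial stand-in, not real grid middleware.

    def query_metadata(catalogue, selection):
        """Metadata-catalogue stand-in: return event IDs passing the selection."""
        return [event_id for event_id, tags in catalogue.items() if selection(tags)]

    def run_program(event_ids):
        """User-program stand-in: classify selected events into output streams."""
        return {"stream_A": [e for e in event_ids if e % 2 == 0],
                "stream_B": [e for e in event_ids if e % 2 == 1]}

    # Toy metadata catalogue: event ID -> metadata tags.
    catalogue = {1: {"pt": 12}, 2: {"pt": 55}, 3: {"pt": 60}, 4: {"pt": 8}}

    selected = query_metadata(catalogue, lambda tags: tags["pt"] > 50)   # selection criteria
    streams = run_program(selected)                                      # program is run
    report = {name: len(events) for name, events in streams.items()}     # report for the group
    print(report)   # {'stream_A': 1, 'stream_B': 1}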
LHC Computing Grid: A Roadmap towards Distributed Analysis
– based on a review of the experiments' needs
– input to the EGEE middleware re-engineering
Proposed: Key Services for HEP Distributed Analysis

Information Service, Authentication, Authorization, Auditing, Grid Monitoring, Workload Management, Metadata Catalogue, File Catalogue, Data Management, Computing Element, Storage Element, Job Monitor, Job Provenance, Package Manager, DB Proxy, User Interface, API, Accounting
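To make the decomposition above more concrete, here is a minimal interface sketch for two of the listed services. The class and method names are invented for illustration; they are not the ARDA or EGEE middleware API.

    # Illustrative interface sketch for two of the proposed services
    # (File Catalogue, Workload Management). Names are invented for this
    # example; they are not the ARDA/EGEE API.

    from abc import ABC, abstractmethod

    class FileCatalogue(ABC):
        """Maps a logical file name to its physical replicas on storage elements."""
        @abstractmethod
        def replicas(self, lfn: str) -> list: ...

    class WorkloadManagement(ABC):
        """Accepts a job description and returns an opaque job identifier."""
        @abstractmethod
        def submit(self, executable: str, input_lfns: list) -> str: ...

    class ToyCatalogue(FileCatalogue):
        """In-memory catalogue used only to exercise the interface."""
        def __init__(self):
            self._entries = {"lfn:/lhc/run1/evt001": ["srm://cern.ch/castor/evt001",
                                                      "srm://gridka.de/dcache/evt001"]}
        def replicas(self, lfn: str) -> list:
            return self._entries.get(lfn, [])

    print(ToyCatalogue().replicas("lfn:/lhc/run1/evt001"))

Keeping the catalogue and workload interfaces separate is what allows different middleware stacks to provide them interchangeably.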
ARDA, EGEE & LCG Middleware – possible evolution

[Timeline figure, 12/2003 – 12/2004: LCG-2 evolving to LCG-2'; ARDA starts and delivers an ARDA prototype; EGEE starts and produces an EGEE release candidate for the LCG service.]
Applications Area Organisation

– Applications manager.
– Architects forum: AA manager (chair), experiment architects, project leaders, ROOT leader – decisions and strategy; ~2 meetings/month; public minutes.
– Applications area meeting – consultation; ~3 meetings/month; open to all; 25-50 attendees.
– Projects: SPI, POOL, SEAL, PI, Simulation – organized into work packages.
– ROOT: user-provider relationship; now developing a more collaborative ROOT relationship than user/provider.
Apps Area Projects and their Relationships

[Diagram: the LCG Applications Area projects – Software Process & Infrastructure (SPI), Core Libraries & Services (SEAL), Persistency (POOL), Physicist Interface (PI), Simulation, … – and their relationships to the LHC experiments and to the other LCG areas.]
Applications Domain Decomposition

[Diagram of the applications software domain decomposition. Domains and components shown include: Event Generation, Detector Simulation, Reconstruction, Analysis, Calibration, Geometry, Event Model, Persistency, Core Services, Foundation and Utility Libraries, Dictionary, Whiteboard, Engine, StoreMgr, Algorithms, Grid Services, Interactive Services, Modeler, GUI, EvtGen, Scheduler, Fitter, PluginMgr, Monitor, NTuple, Scripting, FileCatalog; external products: ROOT, GEANT4, FLUKA, DataGrid, Python, Qt, MySQL, …]

Products mentioned are examples, not a comprehensive list.
There is project activity in all expected areas except grid services (coming with ARDA).
Level 1 and Highlighted Level 2 Milestones

Completed (as of the November 2003 review):
– Jan 03: SEAL, PI workplans approved
– Mar 03: Simulation workplan approved
– Apr 03: SEAL V1 (priority POOL needs)
– May 03: SPI software library fully deployed
– Jun 03: General release of POOL (LHCC Level 1) – functionality requested by the experiments for production usage
– Jul 03: SEAL framework services released (experiment directed)
– Jul 03: CMS POOL/SEAL integration
– Sep 03: ATLAS POOL/SEAL integration
– Oct 03: CMS POOL/SEAL validation (~1M events/week written)

Upcoming:
– Dec 03: LHCb POOL/SEAL integration
– Jan 04: ATLAS POOL/SEAL validation (50 TB DC1 POOL store)
– May 04: Generator event database beta in production
– Oct 04: Generic simulation framework production release
– Dec 04: Physics validation document
– Mar 05: Full-function release of POOL (LHCC Level 1)

See supplemental slide for current milestone performance.
L1+L2 Milestone Counts by Quarter

[Bar chart: number of Level 1 and Level 2 milestones per quarter, 2002 Q2 to 2005 Q1 (scale 0-16).]
2004 milestones will be fleshed out by the workplan updates due this quarter and by the finalization of the slate of ~10 L2 milestones for the next quarter (~2 per project) that we do before each quarterly report. New milestones will be added in the ARDA planning.
Applications Area Personnel Resources

– LCG applications area hires are complete: 21 working; the target in the September 2001 LCG proposal was 23.
– Contributions from the UK, Spain, Switzerland, Germany, Sweden, Israel, Portugal, the US, India, and Russia.
– Similar contribution levels from CERN and the experiments (in FTEs; the experiment number includes CERN people working on experiments).
– See supplemental slide for detail on personnel sources.
Personnel Distribution
Effort levels match the need anticipated in the blueprint RTAG.
And in Germany...

We have GridKa, GSI and DESY.
And we are starting to increase the HEP Grid footprint. Feb. 11: meeting of the German HEP groups … to establish more flags on the German Grid map.
Computing for Particle Physics: DESY

At DESY, HEP computing is part of the physics programme.
DESY operates Tier 0 and Tier 1 centres for H1, ZEUS, HERMES, HERA-B and LC, and provides computing for theory.

Services & development (highlights):
– Distributed Monte Carlo production since 1993
– Data handling on the Grid: dCache
– OO simulation: Geant4
– Large computing clusters and data storage