21 March 2000 System Managers Meeting Slide 1
The Particle Physics Computational Grid
Paul Jeffreys/CCLRC
Slide 2
Financial Times, 7 March 2000
Slide 3
Front Page FT, 7 March 2000
Slide 4
LHC Computing: Different from Previous Experiment Generations
– Geographical dispersion: of people and resources
– Complexity: the detector and the LHC environment
– Scale: Petabytes per year of data
– (NB – for purposes of this talk – mostly LHC specific)
~5000 Physicists, 250 Institutes, ~50 Countries
Major challenges associated with:
– Coordinated use of distributed computing resources
– Remote software development and physics analysis
– Communication and collaboration at a distance
R&D: A New Form of Distributed System: Data-Grid
Slide 5
The LHC Computing Challenge – by example
• Consider a UK group searching for the Higgs particle in an LHC experiment
  – Data flows off the detectors at 40 TB/sec (30 million floppies/sec)!
• A rejection factor of c. 5×10^5 is made online before writing to media
  – But have to be sure not to throw away the physics with the background
  – Need to simulate samples to exercise the rejection algorithms
• Simulation samples will be created around the world
  – Common access required
• After 1 year, a 1 PB sample of experimental events is stored on media
  – The initial analysed sample will be at CERN, in due course elsewhere
• UK has particular detector expertise (CMS: e-, e+, …)
  – Apply our expertise to: access the 1 PB of experimental data (located?), re-analyse e.m. signatures (where?) to select c. 1 in 10^4 Higgs candidates, but S/N will be c. 1 to 20 (continuum background), and store the results (where?)
• Also … access some simulated samples (located?), generate (where?) additional samples, store (where?) – PHYSICS (where?)
• In addition … strong competition
• Desire to implement the infrastructure in a generic way
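These figures can be cross-checked with a few lines of arithmetic. A back-of-envelope sketch (the figure of ~10^7 live seconds per accelerator year is an assumption added here, not from the slide):

```python
# Back-of-envelope check of the LHC data-rate numbers quoted above.
# Assumption (not from the slide): ~1e7 live seconds per accelerator year.

raw_rate = 40e12            # bytes/s off the detector (40 TB/sec)
rejection = 5e5             # online rejection factor before writing to media
live_seconds = 1e7          # assumed live time per year

write_rate = raw_rate / rejection          # bytes/s actually written
yearly_volume = write_rate * live_seconds  # bytes stored per year

print(f"write rate ~ {write_rate / 1e6:.0f} MB/s")
print(f"yearly volume ~ {yearly_volume / 1e15:.1f} PB")
```

With these assumptions the write rate comes out at ~80 MB/s and the yearly volume at ~0.8 PB, consistent with the ~1 PB sample quoted on the slide.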
Slide 6
Proposed Solution to LHC Computing Challenge (?)
• A data analysis ‘Grid’ for High Energy Physics
[Diagram: hierarchical Grid topology – CERN at the centre, surrounded by Tier-1 and Tier-2 centres, with Tier-3 and Tier-4 sites below them]
Slide 7
Access Patterns
Data samples:
– Raw data: ~1000 Tbytes
– Reco-V1: ~1000 Tbytes; Reco-V2: ~1000 Tbytes
– ESD-V1.1, ESD-V1.2, ESD-V2.1, ESD-V2.2: ~100 Tbytes each
– AOD samples: ~10 TB each (nine samples shown)

Access rates (aggregate, average):
– 100 Mbytes/s (2-5 physicists)
– 500 Mbytes/s (5-10 physicists)
– 1000 Mbytes/s (~50 physicists)
– 2000 Mbytes/s (~150 physicists)

Typical particle physics experiment in 2000-2005: one year of acquisition and analysis of data.
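A quick sketch of what these aggregate rates imply per user. The pairing of each rate with a user population follows the order given on the slide; treating each pair as one access tier is an illustrative assumption:

```python
# Per-physicist average bandwidth implied by the aggregate access rates
# above. Pairing each rate with its user count is an assumption here.

tiers = [
    (100,  (2, 5)),     # Mbytes/s, (min users, max users)
    (500,  (5, 10)),
    (1000, (50, 50)),
    (2000, (150, 150)),
]

for rate, (lo, hi) in tiers:
    mid = (lo + hi) / 2
    print(f"{rate:>5} MB/s shared by ~{mid:.0f} physicists "
          f"-> ~{rate / mid:.1f} MB/s each on average")
```

The pattern is the usual one for hierarchical data: the small, heavily-used samples serve many users at modest per-head bandwidth, while a handful of users occasionally sweep the large samples.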
Slide 8
Hierarchical Data Grid
• Physical
  – Efficient network/resource use: local > regional > national > oceanic
• Human
  – University/regional computing complements national labs, which in turn complement the accelerator site
  – Easier to leverage resources, maintain control, and assert priorities at the regional/local level
  – Effective involvement of scientists and students independently of location
• The 'challenge for UK particle physics' … how do we:
  – Go from the 200-PC99 farm maximum of today to a 10000-PC99 centre?
  – Connect to/participate in the European and world-wide PP grid?
  – Write the applications needed to operate within this hierarchical grid?
  AND
  – Ensure other disciplines are able to work with us, our developments and applications are made available to others, expertise is exchanged, and we enjoy fruitful collaboration with Computer Scientists and Industry
Slide 9
Quantitative Requirements
• Start with a typical experiment's Computing Model
• UK Tier-1 Regional Centre specification
• Then consider implications for the UK Particle Physics Computational Grid
  – Over years 2000, 2001, 2002, 2003
– Joint Infrastructure Bid made for resources to cover this
– Estimates of costs
• Look further ahead
Slides 10–17
Slide 18
Steering Committee
'Help establish the Particle Physics Grid activities in the UK'
a. An interim committee be put in place.
b. The immediate objectives would be to prepare for the presentation to John Taylor on 27 March 2000, and to co-ordinate the EU 'Work Package' activities for April 14.
c. After discharging these objectives, membership would be re-considered.
d. The next action of the committee would be to refine the Terms of Reference (presented to the meeting on 15 March).
e. After that the Steering Committee will be charged with commissioning a Project Team to co-ordinate the Grid technical work in the UK.
f. The interim membership is:
  • Chairman: Andy Halley
  • Secretary: Paul Jeffreys
  • Tier 2 reps: Themis Bowcock, Steve Playfer
  • CDF: Todd Hoffmann
  • D0: Ian Bertram
  • CMS: David Britton
  • BaBar: Alessandra Forti
  • CNAP: Steve Lloyd
– The 'labels' against the members are not official in any sense at this stage, but the members are intended to cover these areas approximately!
Slide 19
UK Project Team
• Need to really get underway!
• System Managers crucial!
• PPARC needs to see genuine plans and genuine activities…
• Must coordinate our activities
• And
– Fit in with CERN activities
– Meet needs of experiments (BaBar, CDF, D0, …)
• So … go through range of options and then discuss…
Slide 20
EU Bid(1)
• Bid will be made to the EU to link national grids
  – "Process" has become more than 'just a bid'
• Almost reached the point where have to be active participant in EU bid, and associated activities, in order to access data from CERN in the future
• Decisions need to be taken today…
• Timescale:
  – March 7: Workshop at CERN to prepare programme of work (RPM)
– March 17 Editorial meeting to look for industrial partners
– March 30 Outline of paper used to obtain pre-commitment of partners
– April 17 Finalise ‘Work Packages’ – see next slides
– April 25 Final draft of proposal
– May 1 Final version of proposal for signature
– May 7 Submit
Slide 21
EU Bid(2)
• The bid was originally for 30 MECU, with matching contribution from national funding organisations
  – Now scaled down, possibly to 10 MECU
– Possibly as ‘taster’ before follow-up bid?
– EU funds for Grid activities in Framework VI likely to be larger
• Work Packages have been defined
  – Objective is that countries (through named individuals) take responsibility to split up the work and define deliverables within each, to generate draft content for the EU bid
– BUT
• Without doubt the same people will be well positioned to lead the work in due course
• .. And funds split accordingly??
• Considerable manoeuvring!
– UK – need to establish priorities, decide where to contribute…
Slide 22
Work Packages
Work Package / Contact Point:

Middleware
1. Grid Work Scheduling – Cristina Vistoli (INFN)
2. Grid Data Management – Ben Segal (CERN)
3. Grid Application Monitoring – Robin Middleton (UK)
4. Fabric Management – Tim Smith (CERN)
5. Mass Storage Management – Olof Barring (CERN)

Infrastructure
6. Testbed and Demonstrators – François Etienne (IN2P3)
7. Network Services – Christian Michau (CNRS)

Applications
8. HEP Applications – Hans Hoffmann (4 expts)
9. Earth Observation Applications – Luigi Fusco
10. Biology Applications – Christian Michau

Management
11. Project Management – Fabrizio Gagliardi (CERN)
Robin is ‘place-holder’ – holding UK’s interest (explanation in Open Session)
Slide 23
UK Participation in Work Packages
MIDDLEWARE
1. Grid Work Scheduling
2. Grid Data Management – TONY DOYLE, Iain Bertram?
3. Grid Application Monitoring – ROBIN MIDDLETON, Chris Brew
4. Fabric Management
5. Mass Storage Management – JOHN GORDON

INFRASTRUCTURE
6. Testbed and demonstrators
7. Network Services – PETER CLARKE, Richard Hughes-Jones
APPLICATIONS8. HEP Applications
Slide 24
PPDG
DoE NGI Program PI Meeting, October 1999
Particle Physics Data Grid – Richard P. Mount, SLAC

PPDG as an NGI Problem / PPDG Goals:
The ability to query and partially retrieve hundreds of terabytes across Wide Area Networks within seconds, making effective data analysis from ten to one hundred US universities possible.

PPDG is taking advantage of NGI services in three areas:
– Differentiated Services: to allow particle-physics bulk data transport to coexist with interactive and real-time remote collaboration sessions, and other network traffic.
– Distributed caching: to allow for rapid data delivery in response to multiple "interleaved" requests.
– "Robustness": matchmaking and request/resource co-scheduling, to manage workflow and use computing and network resources efficiently; to achieve high throughput.
Slide 25
PPDG
Richard P. Mount, CHEP 2000: Data Analysis for SLAC Physics

PPDG Resources
• Network Testbeds:
  – ESNET links at up to 622 Mbit/s (e.g. LBNL-ANL)
  – Other testbed links at up to 2.5 Gbit/s (e.g. Caltech-SLAC via NTON)
• Data and Hardware:
  – Tens of terabytes of disk-resident particle physics data (plus hundreds of terabytes of tape-resident data) at accelerator labs;
  – Dedicated terabyte university disk cache;
  – Gigabit LANs at most sites.
• Middleware Developed by Collaborators:
  – Many components needed to meet short-term targets (e.g. Globus, SRB, MCAT, Condor, OOFS, Netlogger, STACS, Mass Storage Management) already developed by collaborators.
• Existing Achievements of Collaborators:
  – WAN transfer at 57 Mbytes/s;
  – Single-site database access at 175 Mbytes/s
Slide 26
PPDG
PPDG First Year Milestones
• Project start – August 1999
• Decision on existing middleware to be integrated into the first-year Data Grid – October 1999
• First demonstration of high-speed site-to-site data replication – January 2000
• First demonstration of multi-site cached file access (3 sites) – February 2000
• Deployment of high-speed site-to-site data replication in support of two particle-physics experiments – July 2000
• Deployment of multi-site cached file access in partial support of at least two particle-physics experiments – August 2000
Slide 27
PPDG
First Year PPDG “System” Components
Middleware Components (Initial Choice) – see PPDG Proposal page 15:
– Object and file-based database: Objectivity/DB (SLAC enhanced)
– Application services: GC Query Object, Event Iterator, Query Monitor; FNAL SAM system
– Resource management: start with human intervention (but begin to deploy resource discovery & management tools)
– File access service: components of OOFS (SLAC)
– Cache manager: GC Cache Manager (LBNL)
– Mass storage manager: HPSS, Enstore, OSM (site-dependent)
– Matchmaking service: Condor (U. Wisconsin)
– File replication index: MCAT (SDSC)
– Transfer cost estimation service: Globus (ANL)
– File fetching service: components of OOFS
– File mover(s): SRB (SDSC); site-specific
– End-to-end network services: Globus tools for QoS reservation
– Security and authentication: Globus (ANL)
Slide 28
LHCb contribution to EU proposal HEP Applications Work Package
• Grid testbed in 2001, 2002
• Production of 10^6 simulated b->D*pi events
  – Create 10^8 events at Liverpool MAP in 4 months
  – Transfer 0.62 TB to RAL
  – RAL dispatches AOD and TAG datasets to other sites
• 0.02TB to Lyon and CERN
• Then permit a study of all the various options for performing a distributed analysis in a Grid environment
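The production and transfer figures above imply fairly modest sustained rates. A rough sanity check (the per-event size and the sustained rate are derived here, not quoted on the slide):

```python
# Rough sanity check of the LHCb testbed numbers: event size and the
# sustained network rate needed for the MAP -> RAL transfer.
# Derived figures are illustrative, not taken from the slide.

n_events = 1e8        # events created at Liverpool MAP
volume_tb = 0.62      # TB transferred to RAL
months = 4            # production period

bytes_per_event = volume_tb * 1e12 / n_events
sustained_rate = volume_tb * 1e12 / (months * 30 * 86400)  # bytes/s

print(f"~{bytes_per_event / 1e3:.1f} kB per event")
print(f"sustained transfer ~{sustained_rate / 1e3:.0f} kB/s")
```

This works out at roughly 6 kB per event and a sustained rate of about 60 kB/s, well within even a few-Mbit/s link, so the transfer itself is not the hard part of the exercise.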
Slide 29
American Activities
• Collaboration with Ian Foster
  – Transatlantic collaboration using GLOBUS
• Networking– QoS tests with SLAC
– Also link in with GLOBUS?
• CDF and D0– Real challenge to ‘export data’
– Have to implement 4Mbps connection
– Have to set up mini Grid
• BaBar– Distributed LINUX farms etc in JIF bid
Slide 30
Networking Proposal - 1

DETAILS
Demonstration of high-rate site-to-site file replication. Single site-to-site tests at low rates to set up technologies and gain experience. These should include and benefit the experiments which will be taking data.
– Tier-0 -> Tier-1 (CERN-RAL)
– Tier-1 -> Tier-1 (FNAL-RAL)
– Tier-1 -> Tier-2 (RAL-LVPL, GLA/ED)
Use existing monitoring tools; adapt to function as resource predictors also. Multi-site file replication and cascaded file replication at modest rates. Transfers at Neo-GRID rates.

DEPENDENCIES/RISKS
Availability of temporary PVCs on inter- and intra-national WANs – or from collaborating industries. Needs negotiation now. Monitoring expertise/tools already available: PPNCG (UK), ICFA (worldwide).

RESOURCES REQUIRED
1.5 SY. 10 Mbit/s PVCs between sites in 00-01; 50 Mbit/s PVCs between sites in 01-02; >100 Mbit/s PVCs in 02-03.

MILESTONES
Jan-01: Demonstration of low-rate transfers between all sites
Jan-02: Demonstration of cascaded file transfer; demonstration of sustained modest-rate transfers
03: Implementation of sustained transfers of real data at rates approaching 1000 Mbit/s
Slide 31
Networking - 2
DETAILS
Differentiated Services: deploy some form of DiffServ on dedicated PVCs. Measure high- and low-priority latency and rates as a function of strategy and load. [Depends upon QoS developments.] Attempt to deploy end-to-end QoS across several interconnected networks.

DEPENDENCIES/RISKS
PVCs must be QoS-capable. May rely upon proprietary or technology-dependent factors in the short term. Monitoring tools for WAN end-to-end depend upon expected developments by network suppliers.

RESOURCES REQUIRED
1.5 SY. Same PVCs as in NET-1. Production deployment of QoS on WAN.

MILESTONES
Apr-01: Successful deployment and measurement of pilot QoS on PVCs under project control.
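From the application side, DiffServ marking amounts to setting the DSCP bits on outgoing packets; whether the marking is honoured is entirely up to the network path. A minimal sketch (the choice of the Expedited Forwarding code point is illustrative):

```python
# Minimal DiffServ-marking sketch: set the DSCP bits on a UDP socket via
# the standard IP_TOS socket option. Routers along the path decide whether
# to honour the marking; the EF code point here is just an example.
import socket

DSCP_EF = 46               # Expedited Forwarding (low-latency class)
tos = DSCP_EF << 2         # DSCP occupies the top 6 bits of the TOS byte

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
print(f"TOS byte set to {hex(tos)}")
# Probe traffic sent from this socket is marked high-priority; comparing
# its latency against an unmarked socket under load is the measurement
# described above.
s.close()
```

A QoS test of the kind proposed here would send marked and unmarked streams simultaneously over the same PVC and compare latency distributions as the load is varied.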
Slide 32
Networking - 3

DETAILS
Monitoring and metrics for resource prediction. Survey and define the monitoring requirements of the GRID. Adapt existing monitoring tools for the measurement and monitoring needs of the network work packages (all NET-xx) as described here. In particular, develop protocol-sensitive monitoring, as will be needed for QoS. Develop and test prediction metrics.

DEPENDENCIES/RISKS
PPNCG monitoring; ICFA monitoring.

RESOURCES REQUIRED
0.5 SY

MILESTONES
Dec-00: Interim report on GRID monitoring requirements
Jul-01:
Dec-01: Finish of adaptation of existing tools for monitoring
Jul-02: First prototype predictive tools deployed
Dec-02: Report on tests of predictive tools
Slide 33
Networking - 4

DETAILS
Data flow modelling. Assimilate the Monarc modelling tool set. Determine the requirements of a model of the UK GRID – and to what extent this factorises or not from the international GRID. Appraise the work needed to adapt/write the necessary components. Configure and run models in parallel with the transfer tests (NET-1) and QoS tests (NET-2) for calibration purposes. Apply the models to the determination of GRID topology and resource location.

DEPENDENCIES/RISKS
Applicability of existing tools unknown before appraisal.

RESOURCES REQUIRED
3 SY

MILESTONES
Oct-00: Assimilate Monarc
Dec-00: Determine requirements of GRID model; determine scope of work needed to adapt/provide components
??: Configure initial model
Slide 34
Pulling it together…
• Networking:
  – EU work package
– Existing tests
– Integration of ICFA studies to Grid
• Will networking lead the non-experiment activities??
• Data Storage
  – EU work package
• Grid Application Monitoring
  – EU work package
• CDF, D0 and BaBar
  – Need to integrate these into Grid activities
– Best approach is to centre on experiments
Slide 35
…Pulling it all together
• Experiment-driven
  – Like LHCb, meet specific objectives
• Middleware preparation
  – Set up GLOBUS?
    • QMW, RAL, DL ..?
  – Authenticate
  – Familiarise
  – Try moving data between sites
    • Resource specification
    • Collect dynamic information
    • Try with international collaborators
– Learn about alternatives to GLOBUS
– Understand what is missing
– Exercise and measure the performance of distributed caching
• What do you think?
• Anyone like to work with Ian Foster for 3 months?!
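As a toy illustration of the caching point above (purely illustrative; nothing like this appears in the slides, and the workload parameters are invented), "exercising and measuring" a cache can be as simple as replaying a skewed access pattern against an LRU policy and recording the hit rate:

```python
# Toy LRU cache hit-rate measurement: a stand-in for "exercise and measure
# the performance of distributed caching". All parameters are illustrative.
from collections import OrderedDict
import random

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = self.misses = 0

    def access(self, key):
        if key in self.store:
            self.store.move_to_end(key)          # refresh recency on a hit
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)   # evict least-recently-used

random.seed(1)
cache = LRUCache(capacity=100)
# Skewed workload: a few "hot" datasets are requested far more often.
for _ in range(10_000):
    cache.access(int(random.paretovariate(1.2)) % 1000)

print(f"hit rate: {cache.hits / 10_000:.2%}")
```

A real distributed-cache study would add network latency and multiple cooperating cache sites, but the measurement loop is the same shape.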