TRANSCRIPT
US-ATLAS Grid Efforts
John Huth
Harvard University
Agency Review of LHC Computing
Lawrence Berkeley Laboratory, January 14-17, 2003
14 Jan 03, J. Huth, LHC Computing Agency Review
Outline
Goals
Data Challenge experience and plans
Relation to external groups
US ATLAS grid management
Goals
Primary: establish a robust and efficient platform (fabric and software) for US LHC and International LHC simulation, production and data analysis
  Data challenge support
  User analysis support
  Production support
  Ready for data taking on "day one"
Realizing common grid goals
  Local control
  Democratic access to data
  Support of autonomous analysis communities (private grids)
Value added to other disciplines
  Serving as a knowledgeable user community for testing middleware
US ATLAS Testbed
Develop a tiered, scalable computing platform
Test grid middleware integrated with software
Test interoperability
  CMS
  EDG
  Other groups
Support physics efforts
  Data Challenges
  Production
  User analysis
Testbed Fabric
Production gatekeepers at ANL, BNL, LBNL, BU, IU, UM, OU, UTA
Large clusters at BNL, LBNL, IU, UTA, BU
  Heterogeneous system - Condor, LSF, PBS
  Currently > 100 nodes available
  Could double capacity quickly, if needed
+ Multiple R&D gatekeepers
  gremlin@bnl - iVDGL GIIS
  heppc5@uta - ATLAS hierarchical GIIS
  atlas10/14@anl - EDG testing
  heppc6@uta + gremlin@bnl - glue schema
  heppc17/19@uta - GRAT development
  few sites - Grappa portal
  bnl - VO server
  few sites - iVDGL testbed
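The "heterogeneous system" point is the crux of why grid middleware was needed on the testbed: each site ran its own local batch system, so a gatekeeper had to translate an abstract job into the site's native submission command. A minimal illustrative sketch of that translation step follows; the command names (condor_submit, bsub, qsub) are the real CLIs of Condor, LSF, and PBS, but the wrapper function itself is hypothetical, not ATLAS or Globus code.

```python
# Illustrative sketch, not testbed code: map an abstract job onto the
# submission command of each site's local batch system.

def submit_command(batch_system, executable, args):
    """Build the site-local submission command for an abstract job."""
    if batch_system == "condor":
        # Condor takes a submit description file; assume one was generated.
        return ["condor_submit", "job.sub"]
    if batch_system == "lsf":
        # LSF's bsub accepts the executable and arguments directly.
        return ["bsub", executable] + list(args)
    if batch_system == "pbs":
        # PBS takes a job script; assume one was generated.
        return ["qsub", "job.pbs"]
    raise ValueError("unknown batch system: %s" % batch_system)
```

In the real testbed this translation was handled by the Globus gatekeeper's job-manager layer rather than by application scripts, which is what let one toolkit (GRAT) submit to all eight sites.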
Grid Testbed Sites
[Map of testbed sites: Lawrence Berkeley National Laboratory, Brookhaven National Laboratory, Indiana University, Boston University, Argonne National Laboratory, University of Michigan, University of Texas at Arlington, Oklahoma University]
US-ATLAS testbed launched February 2001. Two new sites joining - UNM, SMU
Testbed Tools
Many tools developed by the U.S. ATLAS testbed group during the past 2 years:
GridView - simple tool to monitor status of testbed (Kaushik De, Patrick McGuigan)
Gripe - unified user accounts (Rob Gardner)
Magda - MAnager for Grid DAta (Torre Wenaus, Wensheng Deng; see Gardner & Wenaus talks)
Pacman - package management and distribution tool (Saul Youssef)
Grappa - web portal using active notebook technology (Shava Smallen, Dan Engh)
GRAT - GRid Application Toolkit
Gridsearcher - MDS browser (Jennifer Schopf)
GridExpert - knowledge database (Mark Sosebee)
VO Toolkit - site AA (Rich Baker)
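The core idea behind a grid data manager like Magda is a catalogue that maps a logical file name to its physical replicas at different sites, so production jobs can locate data wherever it happens to be stored. The sketch below is a hypothetical minimal version of that idea for illustration only; it is not Magda code, and the class and method names are invented.

```python
# Conceptual sketch of a replica catalogue (the idea behind Magda),
# mapping logical file names to per-site physical locations.

class ReplicaCatalog:
    """Minimal logical-name -> site-replica catalogue."""

    def __init__(self):
        # logical file name -> {site name: physical path}
        self._replicas = {}

    def register(self, lfn, site, pfn):
        """Record that logical file `lfn` has a replica `pfn` at `site`."""
        self._replicas.setdefault(lfn, {})[site] = pfn

    def locate(self, lfn):
        """Return all known replicas of a logical file (empty if unknown)."""
        return dict(self._replicas.get(lfn, {}))
```

A job needing a file asks the catalogue for replicas and picks the closest site; registering outputs back into the catalogue is what made the "store data anywhere" model on the testbed workable.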
Recent Accomplishments
May 2002
  Globus 2.0 beta RPM developed at BNL
  Athena-atlfast grid package developed at UTA
  Installation using Pacman developed at BU
  GRAT toolkit for job submission on grid, developed at UTA & OU
June 2002
  Tested interoperability - successfully ran ATLAS MC jobs on CMS & D0 grid sites
  ANL demonstrated that U.S. testbed package can run successfully at EDG sites
July 2002
  New production software released & deployed
  2-week Athena-atlfast MC production run using GRAT & GRAPPA
  Generated 10 million events; thousand files catalogued in Magda, all sites participating
Accomplishments contd.
August/September 2002
  3-week DC1 production run using GRAT
  Generated 200,000 events, using ~30,000 CPU hours; 2000 files, 100 GB storage
October/November 2002
  Prepare demos for SC2002
  Deployed VO server at BNL Tier 1 facility
  Deployed new VDT 1.1.5 on testbed
  Test iVDGL packages WorldGrid, ScienceGrid
  Interoperability tests with EDG
December 2002
  Developed software evolution plan during meeting with Condor/VDT team at UW-Madison
  Generated 75k SUSY and Higgs events for DC1
  Total DC1 files generated and stored > 500 GB; total CPU used > 1000 CPU-days in 4 weeks
Lessons Learned
Globus, Magda and Pacman make grid production easy!
On the grid - submit anywhere, run anywhere, store data anywhere - really works!
Error reporting, recovery & cleanup very important - will always lose/hang some jobs
Found many unexpected limitations, hangs, software problems - next time, need larger team to quantify these problems and provide feedback to Globus, Condor, and other middleware teams
Large pool of hardware resources available on testbed: BNL Tier 1, LBNL (pdsf), IU & BU prototype Tier 2 sites, UTA (new $1.35M NSF-MRI), OU & UNM CS supercomputing clusters...
Testbed production effort suffering from severe shortage of human resources. Need people to debug middleware problems and provide feedback to middleware developers
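The "error reporting, recovery & cleanup" lesson can be made concrete: production scripts must assume some fraction of grid jobs will hang or fail, so each job gets a bounded number of resubmissions, and partial output from a failed attempt is cleaned up before retrying. The sketch below is a hypothetical illustration of that pattern, not actual GRAT or testbed code; the submit and cleanup callables stand in for whatever the real production scripts did.

```python
# Illustrative retry-with-cleanup loop for unreliable grid job submission.
# `submit` and `cleanup` are placeholder callables, not real middleware APIs.

def run_with_retries(submit, cleanup, max_attempts=3):
    """Submit a job, cleaning up and resubmitting on failure.

    submit()  -> True on success, False (or an exception) on failure.
    cleanup() -> remove partial output / stale job state after a failure.
    Returns the number of attempts used, or raises RuntimeError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if submit():
                return attempt
        except Exception:
            pass  # treat middleware exceptions like ordinary failures
        cleanup()  # never leave partial output behind before a retry
    raise RuntimeError("job failed after %d attempts" % max_attempts)
```

The important design point is that cleanup runs on every failure path, including exceptions, since a hung or crashed job otherwise leaves files that corrupt later bookkeeping.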
DC1 Phase II
Provide data for high level trigger studies
Data analysis from production at BNL
Unified set of grid tools for international ATLAS
  Magda, Pacman, cook-book database, VO management
There may still be divergences among grids (US/EU/NorduGrid)
ATLAS DC1 Phase 2
Pile-up production: ongoing!
  Both grid & non-grid based
  Add min. bias events to DC1 Phase 1 sample
Simulation in the U.S.
  Higgs re-simulation - 50k events
  SUSY simulation - 50k events
Athena reconstruction of complete DC1 Phase 2 sample
Analysis / user access to data
  Magda already provides access to ~30k catalogued DC1 files from/to many grid locations (need ATLAS VO to make this universal)
  Need higher-level (web-based?) tools to provide easy access for physicists
  DIAL being developed at BNL: http://www.usatlas.bnl.gov/~dladams/dial/
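The pile-up production step above has a simple conceptual core: each DC1 Phase 1 signal event is overlaid with a random number of minimum-bias events, mimicking the extra proton-proton interactions in the same bunch crossing. The sketch below illustrates that mixing idea only; it is not ATLAS/Athena code, and the function names and event representation are invented for the example.

```python
# Conceptual sketch of pile-up mixing: overlay each signal event with a
# Poisson-distributed number of min-bias events drawn from a pool.

import math
import random

def poisson(lam, rng):
    """Knuth's algorithm for sampling a small-mean Poisson count."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def overlay_pileup(signal_events, minbias_pool, mean_pileup, rng=None):
    """Attach a random set of min-bias events to each signal event."""
    rng = rng or random.Random()
    mixed = []
    for event in signal_events:
        n_pileup = poisson(mean_pileup, rng)
        extras = [rng.choice(minbias_pool) for _ in range(n_pileup)]
        mixed.append({"signal": event, "pileup": extras})
    return mixed
```

In real production the overlay happens at the detector-hit level inside the simulation chain, which is why pile-up was both CPU- and I/O-heavy and worth doing on the grid.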
DC2 Specifics
From new WBS (we will review the situation in early 2003)
Test LCG-1 (software and hardware) - in particular POOL
ATLAS tests
  simulation status of G4 (validation, hits, digits)
  pile-up in Athena
  relative role of G3/G4
  calibration and alignment
  detector description and EDM
  distributed analysis
Integration
Coordination with other grid efforts and software developers - very difficult task!
Project centric:
  Intl. ATLAS - K. De
  GriPhyN/iVDGL - Rob Gardner (J. Schopf, CS contact)
  PPDG - John Huth
  LCG - John Huth (POB), V. White (GDB), L. Bauerdick (SC2)
  EDG - Ed May, Jerry Gieraltowski
  ATLAS/LHCb - Rich Baker
  ATLAS/CMS - Kaushik De
  ATLAS/D0 - Jae Yu
Fabric/Middleware centric:
  AFS software installations - Alex Undrus, Shane Canon, Iwona Sakrejda
  Networking - Shawn McKee, Rob Gardner
  Virtual and Real Data Management - Wensheng Deng, Sasha Vaniachin, Pavel Nevski, David Malon, Rob Gardner, Dan Engh, Mike Wilde
  Security/Site AA/VO - Rich Baker, Dantong Yu
NLV Analysis Tool: Plots Time vs. Event Name
[Annotated screenshot of the NLV analysis tool: menu bar, events/load-line scale, legend, zoom window controls, zoom box and zoom-box actions, playback controls and speed, window size / max window size, summary line, time axis, "you are here" marker, and title.]
Grid Planning
For Data Challenge 1, Phase 2: generation of High Level Trigger data, using a unified format for International ATLAS computing in a grid configuration
For R&D purposes: increased integration into iVDGL/GriPhyN tools (VDT, Chimera); working with tools in common with US CMS
Issue of divergences with EDG software, configurations
External Groups
Trillium (GriPhyN, iVDGL, PPDG)
EU initiatives
LCG
Intl. ATLAS
U.S. CMS (large ITR pre-proposal)
Other groups/experiments (HEP/LIGO/SDSS...)
Funding agencies
Trillium Group
PPDG (J. Huth)
  Metadata catalog (Magda) - Wensheng Deng, BNL
  Interoperability, middleware evaluation - Jerry Gieraltowski, ANL
  Virtual organization, monitoring in grid environment - Dantong Yu, BNL
  Distributed analysis - David Adams, BNL
iVDGL (R. Gardner)
  Package deployment/installation (Pacman) - S. Youssef, BU - adopted by VDT and CMS
  Incorporation of VDT, Chimera in next prototype round
  Hardware support - prototype Tier 2's (Indiana, Boston University)
NB: A tremendous amount of support comes from base efforts at labs and universities (NetLogger - LBNL; GRAT - De, UTA; support - H. Severini, Oklahoma; S. McKee, Michigan; May, ANL; BNL; LBNL; BU)
EU Initiatives
Linkages via
  International ATLAS (using EDG tools/testbed)
  LCG
  ITR initiative
  Middleware representatives (Foster/Kesselman/Livny/Gagliardi)
Issues:
  EDG mandate is larger than the LHC
  Divergences in middleware, approaches
  "best in show" concept - David Foster (LCG)
  Interoperability of testbeds
LCG
More than just grids - applications group is very large
Linkages via
  Intl. experiments
  Middleware representatives
  Architects' forum
  Applications (T. Wenaus)
  Representation: L. Bauerdick (SC2), V. White (GDB), J. Huth (POB)
Issues:
  Communications about LCG-1
  What are requirements for Tier 1's?
  Divergences in site requirements?
  Linkage of Intl. experiments with LCG
  DC1 Phase 2 versus LCG-1
International ATLAS
Main focus: Data Challenges
  G. Poulard (DC coordinator), A. Putzer (NCB chair)
Linkages
  National Computing Board (resources, planning) - J. Huth
  Grid planning - R. Gardner
  Technical Board - K. De
Issues
  Divergence in US-EU grid services
  Commonality vs. interoperability
Probably the tightest coupling (nothing focuses the mind like data)
  Middleware selection
  Facilities planning
US CMS
Commonality in many tools (Pacman, VDT, Chimera)
  Discussions in common forums (LCG, Trillium, CS groups)
Interoperability tests
Large ITR proposal:
  Coordination with EDG, LCG, US CMS, CS community
  Use existing management structure of US ATLAS + US CMS coordinating group
  Physics-based goal of "private grids" for analysis groups
  Coordination among funding agencies, LCG, experiments
  Further funding initiatives coordinated (EGEE, DOE)
Funding Agencies
Recent initiative to pull together funding agencies involved in grids and HEP applications
  Nov. 22nd meeting of NSF, DOE, computing, HEP, EU and CERN representatives
  Follow-up date proposed: Feb. 7th
  November ITR workshop, with US CMS, US ATLAS, computer scientists and funding agency representatives
  Follow-up: M. Kasemann
Goal: coordinated funding of EU/US/CERN efforts on the experiments and grid middleware
Beyond the large ITR:
  Medium ITRs
  DOE initiatives
  EU funding
  Question: UK, other funding
Other groups
Experiments
  Direct linkage via GriPhyN, iVDGL: SDSS, LIGO
  Contacts, discussions with D0, RHIC experiments
Grid organizations
  PPARC
  CrossGrid
  HICB
  Interoperability initiatives
US ATLAS Grid Organization
Present status: WBS projection onto grid topics of software and facilities
Efforts have focused on SC2002 demos and ATLAS production - near-term milestones
I have held off until now on reorganizing the management structure because of the rapid evolution
  Gain experience with the US ATLAS testbed
  Creation of new organizations
  Shake-out of the LHC schedule
We are now discussing the necessary changes to the management structure to coordinate the efforts
Proposed new structure
Creation of level-2 management slot for distributed computing applications
  Interaction with above list of groups (or delegated liaison)
Three sub-tasks:
  Architecture - series of packages created for ATLAS production or prototyping
  Components - testing and support of grid deliverables (e.g. high-level ATLAS-specific interfaces); grid deliverables associated with IT initiatives
  Production - running Intl. ATLAS production on the US ATLAS fabric; contributions to Intl. ATLAS production
Next Steps in Organization
Bring this new structure to the US ATLAS Computing Coordination Board; seek advice on level-2 and level-3 managers
Coordinate with "liaison" groups
Comments/Summary
US ATLAS has made tremendous progress in the last year in testing grid production
  SC2002 demo
  Interoperability tests
  Data Challenges
Management of diverse sources of effort is challenging!
Mechanisms begun to coordinate further activities
  Goal: without creating divergences or proliferation of new groups
New US ATLAS management structure under discussion