Grid Computing in High Energy Physics
Enabling Data Intensive Global Science
Paul Avery, University of Florida ([email protected])
Lishep2004 Workshop, UERJ, Rio de Janeiro
February 17, 2004
The Grid Concept
Grid: geographically distributed computing resources configured for coordinated use
Fabric: physical resources & networks provide raw capability
Ownership: resources controlled by their owners and shared with others
Middleware: software ties it all together: tools, services, etc.
Goal: transparent resource sharing
[Figure: the US-CMS "Virtual Organization"]
LHC: Key Driver for Data Grids
Complexity: millions of individual detector channels
Scale: PetaOps (CPU), 100s of Petabytes (data)
Distribution: global distribution of people & resources
[Figure: the CMS Collaboration: 2000+ physicists, 159 institutes, 36 countries]
LHC Data and CPU Requirements
Storage: raw recording rate 0.1 - 1 GB/s; Monte Carlo data; ~100 PB total by ~2010
Processing: PetaOps (> 300,000 3 GHz PCs)
[Figures: the CMS, ATLAS, and LHCb detectors]
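As a rough sanity check on that CPU figure (an illustrative calculation, not from the slide, assuming roughly one operation per cycle):

$3\times10^{5}\ \text{PCs} \times 3\times10^{9}\ \text{cycles/s} \approx 10^{15}\ \text{ops/s} = 1\ \text{PetaOps}$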
Complexity: Higgs Decay into 4 Muons
[Event displays: all charged tracks with pT > 2 GeV; reconstructed tracks with pT > 25 GeV (+30 minimum-bias events)]
10^9 collisions/sec, selectivity: 1 in 10^13
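To see what that selectivity implies (an illustrative calculation, not from the slide):

$10^{9}\ \text{collisions/s} \times 10^{-13} = 10^{-4}\ \text{events/s} \approx 9\ \text{selected events/day}$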
LHC Global Collaborations
[Figures: the ATLAS and CMS collaborations]
Several 1000 physicists per experiment
Driver for Transatlantic Networks (Gb/s)
2001 estimates, now seen as conservative!

Experiment  2001   2002   2003   2004   2005   2006
CMS         0.1    0.2    0.3    0.6    0.8    2.5
ATLAS       0.05   0.1    0.3    0.6    0.8    2.5
BaBar       0.3    0.6    1.1    1.6    2.3    3.0
CDF         0.1    0.3    0.4    2.0    3.0    6.0
D0          0.4    1.6    2.4    3.2    6.4    8.0
BTeV        0.02   0.04   0.1    0.2    0.3    0.5
DESY        0.1    0.18   0.21   0.24   0.27   0.3
CERN        0.31   0.62   2.5    5.0    10     20
HEP Bandwidth Roadmap (Gb/s)
HEP: leading role in network development/deployment

Year   Production   Experimental   Remarks
2001   0.155        0.62 - 2.5     SONET/SDH
2002   0.622        2.5            SONET/SDH; DWDM; GigE integration
2003   2.5          10             DWDM; 1+10 GigE integration
2005   10           2-4 x 10       λ switching; λ provisioning
2007   2-4 x 10     1 x 40         1st gen. λ grids
2009   1-2 x 40     ~5 x 40        40 Gb/s λ switching
2011   ~5 x 40      ~25 x 40       Terabit networks
2013   ~Terabit     ~Multi-Tbps    ~Fill one fiber
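To connect these link speeds to the petabyte-scale storage figures above, a small illustrative Python helper (mine, not from the talk); full utilization is an idealizing assumption:

def petabytes_per_year(link_gbps: float, utilization: float = 1.0) -> float:
    """Data volume a link can carry in one year at a given utilization."""
    bytes_per_s = link_gbps * 1e9 / 8
    return bytes_per_s * 365 * 24 * 3600 * utilization / 1e15

print(f"{petabytes_per_year(2.5):.1f} PB/yr")  # ~9.9 PB/yr at full utilization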
Global LHC Data Grid Hierarchy (CMS experiment)
[Diagram: the Online System feeds the Tier 0 CERN Computer Center at 0.1 - 1.5 GB/s; Tier 0 connects to Tier 1 national centers (USA, Korea, Russia, UK) at 2.5-10 Gb/s over 10-40 Gb/s backbones; Tier 1 centers serve Tier 2 centers at 1-10 Gb/s; Tier 2 centers serve Tier 3 institutes at 1-2.5 Gb/s; Tier 4 is physics caches and PCs]
~10s of Petabytes/yr by 2007-8; ~1000 Petabytes in < 10 yrs?
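To give a feel for the tier-to-tier links, a companion sketch (again illustrative, not from the talk; the 50% achievable efficiency is an assumption):

def transfer_time_hours(size_tb: float, link_gbps: float, efficiency: float = 0.5) -> float:
    """Rough time to move size_tb terabytes over a link_gbps link,
    assuming `efficiency` of the nominal bandwidth is achievable."""
    bits = size_tb * 8e12            # 1 TB = 8e12 bits
    return bits / (link_gbps * 1e9 * efficiency) / 3600

# e.g., 100 TB from Tier 0 to a Tier 1 over a 2.5 Gb/s link:
print(f"{transfer_time_hours(100, 2.5):.0f} hours")  # ~178 hours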
Most IT Resources Outside CERN
[Chart: estimated 2008 resources (Tier0 + Tier1 + Tier2), showing the CERN, Tier1, and Tier2 shares of CPU, disk, and tape; CERN's share is the minority in each category (figures shown: 17%, 10%, 38%)]
Analysis by Globally Distributed Teams
Non-hierarchical: chaotic analyses + productions
Superimpose significant random data flows
Data Grid Projects
Global Context of Data Grid Projects
U.S. projects: GriPhyN (NSF), iVDGL (NSF), Particle Physics Data Grid (DOE), PACIs and TeraGrid (NSF), DOE Science Grid (DOE), NEESgrid (NSF), NSF Middleware Initiative (NSF)
EU and Asia projects: European Data Grid (EU), EDG-related national projects, DataTAG (EU), LHC Computing Grid (CERN), EGEE (EU), CrossGrid (EU), GridLab (EU), Japanese and Korean projects
Two primary project clusters: US + EU
Not exclusively HEP, but driven/led by HEP (with CS)
Many 10s of $M brought into the field
[Timeline figure: HEP-related Grid infrastructure projects]
International Grid Coordination
Global Grid Forum (GGF): Grid standards body; new PNPA research group
HICB (HEP Inter-Grid Coordination Board): non-competitive forum for strategic issues and consensus; cross-project policies, procedures and technology; joint projects
HICB-JTB Joint Technical Board: technical coordination, e.g. the GLUE interoperability effort
LHC Computing Grid (LCG): technical work, deployments; POB, SC2, PEB, GDB, GAG, etc.
LCG: LHC Computing Grid Project
A persistent Grid infrastructure for LHC experiments, matched to the decades-long research program of the LHC
Prepare & deploy the computing environment for the LHC experiments: common applications, tools, frameworks and environments; deployment oriented: no middleware development(?)
Move from testbed systems to real production services: operated and supported 24x7 globally; computing fabrics run as production physics services; a robust, stable, predictable, supportable infrastructure
LCG Activities
Relies on strong leverage: CERN IT; the LHC experiments (ALICE, ATLAS, CMS, LHCb); Grid projects (Trillium, DataGrid, EGEE)
Close relationship with the EGEE project (see Gagliardi talk): EGEE will create an EU production Grid (funding 70 institutions); middleware activities done in common (same manager); LCG deployment in common (the LCG Grid is the EGEE prototype)
LCG-1 deployed (Sep. 2003): Tier-1 laboratories, including FNAL and BNL
LCG-2 now being deployed
Sep. 29, 2003 announcement
LCG-1 Sites
Grid Analysis Environment
GAE and analysis teams: extending locally, nationally, globally; sharing worldwide computing, storage & network resources; utilizing intellectual contributions regardless of location
HEPCAL and HEPCAL II: delineate common LHC use cases & requirements (HEPCAL); the same for analysis (HEPCAL II)
ARDA (Architectural Roadmap for Distributed Analysis): document LCG-2003-033 released (Oct. 2003); workshop January 2004, with presentations by the 4 experiments:
CMS: CAIGEE; ALICE: AliEn; ATLAS: ADA; LHCb: DIRAC
One View of ARDA Core Services (LCG-2003-033, Oct. 2003)
[Diagram of ARDA core services: Information Service, Grid Access Service, Job Provenance, Authentication, Authorization, Accounting, Grid Monitoring, Workload Management, Site Gatekeeper, Data Management, Package Manager, File Catalog, Metadata Catalog, Storage Element, Job Monitor, Computing Element, User Application, Auditing]
CMS Example: CAIGEE
Clients talk standard protocols to a "Grid Services Web Server", a.k.a. the Clarens data/services portal, with a simple web-service API
The Clarens portal hides the complexity of the Grid Services from the client
Key features: global scheduler, catalogs, monitoring, and a Grid-wide execution service
Clarens servers form a global peer-to-peer network
[Diagram: analysis clients connect over HTTP, SOAP, or XML-RPC to the Grid Services Web Server, behind which sit the scheduler, the catalogs (virtual data, replica, metadata), applications monitoring, the fully-abstract, partially-abstract, and fully-concrete planners, data management, the execution priority manager, and the Grid-wide execution service]
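Since Clarens speaks standard protocols such as XML-RPC, a client-side interaction could look roughly like the following Python sketch; the endpoint URL and the catalog.find method are hypothetical, as the slides do not specify the actual API:

import xmlrpc.client

# Hypothetical Clarens-style endpoint and method names (illustrative only;
# the actual Clarens API is not given in these slides).
portal = xmlrpc.client.ServerProxy("https://clarens.example.org:8443/clarens")

# Ask a (hypothetical) catalog service for datasets matching a query.
# A real deployment would also present GSI credentials for authentication.
datasets = portal.catalog.find({"owner": "uscms", "sample": "higgs_4mu"})
for name in datasets:
    print(name)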
LCG Timeline
Phase 1 (2002-2005): build a service prototype based on existing grid middleware; gain experience in running a production grid service; produce the Computing TDR for the final system
Phase 2 (2006-2008): build and commission the initial LHC computing environment
Phase 3: ?
"Trillium": U.S. Physics Grid Projects
Trillium = PPDG + GriPhyN + iVDGL: large overlap in leadership, people, experiments; HEP members are the main drivers, especially the LHC experiments
Benefits of coordination: common software base + packaging (VDT + Pacman); collaborative/joint projects (monitoring, demos, security, ...); wide deployment of new technologies, e.g. Virtual Data
A forum for US Grid projects: joint strategies, meetings and work; a unified U.S. entity to interact with international Grid projects
Build significant Grid infrastructure: Grid2003
Science Drivers for U.S. HEP Grids
LHC experiments: 100s of Petabytes
Current HENP experiments: ~1 Petabyte (1000 TB)
LIGO: 100s of Terabytes
Sloan Digital Sky Survey: 10s of Terabytes
Future Grid resources: massive CPU (PetaOps); large distributed datasets (>100 PB); global communities (1000s)
[Chart: data growth and community growth across these projects, 2001-2009]
Virtual Data Toolkit
[Diagram of the VDT build process: sources (CVS) and contributors (VDS, etc.) are patched into GPT source bundles, built and tested on the NMI Build & Test Condor pool (37 computers), then packaged; the resulting VDT is published as RPMs and binaries in a Pacman cache; will use NMI processes soon]
A unique resource for managing, testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software
Virtual Data Toolkit: Tools in VDT 1.1.12
Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), replica location (RLS)
Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds
EDG & LCG: Make Gridmap, Certificate Revocation List Updater, Glue Schema/Info provider
ISI & UC: Chimera & related tools, Pegasus
NCSA: MyProxy, GSI OpenSSH
LBL: PyGlobus, NetLogger
Caltech: MonALISA
VDT: VDT System Profiler, configuration software
Others: KX509 (U. Mich.)
VDT Growth (currently 1.1.13)
[Chart: number of VDT components per release, rising from VDT 1.0 (Globus 2.0b, Condor 6.3.1) through VDT 1.1.3-1.1.5 (pre-SC 2002), VDT 1.1.7 (switch to Globus 2.2), and VDT 1.1.8 (first real use by LCG), to VDT 1.1.11 (Grid2003)]
Virtual Data: Derivation and Provenance
Most scientific data are not simple "measurements": they are computationally corrected/reconstructed, or produced by numerical simulation
Science & engineering projects are increasingly CPU and data intensive
Programs are significant community resources (transformations); so are the executions of those programs (derivations)
Management of dataset dependencies is critical!
Derivation: instantiation of a potential data product
Provenance: complete history of any existing data product
Previously: manual methods; GriPhyN: automated, robust tools (the Chimera Virtual Data System)
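A minimal sketch of the bookkeeping idea (mine, not GriPhyN code): record which transformation and inputs produced each product, then answer questions such as "which derived products must be recomputed if an input changes?". Dataset and program names below are hypothetical:

from dataclasses import dataclass

@dataclass
class Derivation:
    transformation: str   # the program (a community resource)
    inputs: list[str]     # input dataset names
    output: str           # derived dataset name

# Toy derivation catalog.
CATALOG = [
    Derivation("calibrate_muons", ["raw_run42"], "muons_run42"),
    Derivation("select_higgs_4mu", ["muons_run42"], "higgs_candidates"),
]

def downstream(dataset: str) -> set[str]:
    """All products that depend, directly or transitively, on `dataset`."""
    direct = {d.output for d in CATALOG if dataset in d.inputs}
    return direct | {x for d in direct for x in downstream(d)}

# A muon calibration error invalidates everything derived from the raw data:
print(downstream("raw_run42"))  # {'muons_run42', 'higgs_candidates'}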
LHC Higgs Analysis with Virtual Data
[Dependency tree: from "mass = 160", the branches "decay = WW", "decay = ZZ", and "decay = bb" fan out, each with other cuts; the WW branch is refined step by step (to leptonic final states, to e/mu channels, to "Pt > 20"), with other cuts at each stage]
Scientist adds a new derived data branch & continues analysis
Chimera: Sloan Galaxy Cluster Analysis
[Plot: galaxy cluster size distribution in Sloan data; number of clusters (1 to 100,000, log scale) vs. number of galaxies (1 to 100, log scale)]
Outreach: QuarkNet-Trillium Virtual Data Portal
More than a web site: organize datasets; perform simple computations; create new computations & analyses; view & share results; annotate & enquire (metadata); communicate and collaborate
Easy to use, ubiquitous, no tools to install
Open to the community: grow & extend
Initial prototype implemented by graduate student Yong Zhao and M. Wilde (U. of Chicago)

CHEPREO: Center for High Energy Physics Research and Educational Outreach (Florida International University)
Physics Learning Center; CMS research; iVDGL Grid activities; AMPATH network (S. America)
Funded September 2003
Grid2003: An Operational Grid
28 sites (2100-2800 CPUs); 400-1300 concurrent jobs; 10 applications; running since October 2003
[Map of Grid2003 sites, including Korea]
http://www.ivdgl.org/grid2003
Grid2003 Participants
[Diagram: Grid2003 draws on iVDGL, GriPhyN, PPDG, US-CMS, US-ATLAS, DOE labs, biology applications, and Korea]
Grid2003 Applications
High energy physics: US-ATLAS analysis (DIAL); US-ATLAS GEANT3 simulation (GCE); US-CMS GEANT4 simulation (MOP); BTeV simulation
Gravity waves: LIGO blind search for continuous sources
Digital astronomy: SDSS cluster finding (maxBcg)
Bioinformatics: bio-molecular analysis (SnB); genome analysis (GADU/Gnare)
CS demonstrators: Job Exerciser, GridFTP, NetLogger-grid2003
Grid2003 Site Monitoring
Grid2003: Three Months Usage
Grid2003 Success
Much larger than originally planned: more sites (28), more CPUs (2800), more simultaneous jobs (1300); more applications (10) in more diverse areas
Able to accommodate new institutions & applications: U. Buffalo (biology), Nov. 2003; Rice U. (CMS), Feb. 2004
Continuous operation since October 2003: strong operations team (iGOC at Indiana); US-CMS using it for production simulations (next slide)
Production Simulations on Grid2003
US-CMS Monte Carlo simulation: used ≈ 1.5 x the dedicated US-CMS resources
[Chart: usage split between USCMS and non-USCMS resources]
Grid2003: A Necessary Step
Learning how to operate a Grid: add sites, recover from errors, provide info, update, test, etc.; need tools, services, procedures, documentation, organization; need reliable, intelligent, skilled people
Learning how to cope with "large" scale: "interesting" failure modes as scale increases; increasing scale must not overwhelm human resources
Learning how to delegate responsibilities: multiple levels (project, virtual organization, service, site, application); essential for future growth
Grid2003 experience critical for building "useful" Grids: frank discussion in the "Grid2003 Project Lessons" document
Grid2003 Lessons (1): Investment
Building momentum: PPDG 1999, GriPhyN 2000, iVDGL 2001; time for projects to ramp up
Building collaborations: HEP (ATLAS, CMS, Run 2, RHIC, JLab); non-HEP (computer science, LIGO, SDSS); time for collaboration sociology to kick in
Building testbeds: build expertise, debug Grid software, develop Grid tools & services; US-CMS 2002-2004+; US-ATLAS 2002-2004+; WorldGrid 2002 (Dec.)
Grid2003 Lessons (2): Deployment
Building something useful draws people in (similar to a large HEP detector): cooperation, willingness to invest time, striving for excellence!
Grid development requires significant deployments: required to learn what works, what fails, what's clumsy, ...; painful, but it pays for itself
Deployment provides a powerful training mechanism
Grid2003 Lessons (3): Packaging
Installation and configuration (VDT + Pacman): simplifies installation and configuration of Grid tools + applications; major advances over 13 VDT releases
A strategic issue, critical to us: provides uniformity + automation; lowers barriers to participation, hence scaling; expect great improvements (Pacman 3)
Automation: the next frontier: reduce FTE overhead and communication traffic; automate installation, configuration, testing, validation, updates; remote installation, etc.
Grid2003 and Beyond
Further evolution of Grid3 (Grid3+, etc.): contribute resources to a persistent Grid; maintain a development Grid, test new software releases; integrate software into the persistent Grid; participate in LHC data challenges
Involvement of new sites: new institutions and experiments; new international partners (Brazil, Taiwan, Pakistan?, ...)
Improvements in Grid middleware and services: integrating multiple VOs; monitoring; troubleshooting; accounting; ...
U.S. Open Science Grid
Goal: build an integrated Grid infrastructure: support the US-LHC research program and other scientific efforts; resources from laboratories and universities; federate with the LHC Computing Grid
Getting there: OSG-1 (Grid3+), OSG-2, ...: a series of releases of increasing functionality & scale; constant use of the facilities for LHC production computing
Jan. 12 meeting in Chicago: public discussion, planning sessions
Next steps: creating an interim Steering Committee (now); white paper to be expanded into a roadmap; presentation to funding agencies (April/May?)
Inputs to Open Science Grid
[Diagram: Trillium, US-LHC, laboratory centers, the education community, other science applications, technologists, computer science, university facilities, and multi-disciplinary facilities all feed into the Open Science Grid]
Grids: Enhancing Research & Learning
Fundamentally alters the conduct of scientific research: "lab-centric" (activities center around a large facility); "team-centric" (resources shared by distributed teams); "knowledge-centric" (knowledge generated and used by a community)
Strengthens the role of universities in research: couples universities to data intensive science; couples universities to national & international labs; brings front-line research and resources to students; exploits the intellectual resources of formerly isolated schools; opens new opportunities for minority and women researchers
Builds partnerships to drive advances in IT/science/engineering: HEP with physics, astronomy, biology, CS, etc.; "application" sciences with computer science; universities with laboratories; scientists with students; the research community with the IT industry
Grids and HEP's Broad Impact
Grids enable 21st century collaborative science: linking research communities and resources for scientific discovery; needed by LHC global collaborations pursuing "petascale" science
HEP Grid projects are driving important network developments: recent US-Europe "land speed records"; ICFA-SCIC, I-HEPCCC, the US-CERN link, ESnet, Internet2
HEP Grids impact education and outreach: providing technologies and resources for training, education, and outreach
HEP has recognized leadership in Grid development: many national and international initiatives; partnerships with other disciplines; influencing funding agencies (NSF, DOE, EU, S.A., Asia)
HEP Grids will provide scalable computing infrastructures for scientific research
Grid References
Grid2003: www.ivdgl.org/grid2003
Globus: www.globus.org
PPDG: www.ppdg.net
GriPhyN: www.griphyn.org
iVDGL: www.ivdgl.org
LCG: www.cern.ch/lcg
EU DataGrid: www.eu-datagrid.org
EGEE: egee-ei.web.cern.ch
The Grid book (2nd edition): www.mkp.com/grid2
Extra Slides
UltraLight: 10 Gb/s Network
10 Gb/s+ network: Caltech, UF, FIU, UM, MIT; SLAC, FNAL; international partners; Level(3), Cisco, NLR
Goal: PetaScale Virtual-Data Grids
[Diagram: single researchers, workgroups, and production teams use interactive user tools, which drive virtual data tools, request planning & scheduling tools, and request execution & management tools; these rest on resource management, security and policy, and other Grid services, over distributed resources (code, storage, CPUs, networks), transforms, and a raw data source; scale: PetaOps, Petabytes, performance]
Virtual Data Motivations (1)
[Diagram: the entities Transformation, Derivation, and Data, linked by the relations product-of, execution-of, and consumed-by/generated-by]
"I've detected a muon calibration error and want to know which derived data products need to be recomputed."
"I've found some interesting data, but I need to know exactly what corrections were applied before I can trust it."
"I want to search a database for 3-muon SUSY events. If a program that does this analysis exists, I won't have to write one from scratch."
"I want to apply a forward jet analysis to 100M events. If the results already exist, I'll save weeks of computation."
Virtual Data Motivations (2)
Data track-ability and result audit-ability: universally sought by scientific applications
Facilitate resource sharing and collaboration: data is sent along with its recipe; a new approach to saving old data, with economic consequences?
Manage workflow: organize, locate, specify, request data products
Repair and correct data automatically: identify dependencies, apply transformations
Optimize performance: re-create data or copy it (caches)
From manual / error prone to automated / robust
Chimera Virtual Data System
Virtual Data Language (VDL): describes virtual data products
Virtual Data Catalog (VDC): used to store VDL
Abstract Job Flow Planner: creates a logical DAG (dependency graph)
Concrete Job Flow Planner: interfaces with a Replica Catalog; provides a physical DAG submission file to Condor-G
Generic and flexible: as a toolkit and/or a framework; in a Grid environment or locally
[Diagram: on the logical side, VDL (in XML) feeds the Abstract Planner and the Virtual Data Catalog, yielding a DAX; on the physical side, the Concrete Planner consults the Replica Catalog and emits a DAG for DAGMan; also noted: virtual data & CMS production with MCRunJob]
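A minimal sketch of the abstract-to-concrete planning step (mine, not Chimera's actual code): take a logical dependency graph, resolve a logical file name against a toy replica catalog, and emit jobs in dependency order. Job names, the logical file name, and the physical location are hypothetical:

from graphlib import TopologicalSorter  # Python 3.9+

# Toy abstract DAG: job -> jobs it depends on (a DAX would carry this).
abstract_dag = {
    "calibrate": set(),
    "reconstruct": {"calibrate"},
    "analyze": {"reconstruct"},
}

# Toy replica catalog: logical file name -> physical location.
replica_catalog = {
    "lfn:raw_run42": "gsiftp://tier1.example.org/data/raw_run42",
}

def concretize(job: str) -> str:
    """Build a (made-up) submission line; a real concrete planner would
    emit a Condor-G submit file with resolved physical file names."""
    pfn = replica_catalog.get("lfn:raw_run42", "<stage-in required>")
    return f"submit {job} input={pfn}"

for job in TopologicalSorter(abstract_dag).static_order():
    print(concretize(job))  # calibrate, then reconstruct, then analyze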