Tony Doyle, “Grid Development in UK”, GlueX Meeting, Glasgow, 5 August 2003

DESCRIPTION

Tony Doyle - University of Glasgow
Outline: Motivation and Overview; DataGrid Video; Data Structures in Particle Physics; How Does the Grid Work?; Rare Phenomena – Huge Background; Data Hierarchy; LHC Computing Challenge; Computing Hierarchy; Events.. to Files.. to Events; GridPP in Context; “The Project Map”; LHC Computing Grid Service; European DataGrid (EDG); Grid Data Management; Is the Middleware Robust?; Applications; Tier-1/A Tier-2 Testbed; Data Challenges; GridPP1 Summary; Next Step: GridPP2; From Testbed to Production

TRANSCRIPT

Tony Doyle, “Grid Development in UK”, GlueX Meeting, Glasgow, 5 August 2003
Tony Doyle - University of Glasgow

Particle Physics and Grid Development
Joint Edinburgh/Glasgow SHEFC JREI-funded project to develop a prototype Tier-2 centre for LHC computing.
UK-wide project to develop a prototype Grid for Particle Physics applications.
EU-wide project to develop middleware for Particle Physics, Bioinformatics and Earth Observation applications.
Emphasis on UK developments.

Outline
Motivation and Overview
DataGrid Video
Data Structures in Particle Physics
How Does the Grid Work?
Rare Phenomena – Huge Background
Data Hierarchy
LHC Computing Challenge
Computing Hierarchy
Events.. to Files.. to Events
GridPP in Context
“The Project Map”
LHC Computing Grid Service
European DataGrid (EDG)
Grid Data Management
Is the Middleware Robust?
Applications
Tier-1/A Tier-2 Testbed
Data Challenges
GridPP1 Summary
Next Step: GridPP2
From Testbed to Production

DataGrid Video Clip

Rare Phenomena – Huge Background
[Figure labels: 9 orders of magnitude; The HIGGS; All interactions]

Online Data Rate vs Size
[Figure: Level 1 Rate (Hz) against Event Size (bytes) for LHCB, KLOE, HERA-B, CDF II, CDF, H1, ZEUS, UA1, LEP, NA49, ALICE, ATLAS, CMS. Region labels: High Level-1 Trigger (1 MHz); High No. Channels; High Bandwidth (500 Gbit/s); High Data Archive (PetaByte)]
How can this data reach the end user?
It doesn't.
Factor O(1000) online data reduction via trigger selection

A running (non-Grid) experiment
Three steps to select an event today:
1. Remote access to O(100) TBytes of ESD data
2. Via remote access to 100 GBytes of TAG data
3. Using offline selection e.g.
ZeusIO-Variable (Ee>20.0)and(Ntrks>4)
Access to remote store via batch job
1% database event finding overhead
O(1M) lines of reconstruction code
No middleware
20k lines of C++ glue from Objectivity (TAG) to ADAMO (ESD) database
[Diagram labels: TAG; ESD; 4]
Million selected events from 5 years' data
TAG selection via 250 variables/event

Data Hierarchy
RAW, ESD, AOD, TAG
RAW (~2 MB/event): Recorded by DAQ. Triggered events. Detector digitisation.
ESD (~100 kB/event): Reconstructed information. Pseudo-physical information: clusters, track candidates (electrons, muons), etc.
AOD (~10 kB/event): Selected information. Physical information: transverse momentum, association of particles, jets, (best) id of particles, physical info for relevant objects.
TAG (~1 kB/event): Analysis information. Relevant information for fast event selection.

LHC Computing Challenge
[Diagram: tiered computing model, Tier 0 to Tier 4. Nodes: Online System; Offline Farm ~20 TIPS; CERN Computer Centre >20 TIPS; RAL Regional Centre; US Regional Centre; French Regional Centre; Italian Regional Centre; Tier2 Centre ~1 TIPS; ScotGRID++ ~1 TIPS; Institute ~0.25 TIPS; Physics data cache; Workstations. Link labels: ~PBytes/sec; ~100 MBytes/sec; ~Gbits/sec or Air Freight; ~Gbits/sec; Mbits/sec]
One bunch crossing per 25 ns
100 triggers per second
Each event is ~1 MByte
Physicists work on analysis channels
Each institute has ~10 physicists working on one or more channels
Data for these channels should be cached by the institute server

Events.. to Files..
to Events
[Diagram labels: RAW, ESD, AOD, TAG; Interesting Events List; Tier-0 (International); Tier-1 (National); Tier-2 (Regional); Tier-3 (Local); TAG Data Files; AOD Data Files; ESD Data Files; RAW Data Files; Event 1, Event 2, Event 3]

Application Interfaces under development

How Does the Grid Work?
0. Web User Interface
1. Authentication: grid-proxy-init
2. Job submission: dg-job-submit
3. Monitoring and control: dg-job-status, dg-job-cancel, dg-job-get-output
4. Data publication and replication: globus-url-copy, GDMP
5. Resource scheduling, use of Mass Storage Systems: JDL, sandboxes, storage elements

Middleware Development

Project Overview

GridPP in Context
[Diagram, "Not to scale!". Labels: Core e-Science Programme; GridPP; CERN; LCG; Tier-1/A; Tier-2; Middleware; Experiments; Institutes; Grid Support Centre; EGEE; Apps Dev; Apps Int]

The Project Map

LHC Computing Grid: LCG-1 Release
Ref.: M1.1
Milestone description: First Global Service (LCG-1) - Initial Availability. This comprises the construction and commissioning of the first LHC Computing service suitable for physics usage. The service must offer reliably 24x7 availability to all four LHC experiments and include some ten Regional Centers from Europe, North America and Asia. The milestone includes delivery of the associated Technical Design, containing description of the architecture and functionality and quantified technical specifications of performance (capacity, throughput, reliability, availability). It must also include middleware specifications, agreed as a common toolkit by Europe and US. The service must prove functional, providing a batch service for event production and analysis of the simulated data set.
For the milestone to be met, operation must be sustained reliably during a 7 day period; stress tests and user productions will be executed, with a failure rate below 1%.
Target date: July 2003

LHC Computing Grid Service
Certification and distribution process established
Middleware package: components from
European DataGrid (EDG)
US (Globus, Condor, PPDG, GriPhyN): the Virtual Data Toolkit
Agreement reached on principles for registration and security
RAL to provide the initial grid operations centre
FZK to operate the call centre
Initial service being deployed now to 10 centres in the US, Europe and Asia: Academia Sinica Taipei, BNL, CERN, CNAF, FNAL, FZK, IN2P3 Lyon, Moscow State Univ., RAL, Univ. Tokyo
Expand to other centres as soon as the service is stable

LCG Service - Target for 2004
Establish the LHC Grid as a service for data challenges and computing model evaluation:
the basic infrastructure for distribution, coordination, operation
building collaboration between the people who manage and operate the Regional Centres
integrating the majority of the resources in Regional Centres needed for the LHC data challenges of 2004
reliability: this is the priority, and essential for providing measurable value for experiments' production teams and attracting end-users to the grid

Resources committed for 1Q04
[Table, values not recovered. Columns: CPU (kSI2K); Disk TB; Support FTE; Tape TB. Rows: CERN, Czech Repub, France, Germany, Holland, Italy, Japan, Poland, Russia, Taiwan, Spain, Sweden, Switzerland, UK, USA, Total]

European DataGrid (EDG)
[Figure labels: EDG 2.0; EDG 2.1]

European DataGrid WP1: Workload Management
Deploy and support Resource Brokers at IC

European DataGrid WP1: Workload Management
[Diagram labels: Job; Logging & Bookkeeping Server; Saving of job checkpoint state: state.saveState(); Job checkpoint states saved in the LB server; Retrieval of job checkpoint]
Also used (even in rel.
1) as repository of job status info
Already proved to be robust and reliable
The load can be distributed between multiple LB servers, to address scalability problems
Job Checkpointing in EDG2.0

European DataGrid WP2: Data Management
UK Contributions
RM in EDG2.0
[Diagram labels: Storage Element; Replica Manager; Replica Location Service; Replica Optimization Service; Replica Metadata Catalog; SE Monitor; Network Monitor; Information Service; Resource Broker; User Interface or Worker Node; Virtual Organization Membership Service]

Grid Data Management: Requirements
1. Robust: software development infrastructure
2. Secure: via Grid certificates
3. Scalable: non-centralised
4. Efficient: optimised replication
Examples: GDMP, Spitfire, Reptor, Optor

MetaData: Spitfire
Secure? At the level required in Particle Physics
[Flow diagram labels: Servlet Container; SSLServletSocketFactory; TrustManager; Security Servlet; Authorization Module; Translator Servlet; RDBMS; Connection Pool; Trusted CAs; Revoked Certs repository; Role repository; Connection mappings. Decisions and actions: Is certificate signed by a trusted CA? (No); Has certificate been revoked?; Does user specify role? (Yes / No); Find default; Role ok?; Map role to connection id; Request a connection ID]

OptorSim: File Replication Simulation
Tests file replication strategies: e.g.
economic model
Scalability
Reptor: Replica architecture
Optor: Test file replication strategies: economic model
Demo and Poster: Studying Dynamic Grid Optimisation Algorithms for File Replication

Complex Systems: Flow Diagrams
[Diagram labels: Optor; Resource Broker; Computing Element]

European DataGrid WP3: Information and Monitoring Services
UK Product
RGMA in EDG2.0
[Diagram labels: R-GMA; Consumers; LDAP InfoProvider; GIN; LDAP Server; Stream Producer; Consumer (CE); Consumer (SE); Consumer (SiteInfo); RDBMS; Latest Producer; GOUT; Consumer API; Archiver (LatestProducer); GLUE Schema]
Push mode
Updates every 30s
>70 sites (simul.)

European DataGrid WP4: Fabric Management
UK Product
LCFG configuration software from Univ. Edinburgh was used from Month 12 onwards of the EDG project. Newer version, LCFGng, in EDG-2.0 and in LCG-1.

European DataGrid WP5: Mass Storage Management
UK Product
SE in EDG2.0
The design of the SE follows a layered model with a central core handling all paths between client and MSS. Core is flexible and extensible, making it easy to support new protocols, features and MSS.
[Diagram labels: Client App; API; Java Client; C Client; SE HTTP library; SSL socket library; AXIS; Apache Web Service; Tomcat; SE core; RMANMAN]

European DataGrid WP6: Testbed
EDG Application testbed:
More than 40 sites
More than 1000 CPUs
5 Terabytes of storage
Testbed successfully demonstrated during 2nd EU review in Feb 2003
Large UK participation

European DataGrid WP7: Network Services

Is the Middleware Robust?
1. Code Base (1/3 Mloc)
2. Software Evaluation Process
3. Testbed Infrastructure: Unit Test, Build, Integration, Certification, Production
4. Code Development Platforms

Applications

Applications: GANGA
GANGA: a common interface to the Grid for Atlas and LHCb
Multilayered Grid architecture:
GUI interface
Application specific layer (Athena/Gaudi, ...)
GRID middleware (EDG, PPDG, ...)
Underlying GRID services (GLOBUS toolkit)
OS and Network services
[Diagram labels: Server; Bookkeeping DB; Production DB; EDG UI; PYTHON SW BUS; XML RPC server; XML RPC module; GANGA Module; OS Module; Athena\GAUDI; GaudiPython; PythonROOT; GUI; Job Configuration DB; Remote user (client); Local Job DB; LAN/WAN; GRID; LRMS]

Applications: CHEP03 Papers
[Labels: ATCom; GANGA; DIRAC; Grid Tests; Three papers; Six papers]
Total of 14 Application papers (plus 7 middleware papers).

Applications: CMS
GUIDO portal demonstrated at All-Hands.
New, generic version to be unveiled this year.

Applications: CDF/D0

Applications: CDF/D0
D0 plan to reprocess 22 TB of DST using SAMGRID between Sep 1st and Nov 25th

Tier-1/A Tier-2 Testbed
All testbed sites can be said to be truly on a Grid by virtue of their registration in a comprehensive resource information publication scheme, their accessibility via a set of globally enabled resource brokers, and the use of one of the first scalable mechanisms to support distributed virtual communities (VOMS). There are few such Grids in operation in the World today.
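
To give a feel for the scale of the D0 reprocessing plan quoted above (22 TB between 1 September and 25 November), the following is a minimal back-of-envelope sketch of the sustained average throughput it implies. It assumes decimal terabytes, continuous running over the whole window, and the year 2003; none of these assumptions comes from the slides, and the variable names are illustrative only.

```python
from datetime import date

# Assumptions (not stated in the slides): decimal units, year 2003,
# and processing spread evenly over the whole window.
DATASET_BYTES = 22 * 10**12   # 22 TB of DST
YEAR = 2003

start = date(YEAR, 9, 1)      # Sep 1st
end = date(YEAR, 11, 25)      # Nov 25th (end day itself not counted)

window_days = (end - start).days
window_seconds = window_days * 86_400

# Average rate needed to get through the full dataset in the window.
avg_rate_mb_s = DATASET_BYTES / window_seconds / 10**6
daily_volume_gb = DATASET_BYTES / window_days / 10**9

print(f"window: {window_days} days")
print(f"average throughput: {avg_rate_mb_s:.2f} MB/s")
print(f"average volume: {daily_volume_gb:.0f} GB/day")
```

Under these assumptions the plan works out to roughly 3 MB/s sustained, or about 260 GB per day; any downtime or retries would raise the rate needed while the system is actually running.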
Data Challenges

LHCb Data Challenge: 1/3 of events produced in the UK

Interoperability and Dissemination

Resources

Project Map Summary
Identifiable progress via tasks and metrics
At the midpoint of the project, over half of the 182 tasks are completed and all of the 44 metrics are within specifications.
The Project Map incorporates a complete Risk Register with 76 identified risks, reviewed on a regular basis.
Oversight Committee (May 2003): Conclusion
"Excellent progress, management, control, reporting and thinking (both tactical and strategic), we look forward to an equally successful second half of the project as the investments made translate into middleware roll-out and benefits to the HEP experiments. It is difficult to imagine more impressive progress all round."
The LCG-1 release is imminent; a landmark moment.

GridPP1 Summary I
1. Dedicated people actively developing a Grid
2. All with personal certificates
3. Using the largest UK grid testbed (16 sites and more than 100 servers)
4. Deployed within EU-wide programme
5. Linked to Worldwide Grid testbeds

GridPP1 Summary II
6. Grid Deployment Programme Defined: The Basis for LHC Computing
7. Active Tier-1/A Production Centre meeting International Requirements
8. Latent Tier-2 resources being monitored
9. Significant middleware development programme
10. First simple applications using the Grid testbed (open approach)

GridPP Theory and Experiment
From Web to Grid
Fit into UK e-Science structures
LHC Computing: Particle physicists will use experience in distributed computing to build and exploit the Grid
Infrastructure: tiered computing down to the physicist desktop
Scale in UK?
0.5 PBytes and 2,000 distributed CPUs: GridPP in Sept 2004
Importance of networking
Existing experiments have immediate requirements
Non-technical issues = recognising/defining roles (shared resources)
UK GridPP started 1/9/01
EU DataGrid First Middleware ~1/9/01
Development requires a testbed with feedback: Operational Grid
GridPP Testbed is relatively small scale: migration plans reqd., e.g. for UK e-Science CA
Grid jobs are being submitted today.. user feedback loop is important..
Grid tools web page development by a VO (GridPP)
Next stop: Web and Grid services (WSDL and OGSA)

Next Step: GridPP2 (1/9/04-31/8/07)
~30 page proposal + figures/tables + 11 planning documents:
15. Tier-0
16. Tier-1
17. Tier-2
18. The Network Sector
19. Middleware
20. Applications
21. Hardware Requirements
22. Management
23. Travel
24. Dissemination
25. From Testbed to Production

Production Grid
Whole Greater than the Sum of Parts..

From Testbed to Production
[Flow diagram of the release process. Stages: Unit Test; Build; Integration; Certification; Production. Groups: WPs; Build System; Integration Team; Test Group; Apps. Representatives; Users. Steps: add unit tested code to repository; Run nightly build & auto. tests; Tagged package; Individual WP tests; Overall release tests; Tagged release selected for certification; Grid certification; Application Certification; Certified release selected for deployment; Certified public release for use by apps.; Problem reports; Fix problems. Testbeds: Development Testbed ~15CPU; Certification Testbed ~40CPU; Production Testbed ~1000CPU (24x7). Outputs: Release candidates; Tagged Releases; Certified Releases]

Experiment Requirements: UK only
Total Requirement: [chart not recovered]

Projected Hardware Resources
Total Resources: (note x2 scale change) [chart not recovered]

GridPP2 Summary
Challenge in going from prototype to production system
Project Map built in: to identify progress
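
The release flow on the "From Testbed to Production" slide can be sketched as a small state machine. This is only an illustration of one reading of the flattened diagram: the stage order (Unit Test, Build, Integration, Certification, Production) and the testbed sizes are taken from the slide, while the `Stage`, `Release` and `advance` names, the mapping of the Development Testbed to the Integration stage, and the rule that any failure sends a release back to unit testing are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Stage order as listed on the slide."""
    UNIT_TEST = 0
    BUILD = 1
    INTEGRATION = 2
    CERTIFICATION = 3
    PRODUCTION = 4


# Approximate testbed sizes quoted on the slide (CPUs).
# Pairing the Development Testbed with INTEGRATION is an assumption.
TESTBED_CPUS = {
    Stage.INTEGRATION: 15,      # Development Testbed
    Stage.CERTIFICATION: 40,    # Certification Testbed
    Stage.PRODUCTION: 1000,     # Production Testbed
}


@dataclass
class Release:
    """Hypothetical record of one tagged package moving through the stages."""
    tag: str
    stage: Stage = Stage.UNIT_TEST
    history: list = field(default_factory=list)

    def advance(self, tests_passed: bool) -> Stage:
        """Promote on success; on failure, file a problem report and
        return to unit testing (an assumed, simplified feedback loop)."""
        self.history.append((self.stage, tests_passed))
        if not tests_passed:
            self.stage = Stage.UNIT_TEST
        elif self.stage < Stage.PRODUCTION:
            self.stage = Stage(self.stage + 1)
        return self.stage


release = Release("example-tag")
# One failure during integration, then a clean run through to production.
for outcome in (True, True, False, True, True, True, True):
    release.advance(outcome)

print(release.stage.name)  # prints PRODUCTION
```

The point of the sketch is the feedback loop: a problem report at any stage costs a full pass back through the earlier stages, which is why the slide places the small, cheap testbeds first and reserves the ~1000 CPU production testbed for certified releases.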