LHC Experiment Dashboard
Main areas covered by the Experiment Dashboard:
• Data processing monitoring (job monitoring)
• Data transfer monitoring
• Site/service monitoring from the VO perspective
Thanks to Julia Andreeva and E. Karavakis for the slides
Dashboard for Monitoring the Computing Activities of the LHC
• Analysis + Production; Real time and Historical Views
• Data transfer; Data access
• Site Status Board; Site usability; SiteView
• WLCG GoogleEarth Dashboard
14/04/2011 Monitoring of the LHC computing activities during the first year of data taking
Common Solutions
Applications (table columns: ATLAS, CMS, LHCb, ALICE):
• Job monitoring (multiple applications)
• Site Status Board
• Site Usability Monitoring
• DDM Monitoring
• Global transfer monitoring system (planned for 2011)
• SiteView & GoogleEarth
Legend: Not actively used
Job Monitoring
• Aimed at different types of users: individual scientists using the Grid for data analysis, user support teams, site admins, VO managers, managers of different computing projects
• Works transparently across different middleware, submission methods and execution backends
Job monitoring
During 2010, Dashboard job monitoring for ATLAS was completely redesigned. Most of the applications are shared with CMS. The shared components are the data schema of the data repositories and the user interfaces.
Information sources are different, so the collectors are different as well.
In the case of CMS, the CMS job submission tools (the servers and the jobs themselves) are instrumented to report job status information to the Dashboard.
In the case of ATLAS, the Dashboard is integrated with the PANDA job monitoring DB. The Dashboard collector retrieves data from the PANDA DB every 5 minutes. Jobs submitted via Ganga through WMS or to local batch systems are instrumented to report their status via the Messaging System for the Grid (MSG), which is based on ActiveMQ.
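The collector side of this reporting can be illustrated with a minimal sketch. It assumes, purely for illustration, that each status report arrives as a JSON message with the fields job_id, site, status and timestamp; the real Dashboard message format, class names and storage are not described in these slides, so JobRecord and JobCollector below are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class JobRecord:
    job_id: str
    site: str
    status: str
    timestamp: int  # seconds since epoch, as reported by the job (assumed field)


class JobCollector:
    """Hypothetical collector: keeps the most recent status per job in memory."""

    def __init__(self):
        self.jobs = {}

    def consume(self, raw_message: str) -> None:
        # The JSON layout is an assumption made for this sketch only.
        msg = json.loads(raw_message)
        new = JobRecord(msg["job_id"], msg["site"], msg["status"], msg["timestamp"])
        old = self.jobs.get(new.job_id)
        # Messages may arrive out of order, so an older report never
        # overwrites a newer one.
        if old is None or new.timestamp >= old.timestamp:
            self.jobs[new.job_id] = new


collector = JobCollector()
collector.consume('{"job_id": "j1", "site": "SITE_A", "status": "running", "timestamp": 200}')
collector.consume('{"job_id": "j1", "site": "SITE_A", "status": "submitted", "timestamp": 100}')
print(collector.jobs["j1"].status)  # running
```

Keeping only the newest report per job is one simple way to tolerate late or duplicated messages; a production collector would also need persistent storage and a real messaging client.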
The following applications were enabled for ATLAS:
• Interactive view
• Historical view
• Task monitoring (first prototype)
CMS job monitoring was extended to collect file access information, which is used by the Data Popularity service.
During the next half of the year:
• A new version of the Historical view will be enabled for CMS
• Continued effort to improve performance, both for data collectors and UIs
• Development of a new version of ATLAS task monitoring for analysis users, with the possibility to resubmit/kill jobs via the monitoring UI
Job monitoring
Task Monitoring
• Distribution by Site
• Detailed Job Information
• Distribution by Status
• Processed Events over Time
• Failure Diagnostics for Grid and Application Failures
• Efficiency Distributed by Site
• User / User-support perspective
• Wide selection of plots
• CMS & ATLAS
• >350 CMS users daily
Job Summary & Historical Views
Job Summary
• Shifter, Expert, Site perspective
• Real time job metrics by site, activity, …
Historical Views
• Site, Management perspective
• Job metrics as a function of time
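As an illustration of a job metric as a function of time, the sketch below bins finished jobs into hourly slots per site and computes the success rate in each slot. The input tuple layout and the choice of hourly bins are assumptions made for this example; the slides do not describe how the Historical Views compute their metrics.

```python
from collections import defaultdict


def success_rate_by_hour(jobs):
    """jobs: iterable of (finish_time_seconds, site, succeeded) tuples (assumed layout).

    Returns {(site, hour_start): fraction of jobs in that hour that succeeded}.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for finished, site, ok in jobs:
        hour_start = finished - finished % 3600  # start of the hourly bin
        totals[(site, hour_start)] += 1
        if ok:
            successes[(site, hour_start)] += 1
    return {key: successes[key] / totals[key] for key in totals}
```

Grouping by site alone, without the time bin, would give the kind of per-site summary shown in the Job Summary view.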
Data transfer monitoring
New version of the ATLAS Distributed Data Management monitoring (ATLAS DDM Dashboard). Its improved matrix visualization makes it possible to monitor data transfers by source or by destination and to spot transfer problems more easily. The first prototype was released in May and is already in use by the ATLAS community:
http://dashb-atlas-data.cern.ch/dashboard/ddm2/
First feedback is very positive.
During the second half of 2011, development of the global transfer monitoring system will start in collaboration with the GT group of the CERN IT department.
The distributed FTS instances will be instrumented to report data transfer events via MSG. The Dashboard collector will consume these events, record them in the central data repository, generate overall transfer statistics and expose this information to the user community via UIs and APIs.
Most of the ATLAS DDM Dashboard code should be re-used for the new data transfer monitoring system. More details can be found at:
https://twiki.cern.ch/twiki/bin/view/LCG/WLCGTransferMonitoring
The detailed roadmap for this project is not yet defined.
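A rough sketch of how consumed transfer events could be aggregated into a source/destination matrix and then summarised by source or by destination is given below. The event keys (src, dst, bytes, ok) and both function names are hypothetical; the actual FTS event schema and the Dashboard data repository are not specified in these slides.

```python
from collections import defaultdict


def transfer_matrix(events):
    """events: iterable of dicts with 'src', 'dst', 'bytes', 'ok' keys (assumed schema).

    Returns {(src, dst): {"ok": count, "failed": count, "bytes": bytes moved successfully}}.
    """
    matrix = defaultdict(lambda: {"ok": 0, "failed": 0, "bytes": 0})
    for ev in events:
        cell = matrix[(ev["src"], ev["dst"])]
        if ev["ok"]:
            cell["ok"] += 1
            cell["bytes"] += ev["bytes"]
        else:
            cell["failed"] += 1
    return dict(matrix)


def totals_by(matrix, index):
    """Sum matrix cells along one axis: index=0 groups by source, index=1 by destination."""
    out = defaultdict(lambda: {"ok": 0, "failed": 0, "bytes": 0})
    for pair, cell in matrix.items():
        for key, value in cell.items():
            out[pair[index]][key] += value
    return dict(out)
```

A source whose row shows failures towards many destinations points to a problem at that source, while failures concentrated in one column point to the destination.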
Data transfer monitoring
Site/service monitoring
Includes the following applications:
• Site Usability (based on the results of SAM tests)
• Site Status Board
• WLCG Google Earth Dashboard
Site Usability Monitoring (SUM)
During 2010 and the beginning of 2011 the SAM framework was completely redesigned, and the new version is based on Nagios.
The LHC VOs started to submit remote tests via Nagios. The Dashboard Site Usability application is being redesigned to be compatible with the new SAM architecture. The first prototype was deployed on the validation server in April and should be validated by the LHC VOs. The new SUM should be deployed to production by the end of summer 2011.
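To make the idea of deriving site usability from test results concrete, here is a minimal sketch. It assumes a simple rule that is not taken from the slides: a site counts as usable in a time slot only if every critical test reported for that slot passed, and usability is the fraction of usable slots. The tuple layout and the critical_tests parameter are hypothetical; the actual SUM algorithm may differ.

```python
from collections import defaultdict


def site_usability(results, critical_tests):
    """results: iterable of (site, time_slot, test_name, passed) tuples (assumed layout).

    Returns {site: usable_slots / total_slots}. Slots for which no critical
    test was reported are not counted.
    """
    slot_ok = defaultdict(lambda: True)
    for site, slot, test, passed in results:
        if test in critical_tests:
            # One failed critical test marks the whole slot as unusable.
            slot_ok[(site, slot)] = slot_ok[(site, slot)] and passed
    usable = defaultdict(int)
    total = defaultdict(int)
    for (site, _slot), ok in slot_ok.items():
        total[site] += 1
        usable[site] += int(ok)
    return {site: usable[site] / total[site] for site in total}
```

Letting each VO supply its own set of critical tests is one way such a calculation could reflect the VO perspective mentioned earlier.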
SUM Snapshots
Site Status Board (SSB)
During the second half of 2010 many improvements were implemented for SSB:
• A new version of the collectors, which solved the DB locking problem and provides the necessary level of performance, was deployed in production (February 2011)
• A new version of the UI, with improved performance and extended functionality, was deployed in production (Spring 2011)
Both ATLAS and CMS are using SSB for computing shifts and site commissioning activity
Further development will follow the needs and requests of the LHC VOs
SSB Snapshot
Maintenance
Easy to identify sites with problems
Grouped sites
WLCG Google Earth Dashboard
GoogleEarth Dashboard is integrated with all VO-specific monitoring systems, including Dirac and MonAlisa, so it shows activities for all 4 experiments
Recent development was focussed on the improvement of the robustness and reliability of the application.
http://dashb-earth.cern.ch/dashboard/doc/guides/service-monitor-gearth/html/user/setupSection.html