
Page 1

LHC Experiment Dashboard

Main areas covered by the Experiment Dashboard:
• Data processing monitoring (job monitoring)
• Data transfer monitoring
• Site/service monitoring from the VO perspective

Thanks to Julia Andreeva and E. Karavakis for the slides

Page 2

Dashboard for Monitoring the Computing Activities of the LHC

• Analysis + Production: real-time and historical views
• Data transfer, data access
• Site Status Board, site usability, SiteView
• WLCG GoogleEarth Dashboard

Monitoring of the LHC computing activities during the first year of data taking (14/04/2011)

Page 3

Common Solutions

Applications, by experiment (ATLAS, CMS, LHCb, ALICE):
• Job monitoring (multiple applications)
• Site Status Board
• Site Usability Monitoring
• DDM Monitoring
• Global transfer monitoring system (planned for 2011)
• SiteView & GoogleEarth

Not actively used

Page 4

Job Monitoring

• Aimed at different types of users: individual scientists using the Grid for data analysis, user support teams, site admins, VO managers, managers of different computing projects

• Works transparently across different middleware, submission methods and execution backends


Page 5

Job monitoring

During 2010, Dashboard job monitoring for ATLAS was completely redesigned. Most of the applications are shared with CMS: the shared components are the data schema of the data repositories and the user interfaces. The information sources differ, so the collectors differ as well.

For CMS, the CMS job submission tools (both the servers and the jobs themselves) are instrumented to report job status information to the Dashboard.

For ATLAS, the Dashboard is integrated with the PANDA job monitoring DB. The Dashboard collector retrieves data from the PANDA DB every 5 minutes. Jobs submitted via Ganga, either through the WMS or to local batch systems, are instrumented to report their status via the Messaging System for the Grid (MSG), which is based on ActiveMQ.
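The point of this design is that different collectors (push-style reports for CMS, a 5-minute poll of the PANDA DB for ATLAS) all write into one shared schema, so the same UIs can serve both experiments. The sketch below illustrates that idea only; the `JobRecord` schema, the message keys (`JobId`, `Site`, `Status`) and the PANDA row layout are hypothetical, not the actual Dashboard or PANDA structures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Polling period quoted on the slide for the PANDA collector.
POLL_INTERVAL = timedelta(minutes=5)


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical common schema shared by the ATLAS and CMS data repositories."""
    vo: str
    job_id: str
    site: str
    status: str


def from_cms_report(msg: dict) -> JobRecord:
    # Hypothetical keys of a status report pushed by instrumented CMS tools or jobs.
    return JobRecord("cms", msg["JobId"], msg["Site"], msg["Status"].lower())


def from_panda_row(row: tuple) -> JobRecord:
    # Hypothetical (pandaid, site, jobstatus) row read from the PANDA job monitoring DB.
    panda_id, site, status = row
    return JobRecord("atlas", str(panda_id), site, status.lower())


def poll_due(last_poll: datetime, now: datetime) -> bool:
    """True when the pull-style PANDA collector should query the DB again."""
    return now - last_poll >= POLL_INTERVAL


def upsert(repository: dict, record: JobRecord) -> None:
    """Keep only the latest known state of each job, keyed by (VO, job id)."""
    repository[(record.vo, record.job_id)] = record
```

Because both collectors produce the same `JobRecord`, anything reading the repository does not need to know which information source a job came from.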

Page 6

Job monitoring

The following applications were enabled for ATLAS:
• Interactive view
• Historical view
• Task monitoring (first prototype)

CMS job monitoring was extended to collect file access information, which is used by the Data Popularity service.

Plans for the next half of the year:
• Enable the new version of the Historical view for CMS
• Continue the effort to improve performance, both for the data collectors and for the UIs
• Develop a new version of ATLAS task monitoring for analysis users, with the possibility to resubmit or kill jobs via the monitoring UI
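The file access information collected by CMS job monitoring feeds the Data Popularity service. As an illustration only, the sketch below assumes each access is recorded as a hypothetical `(user, dataset, file)` tuple and ranks datasets by number of accesses and number of distinct users; the metrics the real service computes are not described in these slides.

```python
from collections import Counter, defaultdict


def popularity(accesses):
    """Rank datasets by access count (ties broken by name).

    accesses: iterable of (user, dataset, file) tuples (hypothetical record layout).
    Returns a list of (dataset, n_accesses, n_distinct_users).
    """
    n_accesses = Counter()
    users = defaultdict(set)
    for user, dataset, _file in accesses:
        n_accesses[dataset] += 1
        users[dataset].add(user)
    ranking = [(ds, n, len(users[ds])) for ds, n in n_accesses.items()]
    return sorted(ranking, key=lambda row: (-row[1], row[0]))
```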

Page 7

Task Monitoring

[Figure panels: distribution by site; detailed job information; distribution by status; processed events over time; failure diagnostics for Grid and application failures; efficiency distributed by site]

• User / user-support perspective
• Wide selection of plots
• CMS & ATLAS
• >350 CMS users daily
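Task monitoring separates Grid failures from application failures and shows an efficiency per site. The rules below are a hypothetical sketch, not the Dashboard's definitions: a job whose Grid status is `aborted` counts as a Grid failure, a job that completed with a non-zero application exit code counts as an application failure, and "efficiency" is taken to mean the fraction of finished jobs that succeeded. The field names `site`, `grid_status` and `exit_code` are assumptions.

```python
from collections import defaultdict


def classify(job):
    """Hypothetical failure diagnostics: Grid problem first, then application exit code."""
    if job["grid_status"] == "aborted":
        return "grid_failure"
    if job["exit_code"] != 0:
        return "application_failure"
    return "success"


def efficiency_by_site(jobs):
    """Fraction of finished jobs per site that succeeded (one possible definition)."""
    counts = defaultdict(lambda: [0, 0])  # site -> [successes, finished jobs]
    for job in jobs:
        cell = counts[job["site"]]
        cell[0] += int(classify(job) == "success")
        cell[1] += 1
    return {site: ok / total for site, (ok, total) in counts.items()}
```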

Page 8

Job Summary & Historical Views

Job Summary
• Shifter, expert, site perspective
• Real-time job metrics by site, activity, …

Historical Views
• Site, management perspective
• Job metrics as a function of time
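The difference between the two views can be made concrete with a small sketch: the job summary is a snapshot of current job counts grouped by site (or activity), while the historical view counts jobs in fixed time bins. The job dictionaries and their `status` and `finished` fields are hypothetical; this is an illustration, not the Dashboard's aggregation code.

```python
from collections import Counter
from datetime import datetime


def job_summary(jobs, key="site"):
    """Real-time view: number of jobs per (key value, status); use key="activity" for activities."""
    return Counter((job[key], job["status"]) for job in jobs)


def historical_view(jobs, bin_hours=1):
    """Historical view: jobs per (time bin start, status).

    Bins are aligned to midnight, so bin_hours should divide 24.
    Each job carries a hypothetical 'finished' datetime.
    """
    bins = Counter()
    for job in jobs:
        t: datetime = job["finished"]
        start = t.replace(hour=t.hour - t.hour % bin_hours, minute=0, second=0, microsecond=0)
        bins[(start, job["status"])] += 1
    return dict(sorted(bins.items()))
```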

Page 9

Data transfer monitoring

New version of the ATLAS Distributed Data Management monitoring (ATLAS DDM Dashboard), with an improved matrix visualization that allows data transfers to be monitored by source or by destination and makes transfer problems easier to spot. The first prototype was released in May and is already in use by the ATLAS community: http://dashb-atlas-data.cern.ch/dashboard/ddm2/

First feedback is very positive.
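The matrix view can be pictured as a source × destination table in which each cell holds successes and attempts; collapsing it along one axis gives the "by source" or "by destination" view. This is a minimal sketch assuming transfer outcomes arrive as hypothetical `(source, destination, succeeded)` tuples and that a link is flagged when its success rate falls below an arbitrary threshold.

```python
from collections import defaultdict


def transfer_matrix(transfers):
    """{(source, destination): [successes, attempts]} from (src, dst, succeeded) tuples."""
    matrix = defaultdict(lambda: [0, 0])
    for src, dst, ok in transfers:
        cell = matrix[(src, dst)]
        cell[0] += int(ok)
        cell[1] += 1
    return dict(matrix)


def collapse(matrix, axis=0):
    """Success rate per source (axis=0) or per destination (axis=1)."""
    totals = defaultdict(lambda: [0, 0])
    for link, (ok, attempts) in matrix.items():
        totals[link[axis]][0] += ok
        totals[link[axis]][1] += attempts
    return {site: ok / attempts for site, (ok, attempts) in totals.items()}


def problem_links(matrix, threshold=0.8):
    """Links whose success rate is below a (hypothetical) alarm threshold."""
    return sorted(link for link, (ok, attempts) in matrix.items() if ok / attempts < threshold)
```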

Page 10

Data transfer monitoring

During the second half of 2011, development of the global transfer monitoring system will start, in collaboration with the GT group of the CERN IT department. The distributed FTS instances will be instrumented to report data transfer events via MSG. A Dashboard collector will consume these events, record them in the central data repository, generate overall transfer statistics and expose this information to the user community via UIs and APIs. Most of the ATLAS DDM Dashboard code should be reused for the new data transfer monitoring system. More details can be found at: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGTransferMonitoring

The detailed roadmap for this project is not yet defined.
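The planned pipeline (FTS instances publish events via MSG, a collector records them centrally and computes statistics) can be sketched as follows. Since the roadmap was not yet defined, everything here is an assumption: the JSON message fields (`src_site`, `dst_site`, `file_size`, `state`), the `DONE` state value, and the use of an SQLite table as a stand-in for the central repository. Messages are passed in as plain strings rather than read from a real broker.

```python
import json
import sqlite3


def open_repository(path=":memory:"):
    """Stand-in for the central data repository (hypothetical table layout)."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS transfer (src TEXT, dst TEXT, bytes INTEGER, state TEXT)")
    return db


def consume(db, raw_messages):
    """Record each JSON transfer event (hypothetical FTS message fields); return how many were stored."""
    rows = []
    for raw in raw_messages:
        event = json.loads(raw)
        rows.append((event["src_site"], event["dst_site"], int(event["file_size"]), event["state"]))
    db.executemany("INSERT INTO transfer VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)


def link_statistics(db):
    """Per link: total transfers, successful transfers, bytes moved by successful transfers."""
    return db.execute(
        """SELECT src, dst, COUNT(*),
                  SUM(CASE WHEN state = 'DONE' THEN 1 ELSE 0 END),
                  SUM(CASE WHEN state = 'DONE' THEN bytes ELSE 0 END)
           FROM transfer GROUP BY src, dst ORDER BY src, dst"""
    ).fetchall()
```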

Page 11

Site/service monitoring

Includes the following applications:
• Site Usability (based on the results of SAM tests)
• Site Status Board
• WLCG Google Earth Dashboard

Page 12

Site Usability Monitoring (SUM)

During 2010 and the beginning of 2011, the SAM framework was completely redesigned; the new version is based on Nagios.

The LHC VOs have started to submit remote tests via Nagios. The Dashboard Site Usability application is being redesigned to be compatible with the new SAM architecture. The first prototype was deployed on the validation server in April and should be validated by the LHC VOs. The new SUM should be deployed to production by the end of summer 2011.
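Since SAM tests now run through Nagios, each result is one of the four Nagios plugin states (OK, WARNING, CRITICAL, UNKNOWN). The sketch below shows one way a site usability flag could be derived from such results; the aggregation rule is an assumption for illustration, not the SUM algorithm: an instance passes if every test is OK or WARNING, a service type passes if at least one of its instances passes, and the site is usable if every service type passes. The service, host and test names are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    # Nagios plugin return codes
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Assumption: WARNING still counts as usable; CRITICAL and UNKNOWN do not.
PASSING = {Status.OK, Status.WARNING}


def instance_ok(tests):
    """tests: {test_name: Status} for one service instance."""
    return all(result in PASSING for result in tests.values())


def site_usable(site):
    """site: {service_type: {host: {test_name: Status}}}.

    Usable if every service type has at least one passing instance.
    """
    return all(any(instance_ok(tests) for tests in hosts.values()) for hosts in site.values())
```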

Page 13

SUM Snapshots


Page 14

Site Status Board (SSB)

During the second half of 2010, many improvements were implemented for the SSB:
• A new version of the collectors, which solved the DB locking problem and provided the necessary level of performance, was deployed in production (February 2011)
• A new version of the UI, with improved performance and extended functionality, was deployed in production (spring 2011)

Both ATLAS and CMS use the SSB for computing shifts and site commissioning activities.

Further development will follow the needs and requests of the LHC VOs

Page 15

SSB Snapshot

[Figure annotations: maintenance; easy to identify sites with problems; grouped sites]
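The SSB snapshot highlights three ideas: sites are grouped, sites in maintenance are marked, and problem sites stand out. A hypothetical sketch of that logic follows; the board layout, the colour values and the rule that maintenance overrides metric colours are all assumptions, not the SSB data model.

```python
from collections import defaultdict


def classify_site(entry):
    """Hypothetical rule: maintenance overrides metrics; otherwise any red metric marks a problem."""
    if entry["maintenance"]:
        return "maintenance"
    if any(colour == "red" for colour in entry["metrics"].values()):
        return "problem"
    return "ok"


def grouped_view(board):
    """board: {site: {"group": str, "maintenance": bool, "metrics": {name: colour}}}.

    Returns {group: {state: [sites]}}, with sites sorted within each state.
    """
    view = defaultdict(lambda: defaultdict(list))
    for site, entry in sorted(board.items()):
        view[entry["group"]][classify_site(entry)].append(site)
    return {group: dict(states) for group, states in view.items()}
```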

Page 16

WLCG Google Earth Dashboard

The GoogleEarth Dashboard is integrated with all VO-specific monitoring systems, including DIRAC and MonALISA, so it shows activities for all 4 experiments.

Recent development has focused on improving the robustness and reliability of the application.

http://dashb-earth.cern.ch/dashboard/doc/guides/service-monitor-gearth/html/user/setupSection.html
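Google Earth displays geographic data written in KML. As a hypothetical illustration of how activity between sites could be exported for it (not the Dashboard's actual feed), the sketch below writes one `Placemark` with a `LineString` per site-to-site link; the site coordinates and labels are assumptions supplied by the caller.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def activity_kml(sites, links):
    """Build a KML document with one Placemark/LineString per link.

    sites: {name: (longitude, latitude)}  (hypothetical coordinates)
    links: [(source, destination, label)]
    """
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    for src, dst, label in links:
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = label
        line = ET.SubElement(placemark, "LineString")
        # KML coordinates are "lon,lat,alt" tuples separated by whitespace.
        ET.SubElement(line, "coordinates").text = " ".join(
            f"{lon},{lat},0" for lon, lat in (sites[src], sites[dst])
        )
    return ET.tostring(kml, encoding="unicode")
```

For example, `activity_kml({"CERN": (6.05, 46.23), "BNL": (-72.88, 40.87)}, [("CERN", "BNL", "CERN to BNL")])` yields a document that Google Earth can open as a single line between the two locations.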