Page 1: RSV and Nagios in OSG

RSV and Nagios in OSG

Rob Quick

March 11, 2008, USCMS Tier-2 Workshop

Current State of OSG

• ~ 100 Sites

• ~ 30 VOs

• April 8th:
  - 216,000 jobs (85% successful)
  - 375,000 wallclock hours
  - About half of the jobs ran on resources NOT owned by the VO that owns the jobs

Recent and Upcoming Operations Highlights

• WLCG SAM reporting of availability statistics
  - SAM interface
  - GridView interface

• OIM Registration Database

• RSV Version 2
  - Easier to configure and maintain
  - SE probes

SAM Environment

GridView

RSV Version 2

• New probes
  - SE
  - GUMS
  - Software versions
  - CA certificates up to date

• New simplified configuration scheme

• Service certificates!

• VO access to RSV database info and web interface

• Hook to OIM

A Probe

[rquick@feynman probes]$ ./jobmanagers-status-probe -u proton.fis.cinvestav.mx -m all

metricName: org.osg.batch.jobmanager-fork-status
metricType: status
timestamp: 2008-04-24T11:57:41Z
metricStatus: OK
serviceType: globus-GRAM-fork
serviceURI: proton.fis.cinvestav.mx
gatheredAt: feynman.uits.iupui.edu
summaryData: OK
detailsData: A test job was successfully submitted to "proton.fis.cinvestav.mx/jobmanager-fork", its status when last checked was a valid one ("ACTIVE"); and finally the test job was successfully cleaned up!
EOT

metricName: org.osg.batch.jobmanager-pbs-status
metricType: status
timestamp: 2008-04-24T11:57:41Z
metricStatus: OK
serviceType: globus-GRAM-PBS
serviceURI: proton.fis.cinvestav.mx
gatheredAt: feynman.uits.iupui.edu
summaryData: OK
detailsData: A test job was successfully submitted to "proton.fis.cinvestav.mx/jobmanager-pbs", its status when last checked was a valid one ("DONE"); and finally the test job was successfully cleaned up!
EOT
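The probe output above is a series of "key: value" records, each closed by an EOT line, so it lends itself to simple parsing. The following is a minimal sketch in Python, not part of RSV itself. It assumes each field starts on its own line, that EOT sits on a line by itself, and that a line without a leading "key: " continues the previous field (as with a wrapped detailsData). The function name parse_probe_output is hypothetical.

```python
# Sample text copied from the probe run on this slide; the line layout
# (one field per line, EOT on its own line) is an assumption.
SAMPLE_OUTPUT = """\
metricName: org.osg.batch.jobmanager-fork-status
metricType: status
timestamp: 2008-04-24T11:57:41Z
metricStatus: OK
serviceType: globus-GRAM-fork
serviceURI: proton.fis.cinvestav.mx
gatheredAt: feynman.uits.iupui.edu
summaryData: OK
detailsData: A test job was successfully submitted to "proton.fis.cinvestav.mx/jobmanager-fork", its status when last
checked was a valid one ("ACTIVE"); and finally the test job was successfully cleaned up!
EOT
metricName: org.osg.batch.jobmanager-pbs-status
metricType: status
timestamp: 2008-04-24T11:57:41Z
metricStatus: OK
serviceType: globus-GRAM-PBS
serviceURI: proton.fis.cinvestav.mx
gatheredAt: feynman.uits.iupui.edu
summaryData: OK
detailsData: A test job was successfully submitted to "proton.fis.cinvestav.mx/jobmanager-pbs", its status when last
checked was a valid one ("DONE"); and finally the test job was successfully cleaned up!
EOT
"""


def parse_probe_output(text):
    """Return one dict per EOT-terminated record (hypothetical helper)."""
    records = []
    current = {}
    last_key = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "EOT":
            # End of record: store it and start a fresh one.
            if current:
                records.append(current)
            current, last_key = {}, None
            continue
        key, sep, value = line.partition(": ")
        if sep and key.isalnum():
            # A new "key: value" field.
            current[key] = value
            last_key = key
        elif last_key is not None:
            # Wrapped text: append to the previous field.
            current[last_key] += " " + line
    return records


if __name__ == "__main__":
    for record in parse_probe_output(SAMPLE_OUTPUT):
        print(record["metricName"], record["metricStatus"])
```

Splitting on the first ": " keeps the timestamp intact, since the colons inside 11:57:41Z are not followed by a space.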

Local RSV Structure

RSV Reporting to Nagios Console

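The slides do not show how RSV hands results to Nagios, so the following is only a sketch of one plausible piece of that path: translating a probe record's metricStatus into the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It assumes the record is a dict of the fields printed by the probe, and that metricStatus uses the same status names as Nagios; only OK appears in the talk. The function name to_nagios and the message layout are hypothetical.

```python
# Standard Nagios plugin exit codes.
NAGIOS_EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def to_nagios(record):
    """Map a probe record (dict) to (exit_code, status_line).

    Hypothetical helper: assumes metricStatus values match Nagios
    status names; anything unrecognised is reported as UNKNOWN.
    """
    status = str(record.get("metricStatus", "UNKNOWN")).upper()
    if status not in NAGIOS_EXIT_CODES:
        status = "UNKNOWN"
    metric = record.get("metricName", "unknown-metric")
    summary = record.get("summaryData", "")
    return NAGIOS_EXIT_CODES[status], f"{metric} {status} - {summary}"


if __name__ == "__main__":
    example = {
        "metricName": "org.osg.batch.jobmanager-pbs-status",
        "metricStatus": "OK",
        "summaryData": "OK",
    }
    code, line = to_nagios(example)
    print(code, line)
```

Falling back to UNKNOWN for a missing or unrecognised status keeps a malformed record from being shown as healthy on the console.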
Provided by:

Sarah Williams

History of Monitoring in OSG

“Monitoring is always a difficult beast to tame.  Much careful thought has gone into it over the years, and the highway to this point is littered with lots of dead monitoring bodies.  I think the current effort is striving for simplicity, and I hope it gets there!” -Alan Sill (TACC)

Planned Central Structure

• Can it be this simple?

Questions/Comments?