ATLAS Off-Grid sites (Tier-3) monitoring A. Petrosyan on behalf of the ATLAS collaboration GRID’2012, 17.07.12, JINR, Dubna


ATLAS Off-Grid sites (Tier-3) monitoring

A. Petrosyan on behalf of the ATLAS collaboration

GRID’2012, 17.07.12, JINR, Dubna


Goals of the project

• Provide a reasonable monitoring solution for ‘off-grid’ sites (unplugged, geographically close computing resources)
• Monitor the computing facilities of local groups with a co-located storage system (Tier1+Tier3, Tier2+Tier3)
• Present Tier-3 site activity at the global level
• Monitor data transfers across the XRootD federation


Tier-3 sites monitoring levels

• Monitoring of the local infrastructure for site administration

• Central system for monitoring of the VO activities at Tier-3 sites


Objectives of the local monitoring system at Tier-3 site

• Detailed monitoring of the local fabric
• Monitoring of the batch system
• Monitoring of the job processing
• Monitoring of the mass storage system
• Monitoring of the VO computing activities on the local site


Objectives of the global Tier-3 monitoring

• Monitoring of the VO usage of the Tier-3 resources in terms of data transfer, data access, and job processing

• Quality of the provided service based on the job processing and data transfer monitoring metrics


Site monitoring

• Based on the Ganglia monitoring system
• Collects basic metrics using Ganglia sensors
• Plugin system for monitoring specific metrics
• PostgreSQL to aggregate data
• More details for each package at https://svnweb.cern.ch/trac/t3mon/wiki/T3MONHome
• Monitoring modules available for Condor, Lustre, PBS, Proof, XRootD; each has a plugin to deliver data to the global level
• Examples of the UI for different systems at http://vm01.jinr.ru/ganglia/
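
To illustrate the plugin idea, here is a minimal sketch of a gmond Python metric module. The `metric_init` / `metric_cleanup` entry points and the descriptor keys follow Ganglia's Python module interface; the metric name `t3mon_running_jobs`, the `jobs_file` parameter and the one-job-per-line file are assumptions made for this example and are not taken from the T3Mon packages.

```python
# Hypothetical job-list file (one running job per line); overridden via module params.
_jobs_file = "/tmp/t3mon_running_jobs.txt"


def running_jobs_handler(name):
    """Callback invoked by gmond; returns the current value of metric `name`."""
    try:
        with open(_jobs_file) as handle:
            return sum(1 for line in handle if line.strip())
    except OSError:
        return 0  # no data available yet


def metric_init(params):
    """Called once by gmond at start-up; returns the list of metric descriptors."""
    global _jobs_file
    _jobs_file = params.get("jobs_file", _jobs_file)
    return [{
        "name": "t3mon_running_jobs",   # assumed metric name
        "call_back": running_jobs_handler,
        "time_max": 90,
        "value_type": "uint",
        "units": "jobs",
        "slope": "both",
        "format": "%u",
        "description": "Number of running batch jobs (illustrative)",
        "groups": "t3mon",
    }]


def metric_cleanup():
    """Called by gmond on shutdown; nothing to release in this sketch."""
    pass
```

A real module would query the batch or storage system instead of reading a file; only the callback body would change.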


Data flow for the site monitoring

• Common UI for various data sources
• Small core with separate modules, so only the needed software has to be installed
• Delivery to the global level can be switched off


Global monitoring

• Ganglia as executor
• MSG as the transmitting system
• Publisher on the local site: executed by gmond, communicates with the local DB and sends information to the MSG system
• Backend: consumer(s) of messages at CERN; data popularity and job statistics are presented via Dashboard
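
The publisher step can be sketched roughly as follows. This is an illustration under stated assumptions, not the T3Mon publisher: SQLite stands in for the local PostgreSQL database so the example runs on its own, the `summary` table and its columns are invented, and `send` is a caller-supplied function standing in for the MSG client.

```python
import json
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the local DB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE summary (id INTEGER PRIMARY KEY, metric TEXT, "
        "value REAL, sent INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO summary (metric, value) VALUES (?, ?)",
        [("jobs_ok", 12), ("bytes_read", 4096)],
    )
    conn.commit()
    return conn


def publish_pending(conn, send, site, enabled=True):
    """Send every unsent row as a JSON message and mark it as sent.

    Returns the number of messages handed to `send`. With enabled=False
    nothing is delivered, mirroring a site that switches delivery off.
    """
    if not enabled:
        return 0
    rows = conn.execute(
        "SELECT id, metric, value FROM summary WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, metric, value in rows:
        send(json.dumps({"site": site, "metric": metric, "value": value},
                        sort_keys=True))
        conn.execute("UPDATE summary SET sent = 1 WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)
```

Passing `send` in as a parameter keeps the transport replaceable: in a test it can be `outbox.append`, in production it would wrap whatever messaging client the site uses.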


Data flow for the global monitoring


Data flow for Proof, Condor

• PostgreSQL for data aggregation on the local site
• Ganglia UI to present data popularity at the site level
• Ganglia gmond to execute summary gathering
• Summary is delivered to Dashboard historical views once per hour
• Data sent to the global level:
  – Job status: OK, stopped, aborted
  – Site name
  – Time of report
  – Number of processed events
  – Bytes read
  – Number of active users
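
As a rough illustration of the hourly summary listed above, the sketch below aggregates per-job records into one report. The record keys (`status`, `events`, `bytes_read`, `user`) and the output field names are assumptions for this example; the message format actually agreed with Dashboard is not shown on the slide.

```python
from collections import Counter
from datetime import datetime, timezone

JOB_STATUSES = ("ok", "stopped", "aborted")


def build_hourly_summary(site, jobs, report_time):
    """Aggregate job records (dicts) into a single summary dict."""
    status_counts = Counter(job["status"] for job in jobs)
    return {
        "site": site,
        "report_time": report_time.isoformat(),
        # Always report all three statuses, even when a count is zero.
        "jobs_by_status": {s: status_counts.get(s, 0) for s in JOB_STATUSES},
        "processed_events": sum(job["events"] for job in jobs),
        "bytes_read": sum(job["bytes_read"] for job in jobs),
        "active_users": len({job["user"] for job in jobs}),
    }


example_jobs = [
    {"status": "ok", "events": 1000, "bytes_read": 5000, "user": "alice"},
    {"status": "ok", "events": 500, "bytes_read": 2500, "user": "bob"},
    {"status": "aborted", "events": 0, "bytes_read": 100, "user": "alice"},
]
example_summary = build_hourly_summary(
    "DEMO-T3", example_jobs, datetime(2012, 7, 17, 12, 0, tzinfo=timezone.utc)
)
```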


Data flow for XRootD

• Both the summary gatherer and the detailed-events gatherer are implemented as Linux daemons
• Summary data goes directly to Ganglia
• File transfer data can be stored in the local PostgreSQL and then presented via Ganglia
• Detailed data can be delivered to ActiveMQ directly
• Data sent to the global level:
  – From: domain, host and IP address
  – To: domain, host and IP address
  – User
  – File, size
  – Bytes read, written
  – Time the transfer started and finished
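
The transfer record listed above could be modelled along these lines. The class and field names, the use of epoch seconds for timestamps and JSON as the wire format are all assumptions for illustration; the slide only names the pieces of information.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Endpoint:
    domain: str
    host: str
    ip: str


@dataclass(frozen=True)
class FileTransfer:
    src: Endpoint        # "from" side of the transfer
    dst: Endpoint        # "to" side of the transfer
    user: str
    file: str
    size: int            # file size in bytes
    bytes_read: int
    bytes_written: int
    start_time: int      # epoch seconds (assumed unit)
    end_time: int        # epoch seconds (assumed unit)

    def to_json(self) -> str:
        """Serialize the record, including both nested endpoints."""
        return json.dumps(asdict(self), sort_keys=True)


example_transfer = FileTransfer(
    src=Endpoint("example.org", "xrd01.example.org", "192.0.2.10"),
    dst=Endpoint("example.net", "wn07.example.net", "198.51.100.7"),
    user="alice",
    file="/atlas/data/sample.root",
    size=1048576,
    bytes_read=1048576,
    bytes_written=0,
    start_time=1342526400,
    end_time=1342526460,
)
```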


Tier-3 monitoring status

• The full chain from Tier-3 site to Dashboard has been developed
• Site-level presentation via Ganglia Web 2.0
• Global-level presentation of Proof jobs via Dashboard Historical Views
• Tier-3 site to DQ2 popularity: formats are agreed, data is being delivered, and the consumer on the DQ2 side is in the testing stage
• T3Mon software has been installed on pilot sites
• Distribution is available via our repository: https://svnweb.cern.ch/trac/t3mon/wiki/YumConfigure
• We welcome more sites to try it and to send their feedback to our support list: [email protected]


XRootD transfers monitoring

• Goal: present transfers between servers and sites in federation via one UI

• Messages from XRootD servers are collected by the T3Mon UDP collector and then sent to AMQ
• Data is stored in HBase
• Hadoop processing is used to prepare data summaries
• Web services for data export
• Dashboard transfer interface as the UI


Data flow for the XRootD federation monitoring


T3Mon UDP messages collector

• Can be installed anywhere; implemented as a Linux daemon
• Extracts transfer info from several messages and composes a file transfer message
• Sends the complete transfer message to ActiveMQ
• Message includes:
  – From: domain, host and IP address
  – To: domain, host and IP address
  – User
  – File, size
  – Bytes read/written
  – Time transfer started/finished
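
Composing one transfer message from several monitoring messages might look like the sketch below. It assumes the UDP packets have already been decoded into dicts with a hypothetical `type` of `open` or `close` and a shared `transfer_id`; the real XRootD monitoring stream format is not reproduced here.

```python
from typing import Dict, Optional


class TransferAssembler:
    """Pairs 'open' and 'close' events into one complete transfer message."""

    def __init__(self) -> None:
        self._pending: Dict[str, dict] = {}  # transfers seen open, not yet closed

    def feed(self, event: dict) -> Optional[dict]:
        """Consume one event; return a complete message on 'close', else None."""
        key = event["transfer_id"]
        if event["type"] == "open":
            self._pending[key] = {
                "src_host": event["src_host"],
                "dst_host": event["dst_host"],
                "user": event["user"],
                "file": event["file"],
                "size": event["size"],
                "start_time": event["time"],
            }
            return None
        if event["type"] == "close":
            opened = self._pending.pop(key, None)
            if opened is None:
                return None  # close without a matching open: nothing to emit
            return {
                **opened,
                "bytes_read": event["bytes_read"],
                "bytes_written": event["bytes_written"],
                "end_time": event["time"],
            }
        return None  # unknown event types are ignored
```

Only the message returned on `close` would be forwarded to ActiveMQ; incomplete transfers stay in memory until their closing event arrives.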


AMQ2Hadoop collector

• Can be installed anywhere; implemented as a Linux daemon
• Listens to an ActiveMQ queue
• Extracts messages
• Inserts them into the HBase raw table
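
The listen, extract, insert loop can be sketched with standard-library stand-ins: a `queue.Queue` plays the role of the ActiveMQ queue and a plain dict plays the role of the HBase raw table. The JSON message body and the row-key layout (`end_time|src_host|dst_host`) are assumptions for this example, not the collector's actual schema.

```python
import json
import queue


def make_row_key(message: dict) -> str:
    """Build a hypothetical row key: end time first, so rows sort by time."""
    return f"{message['end_time']:010d}|{message['src_host']}|{message['dst_host']}"


def drain_queue(source: "queue.Queue[str]", raw_table: dict) -> int:
    """Move every message currently queued into raw_table; return the count."""
    inserted = 0
    while True:
        try:
            body = source.get_nowait()
        except queue.Empty:
            return inserted
        try:
            message = json.loads(body)
            key = make_row_key(message)
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed or incomplete messages
        raw_table[key] = message
        inserted += 1
```

A daemon would call `drain_queue` in a loop (or block on the queue); zero-padding the timestamp keeps lexicographic key order equal to chronological order.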


Hadoop processing

• Reads the raw table
• Prepares a data summary: 10-minute statistics with the structure:
  – From
  – To
  – Sum of bytes read
  – Sum of bytes written
  – Number of files read
  – Number of files written
• Inserts the summary data into the summary table
• MapReduce: we use Java, and we are also working on enabling Pig routines
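
The 10-minute summary can be expressed as a map step and a reduce step. The sketch below is plain Python for readability, whereas the slide states the production jobs are written in Java. The record keys, epoch-second timestamps and the rule that a transfer counts as a "file read" when `bytes_read > 0` (and likewise for writes) are assumptions for this example.

```python
from collections import defaultdict

BUCKET_SECONDS = 600  # 10 minutes


def map_transfer(transfer: dict):
    """Map: emit ((bucket_start, src, dst), partial counters) for one transfer."""
    bucket = transfer["end_time"] - transfer["end_time"] % BUCKET_SECONDS
    key = (bucket, transfer["src"], transfer["dst"])
    value = (
        transfer["bytes_read"],
        transfer["bytes_written"],
        1 if transfer["bytes_read"] > 0 else 0,     # counts as a file read
        1 if transfer["bytes_written"] > 0 else 0,  # counts as a file written
    )
    return key, value


def reduce_values(values):
    """Reduce: column-wise sum of all partial counters for one key."""
    return tuple(sum(column) for column in zip(*values))


def summarize(transfers):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for transfer in transfers:
        key, value = map_transfer(transfer)
        grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}
```

Each result maps `(bucket_start, from, to)` to `(sum bytes read, sum bytes written, files read, files written)`, which matches the summary structure listed above.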


Storage2UI data export

• Web service
• Extracts data from the storage
• Feeds the Dashboard XBrowse UI


Status

• In prototype stage:
  – Hadoop processing is executed manually
  – Simulated data
• UI: http://xrdfedmon-dev.jinr.ru/ui/#date.from=201206210000&date.interval=0&date.to=201206220000&grouping.dst=(host)&grouping.src=(host)
• We are ready to start testing on a real federation
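
The UI address above carries its query in the URL fragment. As a small illustration, the sketch below decodes it with the standard library. Reading the date values as `YYYYMMDDHHMM` and treating the parenthesized grouping values as field names are assumptions inferred from the example URL, not documented behaviour of the UI.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

DATE_FORMAT = "%Y%m%d%H%M"  # assumed from values such as 201206210000


def parse_ui_fragment(url: str) -> dict:
    """Decode the '#key=value&...' fragment of a UI URL into typed fields."""
    fragment = urlsplit(url).fragment
    params = {key: values[0] for key, values in parse_qs(fragment).items()}
    return {
        "from": datetime.strptime(params["date.from"], DATE_FORMAT),
        "to": datetime.strptime(params["date.to"], DATE_FORMAT),
        "interval": int(params["date.interval"]),
        "group_src": params["grouping.src"].strip("()"),
        "group_dst": params["grouping.dst"].strip("()"),
    }


ui_query = parse_ui_fragment(
    "http://xrdfedmon-dev.jinr.ru/ui/#date.from=201206210000&date.interval=0"
    "&date.to=201206220000&grouping.dst=(host)&grouping.src=(host)"
)
```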


Thank you for your attention
