xrootd monitoring report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. number...

37
XRootD Monitoring Report A.Beche D.Giordano

Upload: others

Post on 29-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

XRootD Monitoring Report A.Beche

D.Giordano

Page 2: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Outlines Talk 1: XRootD Monitoring Dashboard Context Dataflow and deployment model Database: storage & aggregation User interface & use cases Open issues & future work Summary

Talk 2: Beyond XRootD monitoring HTTP/WebDAV integration Integration in the WLCG Transfers Dashboard

10 – April - 14 A.Beche – Federated Workshop 2

Page 3: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

05

1015202530354045

# si

tes

Number of sites reporting

XRootD federation monitoring

Activity started during summer 2012 4 sites for FAX, 11 for AAA

Monitoring data increased accordingly

July 2012 March 2014

AAA 606k 43M

FAX 15k 22M

10 – April - 14 A.Beche – Federated Workshop 3

Page 4: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Why monitoring ?

Understand data flows to estimate data traffic Provide information for efficient operations

Identify access patterns and propose data

placement strategies

10 – April - 14 A.Beche – Federated Workshop 4

Page 5: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Raw

Stats

10 m

inut

es

XRootD monitoring dataflow

Federation GLED

Collector Consumer

WEB API

Dashboard UI

External applications

real time

UDP

stomp

stomp

asynchronous

ActiveMQ

10 – April - 14 A.Beche – Federated Workshop 5

Page 6: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

GLED Deployment model

050

100150200

Hz

EOS monitoring data rate

05

101520

Hz

Federation monitoring data rate

AMQ @ CERN Shared cluster

5 nodes AAA

UCSD (16Hz)

EOS CERN (10Hz)

EOS CERN (150Hz)

FAX US SLAC (9Hz)

FAX EU CERN (1 site)

10 – April - 14 A.Beche – Federated Workshop 6

Page 7: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Consolidated dataflow

Two usage of these raw data: Dashboard monitoring XRootD popularity

Now share the same database: Storage optimization Consistency guaranteed

10 – April - 14 A.Beche – Federated Workshop 7

Page 8: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

AAA ~300 GB

~1B records

Database

FAX ~600 GB

~2B records

Daily insert 2 GB / 6M rows

Storage Raw, statistics, metadata Tables daily partitioned, no global indexes

0100200300400500600700

GB

Database usage growth*

* Indexes excluded

10 – April - 14 A.Beche – Federated Workshop 8

Page 9: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Database Raw data aggregation: Done using PL/SQL procedures Events are unordered Stateless: Full re-computation of touched bins

each time Compute stats from raw data in 10 min bins Aggregate 10 min stats in daily bins

10 – April - 14 A.Beche – Federated Workshop 9

Page 10: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Aggregation methods

2pm 3pm 4pm 5pm 6pm 7pm Tran

sfer

s

Easy method

Transfers 1 0 0 2 1

Bytes

10 0 0 15 20

10 – April - 14 A.Beche – Federated Workshop 10

Page 11: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Aggregation methods

2pm 3pm 4pm 5pm 6pm 7pm

Transfers 1 0 0 2 1

Bytes

10 0 0 15 20

Transfers 1 (1) 1 (0) 2 (0) 3 (2) 1 (1)

Bytes 8 1 14 (9+6) 15 (1+9+5) 5

Easy method

Tran

sfer

s

Adopted method

10 – April - 14 A.Beche – Federated Workshop 11

Page 12: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Visualization Interface

10 – April - 14 A.Beche – Federated Workshop 12

Page 13: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Pre-defined set of views

10 – April - 14 A.Beche – Federated Workshop 13

Page 14: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Use case example Understand site access patterns

1. Which sites are reading from FNAL

2. Zoom to a specific site to understand which users are reading

3. Understand which files are read by a user

1

2 3

2

10 – April - 14 A.Beche – Federated Workshop 14

Page 15: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Data popularity XRootD monitoring provides information

about file access patterns: Including non official collections (ie: user files) Contribute to simplify and make more efficient the

usage of disk resources

Popularity data analytics built on this information: Adopted already for CMS-EOS will be extended to full AAA

10 – April - 14 A.Beche – Federated Workshop 15

Page 16: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Archive recommendation for CMS-EOS Help to manage the disk space of EOS including user space

No central bookkeeping system

Unused files: created > 4 months ago, no access in the last 3 months: ~500 TB of space occupied and not used <=> 30% of total for these areas

10 – April - 14 A.Beche – Federated Workshop 16

% TB

Page 17: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Open issues Missing servers: Dcache sites

Server should provide their site name. CMS: only 5 sites:

anon, BUDAPEST, Hephy-Vienna, T2_US_USCD, UKI-LT2-Brunel Not coherent convention naming

ATLAS: GLED RPM to be deployed

GLED Collector improvements: Reliability of the service:

Recover time, can be long due to time difference GLED should be operated as a production service

Scalability: to be fixed with automatic reconnection soon

10 – April - 14 A.Beche – Federated Workshop 17

Page 18: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Future work Strong requirement from ATLAS to

understand efficiency: Need the concept of error / failure How XRootD server could be instrumented to report it?

European GLED collector is up and running: Only 1 pilot site is reporting to it (CNAF) Should we keep it?

Data mining activity (not started yet): Almost 2 years of raw data (1TB)

10 – April - 14 A.Beche – Federated Workshop 18

Page 19: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Data Mining Extract further knowledge from the data… Detect inefficiencies Propose deletion strategies Define data placement

… by Understand access patterns and data usage Correlate data traffic and data access performance

Possibility to automate some operations

10 – April - 14 A.Beche – Federated Workshop 19

Page 20: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Application usage

20

10

30

15

FAX AAA

10 – April - 14 A.Beche – Federated Workshop 20

Page 21: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Summary Monitoring federations is a challenge

High rate of traffic & information Challenge met by data aggregation, scalable technologies

Dashboard is not actively used

Less than 10 daily users (FAX), less than 15 (AAA) Is there any missing functionalities?

Improvement work is ongoing

New requests are coming

XRootD monitoring is a one piece of the entire Data transfers puzzle See next talk

10 – April - 14 A.Beche – Federated Workshop 21

Page 22: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Beyond XRootD monitoring A.Beche

D.Giordano

Page 23: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Outlines Talk 1: XRootD Monitoring Dashboard Context Dataflow and deployment model Database: storage & aggregation User interface & use cases Open issues & future work Summary

Talk 2: Beyond XRootD monitoring HTTP/WebDAV integration Integration in the WLCG Transfers Dashboard

10 – April - 14 A.Beche – Federated Workshop 23

Page 24: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

HTTP Federation is coming

HTTP protocol will be used in the future XRootD servers can be accessed See Fabrizio’s presentation on xrdhttp

Two kind of accesses: Pure HTTP access (through Apache) HTTP gate to XRootD server

Can’t be monitor in the same way

10 – April - 14 A.Beche – Federated Workshop 24

Page 25: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Monitoring XRootD access protocol

XRootD 4 will now reports the user protocol: All the monitoring chain needs to be updated Dashboard DB and UI are fully ready

HTTP

XRootD

10 – April - 14 A.Beche – Federated Workshop 25

Page 26: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Site

GLED collector

ActiveMQ

JOB

XRootD Federation

XRoo

tD

Site

SE

HTTP/WebDAV federation monitoring

10 – April - 14 A.Beche – Federated Workshop 26

Page 27: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Site

GLED collector

ActiveMQ

JOB

XRootD Federation

XRoo

tD

Site

SE

HTTP Federation

Site

HTTP/WebDAV federation monitoring

10 – April - 14 A.Beche – Federated Workshop 27

Page 28: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Site

28

GLED collector

ActiveMQ

JOB

JOB

XRootD Federation HTTP Federation

XRoo

tD Xrd

HTTP

Site Site

SE

29 November 2013 Alexandre Beche - ITTF

HTTP/WebDAV federation monitoring

Page 29: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Site

GLED collector

ActiveMQ

JOB

JOB

JOB

XRootD Federation HTTP Federation

XRoo

tD Xrd

HTTP

Apache

Site Site

SE

HTTP/WebDAV federation monitoring

10 – April - 14 A.Beche – Federated Workshop 29

Page 30: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Site

GLED collector

ActiveMQ

JOB

JOB

JOB

XRootD Federation HTTP Federation

XRoo

tD Xrd

HTTP

Apache

Site Site

SE

?

HTTP/WebDAV federation monitoring

10 – April - 14 A.Beche – Federated Workshop 30

Page 31: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

How to compare data from different applications?

10 – April - 14 A.Beche – Federated Workshop 31

Page 32: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

data transfers & accesses monitoring tools

WEB API / UI

WEB API/UI

WEB API/UI

WLCG FAX AAA

FAX

EO

S

AAA

EO

S

FTS

10 – April - 14 A.Beche – Federated Workshop 32

Page 33: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

WLCG Transfers Dashboard federated approach

WEB API / UI

WEB API/UI

WEB API/UI

FTS FAX AAA FA

X

EO

S

AAA

EO

S

FTS

WLCG Transfers Dashboard API / UI

10 – April - 14 A.Beche – Federated Workshop 33

Page 34: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Some plots

10 – April - 14 A.Beche – Federated Workshop 34

FTS

XRootD

ALTAS

CMS

LHCb

ALICE

Page 35: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Summary

Lots of effort has been put in XRootD monitoring workflow and dashboard in the last 2 years Reliable system achieved Lots of use cases covered

HTTP Monitoring already started Will require a lot of effort to reach XRootD monitoring level

New WLCG Transfers Dashboard architecture Highly extensible system Cross-VO or cross-technology analysis

10 – April - 14 A.Beche – Federated Workshop 35

Page 36: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Credits Andreeva Julia Cons Lionel Giordano Domenico Saiz Pablo Tadel Matevz Tuckett David Vukotic Ilija The AAA and FAX deployment team ….

10 – April - 14 A.Beche – Federated Workshop 36

Page 37: XRootD Monitoring Report - indico.fnal.gov · 0. 5. 10. 15. 20. 25. 30. 35. 40. 45 # sites. Number of sites reporting . XRootD federation monitoring Activity started during summer

Useful links AAA Dashboard

http://dashb-cms-xrootd-transfers.cern.ch FAX Dashboard:

http://dashb-atlas-xrootd-transfers.cern.ch CHEP materials

https://indico.cern.ch/abstractDisplay.py?abstractId=101&confId=214784 https://indico.cern.ch/getFile.py/access?contribId=94&sessionId=6&resId=0&materialId=slide

s&confId=214784 https://indico.cern.ch/getFile.py/access?contribId=265&sessionId=6&resId=1&materialId=slid

es&confId=214784

Xbrowse framework: https://twiki.cern.ch/twiki/bin/view/ArdaGrid/XbrowseFramework

Thanks for your attention

10 – April - 14 A.Beche – Federated Workshop 37