
Performance of the Relational Grid Monitoring Architecture (R-GMA) • CMS data challenges. The nature of the problem. What is GMA? And what is R-GMA? Performance test description. Performance test results. • Conclusions

Upload: elinor-stanley

Post on 16-Jan-2016


Page 1: Performance of the Relational Grid Monitoring Architecture (R-GMA) CMS data challenges. The nature of the problem. What is GMA ? And what is R-GMA ? Performance

Performance of the Relational Grid Monitoring Architecture (R-GMA)

• CMS data challenges. The nature of the problem.

• What is GMA?
• And what is R-GMA?
• Performance test description
• Performance test results
• Conclusions

Page 2

The Nature of the problem

• As part of its preparations for data taking, CMS is performing data challenges.

• A large number of simulated events is needed to optimise the detectors and prepare the software.
• Enormous processing requirements.

BUT each event is independent of all the others, so each event can be generated on a machine without any interaction with any other.

Page 3

The local solution

Work split between farms.

How should the book-keeping be handled?

A database that is updated automatically.

Implemented via a job wrapper, BOSS: output to <stdout> and <stderr> is intercepted and the information is recorded in a MySQL production database.

Event generation and job accounting are decoupled.
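The wrapper idea can be sketched as follows. This is a minimal illustration, not BOSS itself: it assumes a hypothetical `job_log` table, uses Python's built-in `sqlite3` in place of the MySQL production database, and simply stores every output line verbatim, whereas a real wrapper would more likely extract specific values (event counts, status) from the output.

```python
import sqlite3
import subprocess
import sys


def run_wrapped(job_id: str, cmd: list, db: sqlite3.Connection) -> int:
    """Run a job, intercept its stdout/stderr and record every line.

    Hypothetical schema: job_log(job_id, stream, line). sqlite3 stands in
    for the MySQL production database; BOSS itself is not used here.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS job_log (job_id TEXT, stream TEXT, line TEXT)"
    )
    # Run the job to completion, capturing both output streams as text.
    result = subprocess.run(cmd, capture_output=True, text=True)
    rows = [
        (job_id, stream, line)
        for stream, text in (("stdout", result.stdout), ("stderr", result.stderr))
        for line in text.splitlines()
    ]
    # Record the exit status as well, for job accounting.
    rows.append((job_id, "exit", str(result.returncode)))
    db.executemany("INSERT INTO job_log VALUES (?, ?, ?)", rows)
    db.commit()
    return result.returncode


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    job = [sys.executable, "-c",
           "import sys; print('event 1 generated'); print('warning', file=sys.stderr)"]
    run_wrapped("job-001", job, conn)
    for row in conn.execute("SELECT * FROM job_log"):
        print(row)
```

Because the job only writes to its standard streams, it needs no knowledge of the database; this is the decoupling of event generation from job accounting described above.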

Page 4

The local solution (schematic)

[Figure: a Database Machine and a Submission Machine (with UI) serving a farm of Worker Nodes (WN)]

Page 5

The grid solution (schematic)

[Figure: grid version of the schematic, showing the Database Machine, the Submission Machine and the UI]

Page 6

Grid Monitoring Architecture (GMA) of the GGF

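[Figure: GMA components. The Producer registers with the Registry (directory services); the Consumer asks the Registry to locate a producer and receives the producer's address; the Consumer then asks the Producer for data, and the data flows directly from Producer to Consumer.]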
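The register / locate / ask-for-data pattern above can be sketched in a few lines. This is an in-process toy under stated assumptions: the class and method names are hypothetical, and an object reference plays the role of the producer's network address. The point it illustrates is that the Registry is used only for lookup, while the data itself flows directly from Producer to Consumer.

```python
from dataclasses import dataclass, field


class Registry:
    """Directory service: maps a data type to the producers that publish it."""

    def __init__(self) -> None:
        self._producers: dict = {}

    def register_producer(self, data_type: str, producer: "Producer") -> None:
        self._producers.setdefault(data_type, []).append(producer)

    def locate_producers(self, data_type: str) -> list:
        # A real registry would return network addresses; here the
        # "address" is simply the object reference.
        return list(self._producers.get(data_type, []))


@dataclass
class Producer:
    name: str
    data: list = field(default_factory=list)

    def publish(self, item: str) -> None:
        self.data.append(item)

    def ask_for_data(self) -> list:
        return list(self.data)


class Consumer:
    def __init__(self, registry: Registry) -> None:
        self.registry = registry

    def collect(self, data_type: str) -> list:
        # 1. Locate producers via the registry; 2. ask each one for data directly.
        results = []
        for producer in self.registry.locate_producers(data_type):
            results.extend(producer.ask_for_data())
        return results


if __name__ == "__main__":
    registry = Registry()
    site_a, site_b = Producer("site-a"), Producer("site-b")
    registry.register_producer("cpu_load", site_a)
    registry.register_producer("cpu_load", site_b)
    site_a.publish("site-a load=0.5")
    site_b.publish("site-b load=1.2")
    print(Consumer(registry).collect("cpu_load"))
```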

Page 7

R-GMA (Relational GMA)

Developed for the European DataGrid (EDG).

Extends the GMA in three important ways:

1. Introduces a time stamp on the data.

2. Provides a relational implementation.

3. Hides the registry behind the API.

Can be used for both information and monitoring.

Each Virtual Organisation appears to have a single RDBMS.

Page 8

The syntax of R-GMA

The user interface to R-GMA is via SQL statements (not all SQL statements and structures are supported).

Information is advertised via CREATE TABLE.
Information is published via INSERT.
Information is read via SELECT … FROM table.

The first read request registers the consumer as interested in this data.

Relational queries are supported.

NOTE: SQL is only the interface; it should not be assumed that an actual database lies behind it.

Page 9

Fit between R-GMA and BOSS

R-GMA can be dropped into the framework with very little disruption:

1. Set-up calls for MySQL are replaced by those for R-GMA producers.

2. An archiver (joint consumer/producer) runs on a single machine; it collects the data from all the running jobs and writes it to a local database (and possibly republishes it).

The data can then be queried either by direct MySQL calls or via an R-GMA consumer (in effect, a distributed database has been created).
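The archiver's role can be sketched with a thread-safe queue standing in for the wide-area R-GMA connection. Everything here is an assumption for illustration: the `(job_id, payload)` message format, the `messages` table, `sqlite3` in place of the local MySQL database, and a plain list as a stand-in for republishing.

```python
import queue
import sqlite3
import threading


class Archiver:
    """Joint consumer/producer: drains messages from running jobs, stores
    them in a local database and (optionally) republishes them."""

    def __init__(self) -> None:
        self.inbox: queue.Queue = queue.Queue()
        # check_same_thread=False lets the main thread query after the worker finishes.
        self.db = sqlite3.connect(":memory:", check_same_thread=False)
        self.db.execute("CREATE TABLE messages (job_id TEXT, payload TEXT)")
        self.republished: list = []

    def run(self) -> None:
        while True:
            msg = self.inbox.get()
            if msg is None:  # sentinel: all jobs have finished
                break
            self.db.execute("INSERT INTO messages VALUES (?, ?)", msg)
            self.republished.append(msg)  # stand-in for republishing the row
        self.db.commit()


def simulate_job(job_id: str, n: int, inbox: queue.Queue) -> None:
    """A hypothetical running job that reports n status messages."""
    for i in range(n):
        inbox.put((job_id, f"event {i} done"))


if __name__ == "__main__":
    archiver = Archiver()
    worker = threading.Thread(target=archiver.run)
    worker.start()
    jobs = [threading.Thread(target=simulate_job, args=(f"job{j}", 5, archiver.inbox))
            for j in range(3)]
    for t in jobs:
        t.start()
    for t in jobs:
        t.join()
    archiver.inbox.put(None)
    worker.join()
    print(archiver.db.execute(
        "SELECT job_id, COUNT(*) FROM messages GROUP BY job_id").fetchall())
```

Only the archiver thread writes to the database, so the jobs never contend for it; this mirrors the single-machine archiver in the slide.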

Page 10

Fit between R-GMA and BOSS (i)

[Figure: BOSS and its database on a LAN connection; many R-GMA nodes connected to it over a WAN connection]

Page 11

R-GMA Measurements

• In principle, the GMA architecture provides a solution to the wide-area monitoring problem.

BUT does a specific implementation provide a practical solution?

Before entrusting CMS production to R-GMA, we must be confident that it will perform.

At what load will it fail, and why?

Page 12

Message time distribution from 44 jobs

Mean message length: 35 characters.

Page 13

Simulation of a CMS job

A multi-threaded job: each thread produces messages of length 35 characters with a suitable time distribution.

The distribution of thread starting times can be altered. One machine delivers the R-GMA load of a whole farm.

[Figure: the simulated job connected to an R-GMA servlet and an R-GMA consumer]
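A load generator of this kind can be sketched as below. The message text, the uniform distribution of start delays and the exponential gaps between messages are all assumptions (the slide only says "suitable distribution"), and appending to a list stands in for publishing through an R-GMA producer.

```python
import random
import threading
import time


def make_message(thread_id: int, seq: int, length: int = 35) -> str:
    """Build a message padded or truncated to a fixed length (35 chars on the slide)."""
    body = f"t{thread_id:03d} seq{seq:05d} status=running"
    return body.ljust(length, ".")[:length]


def worker(thread_id, n_messages, start_delay, mean_gap, sink, lock):
    time.sleep(start_delay)  # staggered start, drawn by the caller
    for seq in range(n_messages):
        msg = make_message(thread_id, seq)
        with lock:
            sink.append(msg)  # stand-in for publishing via an R-GMA producer
        # Exponentially distributed gaps are an assumption, not from the slide.
        time.sleep(random.expovariate(1.0 / mean_gap))


def simulate_farm(n_threads=10, n_messages=5, max_start=0.05, mean_gap=0.001, seed=1):
    """One process emulating the message load of a whole farm."""
    random.seed(seed)
    sink, lock = [], threading.Lock()
    # Start delays are uniform here; swap in another distribution to alter start times.
    threads = [
        threading.Thread(
            target=worker,
            args=(i, n_messages, random.uniform(0, max_start), mean_gap, sink, lock),
        )
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink


if __name__ == "__main__":
    messages = simulate_farm()
    print(len(messages), repr(messages[0]))
```

Raising `n_threads` or lowering `mean_gap` is how such a generator would be pushed past the expected load of a cluster, as on the next slide.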

Page 14

Simulation of the CMS Grid

One machine per grid cluster, each providing a load greater than that of the cluster it represents.

[Figure: several R-GMA servlets, one per simulated grid cluster, and a single R-GMA consumer]

Page 15

Current status

R-GMA can survive loads of around 20% of the current CMS requirements and does provide a grid method for monitoring. A factor-of-2 overload in the number of jobs causes problems after about five minutes of running.

We believe these instabilities are soluble.

When production starts in earnest we will compare reality with our model.

Page 16

GridICE Server Installation

Page 17

Brief Introduction

GridICE:
– is a distributed monitoring tool for grid systems

– integrates with local monitoring systems

– offers a web interface for publishing monitoring data at the Grid level

– is fully integrated in the LCG-2 middleware
• installation and configuration of the gridice-clients data collector at each site is carried out by the Yaim scripts.


Page 18

System Requirements

• The suggested operating system is Scientific Linux with a minimal installation.

• The GridICE server should be installed on a high-performance machine:
– PostgreSQL service: RAM-intensive
– Apache web server: RAM- and CPU-intensive


Page 19

Core Packages & Dependencies

The GridICE server software is composed of three core packages:
1. gridice-core (setup and maintenance scripts / discovery components)
2. gridice-www (web interface scripts and components)
3. gridice-plugins (monitoring scripts)

Plus several dependencies:
– Apache HTTP web server
– PostgreSQL database server
– Nagios monitoring tool
– ...


Page 20

The Four Main Phases of Monitoring

Generation, Distributing, Processing, Presenting

• Generation: sensors inquire entities and encode the measurements according to a schema.

• Distributing: transmission of the events from the source to any interested parties (data delivery model: push vs. pull; periodic vs. aperiodic).

• Processing: abstracting the received events in order to enable the consumer to draw conclusions about the operation of the monitored system, e.g., filtering according to some predefined criteria, or summarising a group of events.
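The four phases can be wired together in a toy pipeline. All function names, the `cpu_load` schema and the busy-host threshold are illustrative assumptions; in this sketch processing happens before distribution, but it could equally be done at the consumer.

```python
import statistics
from typing import Callable, Iterable


def generate(readings: Iterable) -> list:
    """Generation: encode raw (host, value) readings according to a toy schema."""
    return [{"host": host, "metric": "cpu_load", "value": v} for host, v in readings]


def process(events: list, threshold: float) -> dict:
    """Processing: filter by a predefined criterion and summarise the group."""
    busy = [e for e in events if e["value"] > threshold]
    return {
        "n_events": len(events),
        "n_busy": len(busy),
        "mean_load": statistics.mean(e["value"] for e in events),
    }


def distribute(summary: dict, subscribers: list) -> None:
    """Distributing: push the result to every interested party."""
    for deliver in subscribers:
        deliver(summary)


def present(summary: dict) -> str:
    """Presenting: render the summary for a human reader."""
    return (f"{summary['n_busy']}/{summary['n_events']} hosts busy, "
            f"mean load {summary['mean_load']:.2f}")


if __name__ == "__main__":
    received: list = []
    events = generate([("wn01", 0.4), ("wn02", 1.8), ("wn03", 2.1)])
    distribute(process(events, threshold=1.0), [received.append])
    print(present(received[0]))  # 2/3 hosts busy, mean load 1.43
```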

Page 21

The GridICE Approach


Page 22

Generating Events

• Generation of events:
– Sensors: typically Perl scripts or C programs.
– Schema:

• GLUE Schema v.1.1 + GridICE extension.
– System related (e.g., CPU load, CPU type, memory size).

– Grid service related (e.g., CE ID, queued jobs).

– Network related (e.g., Packet loss).

– Job usage (e.g., CPU Time, Wall Time).

– All sensors are executed in a periodic fashion.


Page 23

Distributing Events

• Distribution of events:
– Hierarchical model.
• Intra-site: by means of the local monitoring service; default choice: LEMON (http://www.cern.ch/lemon).
• Inter-site: by offering data through the Grid Information Service (GIS).
• Final consumer: depending on the client application.

– Mixed data delivery model.
• Intra-site: depends on the local monitoring service (push for LEMON).
• Inter-site: depends on the GIS (current choice: MDS 2.x, pull).
• Final consumer: pull (browser/application); push (publish/subscribe notification service coming in the next release).
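The difference between the push and pull delivery models can be seen in a toy event source that supports both. It is not tied to LEMON or MDS; the class name, the retained-history size and the `since` index are assumptions for illustration.

```python
from collections import deque
from typing import Callable


class EventSource:
    """Toy event source supporting both delivery models."""

    def __init__(self, history: int = 100) -> None:
        self._latest: deque = deque(maxlen=history)
        self._subscribers: list = []

    # Push model: consumers subscribe once and are notified of every new event.
    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def emit(self, event: str) -> None:
        self._latest.append(event)
        for notify in self._subscribers:
            notify(event)

    # Pull model: consumers poll whenever they like and get the retained events.
    # `since` is an index into the retained history (oldest entries may be dropped).
    def poll(self, since: int = 0) -> list:
        return list(self._latest)[since:]


if __name__ == "__main__":
    source = EventSource()
    pushed: list = []
    source.subscribe(pushed.append)      # push: delivered as events happen
    source.emit("load=0.5")
    source.emit("load=0.7")
    print(pushed, source.poll(since=1))  # pull: fetched on request
```

With push, the source decides when data moves; with pull, the consumer does, at the cost of polling latency.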


Page 24

Presenting Events

• Data stored in an RDBMS are used to build aggregated statistics.

• Data retrieved from the RDBMS are encoded in XML files.

• XSL transformations to XHTML publish the aggregated data in a Web context.
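The first two steps (aggregate in the RDBMS, encode as XML) can be sketched with the standard library. The `ce_status` table, its columns and the XML element names are hypothetical, and `sqlite3` stands in for PostgreSQL; the XSL-to-XHTML step is omitted because it needs an XSLT processor outside the standard library.

```python
import sqlite3
import xml.etree.ElementTree as ET


def site_summary_xml(db: sqlite3.Connection) -> str:
    """Aggregate per-site statistics in SQL, then encode them as XML."""
    root = ET.Element("grid")
    rows = db.execute(
        "SELECT site, COUNT(*), SUM(running_jobs) FROM ce_status "
        "GROUP BY site ORDER BY site"
    )
    for site, n_ce, running in rows:
        # One <site> element per aggregated row; values become attributes.
        ET.SubElement(root, "site", name=site,
                      computing_elements=str(n_ce), running_jobs=str(running))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ce_status (site TEXT, ce TEXT, running_jobs INTEGER)")
    db.executemany(
        "INSERT INTO ce_status VALUES (?, ?, ?)",
        [("site-a", "ce1", 10), ("site-a", "ce2", 5), ("site-b", "ce1", 7)],
    )
    print(site_summary_xml(db))
```

Doing the aggregation in SQL keeps the XML small: the presentation layer only ever sees per-site totals, not every raw measurement.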


Page 25

Monitoring a Grid

Page 26

Challenges for Data Collection

• The distribution of monitoring data is subject to significant requirements (e.g., scalability, heterogeneity, security, system health).

• None of the existing tools satisfies all of these requirements.

• Grid data collection should be customized according to the needs of the selected Grid users.


Page 27

Challenges for Data Presentation

• Different Grid users are interested in different subsets of Grid data and different aggregation levels.

• Usability principles should be taken into account to help users find relevant Grid monitoring information.

• A concise aggregation of the data is crucial to permit drill-down navigation of the Grid data (from the general to the detailed).



Page 33

How is Ganglia different from Nagios?

• Ganglia is architecturally designed to perform efficiently in very large monitoring environments: each Ganglia gmond performs its service checks locally, reporting in at a regular interval to the gmetad. Nagios performs its service checks by polling each device across a network connection and waiting for a response (known as "active checks"), which can be more resource and bandwidth intensive.

• Nagios uses the results of its active checks to determine state by comparing the metrics it polls to thresholds. These state changes can in turn be used to generate notifications and customizable corrective actions. Ganglia, by contrast, has no built-in thresholds, and so does not generate events or notifications.

• The general rule of thumb has been: if you need to monitor a limited number of aspects of a large number of identical devices, use Ganglia; if you want to monitor lots of aspects of a smaller number of different devices, use Nagios. But those distinctions are blurring as Ganglia supports more and more devices, and as Nagios' scalability improves.

04/21/23 T.R.LEKHAA/AP/IT/SNSCE 33

Page 34

How is Ganglia different from Nagios?

• The problem with Ganglia and all the other external web pages we have been looking at is that you have to look at them!

• If all is well with your system, you don’t want to have to look.

• This is where Nagios comes in. It can be set up to alert you when something goes wrong or a value passes a threshold.
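Threshold-based alerting in the Nagios style can be sketched as a small check that maps a metric to a state. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the threshold logic is deliberately simplified, since real plugins accept a richer range syntax, and the function name and default thresholds are assumptions.

```python
import os
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_threshold(value, warn, crit, label="load"):
    """Map a metric to a Nagios-style state by comparing it with thresholds."""
    if value is None:
        state = UNKNOWN
    elif value >= crit:
        state = CRITICAL
    elif value >= warn:
        state = WARNING
    else:
        state = OK
    shown = "n/a" if value is None else f"{value:.2f}"
    return state, f"{label.upper()} {NAMES[state]} - {label}={shown}"


if __name__ == "__main__":
    # os.getloadavg() is Unix-only; the thresholds are illustrative.
    state, message = check_threshold(os.getloadavg()[0], warn=2.0, crit=4.0)
    print(message)
    sys.exit(state)  # Nagios reads the state from the exit code
```

Unlike a Ganglia graph, this produces a state change that a scheduler can turn into a notification, which is exactly the "you don't have to look" property described above.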


Page 35

Monitoring: What?

Metric: relevance
• Packet loss: lost packets are re-transmitted (re-Tx), causing traffic jams.
• Connectivity: can you connect?
• RTT: TCP is a send-acknowledge protocol, so a delayed acknowledgement means delayed traffic.
• TCP/UDP throughput: network view? application view?
• Jitter: variation in delay; relevant to UDP & multicast (UDP) only.

Page 36

Monitoring: How(1)?

[Figure: monitoring architecture. The tools PingER, IperfER, UDPmon, MiperfER and bbcp/ftp are installed on a dedicated, similar monitor node at each centre and run as a MESH (30 mins); results feed a Grid middleware publication service and www visualisation.]

Page 37

Monitoring: How(2)?

Metric: tool (origin)
• RTT, packet loss, connectivity: PingER, based on ping (SLAC et al)
• TCP throughput: IperfER, based on NCSA's iperf (SLAC and UCL)
• TCP throughput: bbcp (SLAC)
• TCP throughput: bbftp (tool: IN2P3; monitoring: SLAC)
• UDP throughput: UDPmon ([email protected], EDG)
• Multicast throughput, packet loss, jitter: MiperfER, based on IperfER (Manchester Computing)

Page 38

Network Weather Service

Page 39

Introduction

• “NWS provides accurate forecasts of dynamically changing performance characteristics from a distributed set of metacomputing resources”

• What will be the future load (not current load) when a program is executed?

• NWS produces short-term performance forecasts based on historical performance measurements.

• The forecasts can be used by dynamic scheduling agents

Page 40

Introduction

• Resource allocation and scheduling decisions must be based on predictions of resource performance during the relevant timeframe.

• NWS takes periodic performance measurements and, using numerical models, forecasts resource performance.

Page 41

NWS Goals

• Components:
– Persistent state
– Name server
– Sensors
• Passive (CPU availability)
• Active (network measurements)

– Forecaster

Page 42

Architecture

Page 43

Architecture

Page 44

Performance measurements

• Using sensors
• CPU sensors
– Measure CPU availability
– Use:
• uptime
• vmstat
• active probes

• Network sensors
– Measure latency and bandwidth

• Each host maintains:
– Current data
– One-step-ahead predictions
– A time series of data

Page 45

Network Measurements

Page 46

Issues with Network Sensors

• Appropriate transfer size for measuring throughput

• Collision of network probes

• Solutions:
– Tokens and hierarchical trees with cliques

Page 47

Available CPU measurement

Page 48

Available CPU measurement

• The formulae shown do not take job priorities into account.

• Hence an active probe is run periodically to adjust the estimates.
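The formulae on these slides did not survive extraction. One simple load-average estimate, assumed here rather than taken from the slide, treats the one-minute load average L reported by uptime as the number of competing runnable processes, so a new process would get roughly 1/(L + 1) of the CPU. The bias-correction step is likewise a hypothetical illustration of "adjusting the estimates" with an active probe, not the published NWS procedure.

```python
import os


def availability_from_load(load_avg: float) -> float:
    """Fraction of a CPU a new process might expect, assuming it shares the
    processor equally with load_avg runnable processes (ignores priorities)."""
    return 1.0 / (load_avg + 1.0)


def corrected(estimate: float, probe_measured: float, probe_predicted: float) -> float:
    """Hypothetical correction: shift the passive estimate by the bias an
    active probe revealed, clamped to the valid range [0, 1].

    probe_measured: CPU fraction a short test process actually obtained.
    probe_predicted: what the passive formula predicted at the same moment.
    """
    bias = probe_measured - probe_predicted
    return min(1.0, max(0.0, estimate + bias))


if __name__ == "__main__":
    one_minute_load = os.getloadavg()[0]  # Unix only
    print(f"estimated available CPU: {availability_from_load(one_minute_load):.2f}")
```

The probe costs CPU time, which is why it would be run only periodically while the cheap passive estimate is used in between.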

Page 49

Predictions

• To generate a forecast, the forecaster requests data from the persistent state.

• When a forecast is requested, the forecaster makes predictions for the existing measurements using different forecast models.

• The forecast model is chosen dynamically, based on the lowest Mean Absolute Error, Mean Square Prediction Error or Mean Percentage Prediction Error.

• Forecasts are requested by:
– InitForecaster()
– RequestForecasts()

• Forecasting methods:
– Mean-based
– Median-based
– Autoregressive
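The dynamic choice of model can be sketched as follows: every model predicts each new measurement one step ahead, its absolute error is accumulated, and the forecast comes from the model whose Mean Absolute Error is currently lowest. The three predictors are simple representatives of the mean- and median-based families (the window of 5 is an assumption), and `Forecaster` is a hypothetical class, not the NWS `InitForecaster()`/`RequestForecasts()` interface.

```python
import statistics
from typing import Callable, Sequence

Predictor = Callable[[Sequence[float]], float]


def last_value(h: Sequence[float]) -> float:
    return h[-1]


def running_mean(h: Sequence[float]) -> float:
    return statistics.fmean(h)


def sliding_median(h: Sequence[float], window: int = 5) -> float:
    return statistics.median(h[-window:])


class Forecaster:
    """Scores every model on each new measurement and forecasts with the one
    whose cumulative Mean Absolute Error is currently lowest."""

    def __init__(self, models: dict) -> None:
        self.models = models
        self.history: list = []
        self.abs_err = {name: 0.0 for name in models}
        self.n_scored = 0

    def update(self, measurement: float) -> None:
        # Score each model's one-step-ahead prediction before storing the value.
        if self.history:
            for name, model in self.models.items():
                self.abs_err[name] += abs(model(self.history) - measurement)
            self.n_scored += 1
        self.history.append(measurement)

    def mae(self, name: str) -> float:
        return self.abs_err[name] / self.n_scored if self.n_scored else float("inf")

    def forecast(self) -> tuple:
        # Requires at least one measurement in the history.
        best = min(self.models, key=self.mae)
        return best, self.models[best](self.history)


if __name__ == "__main__":
    f = Forecaster({"last": last_value, "mean": running_mean, "median": sliding_median})
    for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
        f.update(x)
    print(f.forecast())
```

On a steadily rising series the last-value model wins; on a noisy but stationary one, the mean or median would tend to win instead, which is why the choice is made dynamically.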

Page 50

Forecasting Methods

Notations:

Prediction Accuracy:

Mean Absolute Error (MAE) is the average of the above.

Prediction Method:
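The notation on this slide did not survive extraction. Assuming m_t denotes the measurement at time t and p_t the prediction made for it one step earlier, the per-step error, the Mean Absolute Error described above and the Mean Square Prediction Error named on the previous slide would read:

```latex
e_t = \lvert m_t - p_t \rvert, \qquad
\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} e_t, \qquad
\mathrm{MSPE} = \frac{1}{n} \sum_{t=1}^{n} \left( m_t - p_t \right)^2
```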

Page 51

Forecasting Methods – Mean-based

1.

2.

3.

Page 52

Forecasting Methods – Mean-based

4.

5.

Page 53

Forecasting Methods – Median-based

1.

2.

3.

Page 54

Autoregression

1.

The coefficients a_i are found such that they minimise the overall error.

r_{i,j} is the autocorrelation function for the series of N measurements.
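One standard way to choose coefficients a_i that minimise the squared one-step error is to solve the Yule-Walker equations built from the autocorrelation of the series; the Levinson-Durbin recursion does this efficiently. The sketch below assumes that approach and an order chosen by the caller; the slide does not say exactly how NWS fits its autoregressive model.

```python
from typing import Sequence


def autocovariance(x: Sequence[float], max_lag: int) -> list:
    """r[k] = (1/N) * sum_t (x_t - mean)(x_{t+k} - mean), for k = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> list:
    """Solve the Yule-Walker equations for AR coefficients a_1..a_order."""
    a: list = []
    err = r[0]  # prediction error variance of the order-0 model
    for k in range(1, order + 1):
        acc = r[k] - sum(a[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / err  # reflection coefficient
        a = [a[j] - kappa * a[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return a


def ar_forecast(x: Sequence[float], order: int) -> float:
    """One-step-ahead forecast: mean + sum_k a_k * (x_{N-k} - mean)."""
    mean = sum(x) / len(x)
    r = autocovariance(x, order)
    if r[0] == 0.0:  # constant series: nothing to model
        return mean
    a = levinson_durbin(r, order)
    return mean + sum(a[k] * (x[-1 - k] - mean) for k in range(order))


if __name__ == "__main__":
    series = [1.0, -1.0] * 5  # strongly anti-correlated toy series ending in -1
    print(ar_forecast(series, 1))
```

For the alternating toy series, the fitted lag-1 coefficient is negative, so the forecast after a -1 is positive, as expected.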

Page 55

Forecasting Methodology

Page 56

Forecast Results

Page 57

Forecasting Complexity vs Accuracy

• Semi Non-parametric Time Series Analysis (SNP): an accurate but complicated model

• Model fit using iterative search

• Calculation of the conditional expected value using the conditional probability density

Page 58

Sensor Control

• Each sensor connects to every other sensor to perform measurements: O(N²) in total.

• To reduce this complexity, sensors are organized in a hierarchy of groups called cliques.

• To avoid collisions, tokens are used

• Adaptive control using adaptive token timeouts

• Adaptive time-out discovery and distributed leader election protocol
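The token and clique ideas can be sketched in a single process. This is a simplified model under stated assumptions: the token is represented by visiting clique members in a fixed order, clique membership and the choice of one representative per clique are supplied by hand, and the adaptive time-outs and leader election on the slide are not modelled. All names are hypothetical.

```python
def full_mesh_probes(hosts):
    """Every sensor probes every other sensor: N*(N-1) directed measurements."""
    return [(src, dst) for src in hosts for dst in hosts if src != dst]


def token_schedule(clique):
    """Pass a single token round the clique; only the holder probes.

    Because exactly one member holds the token at a time, at most one
    probe is in flight inside the clique, so probes cannot collide.
    """
    schedule = []
    for holder in clique:  # the token visits each member in turn
        for peer in clique:
            if peer != holder:
                schedule.append((holder, peer))
    return schedule


def hierarchical_probes(cliques, representatives):
    """Probes inside each clique, plus a top-level clique of representatives."""
    probes = []
    for clique in cliques:
        probes.extend(token_schedule(clique))
    probes.extend(token_schedule(representatives))
    return probes


if __name__ == "__main__":
    hosts = [f"h{i}" for i in range(12)]
    cliques = [hosts[0:4], hosts[4:8], hosts[8:12]]
    representatives = [c[0] for c in cliques]
    print(len(full_mesh_probes(hosts)))                        # 132
    print(len(hierarchical_probes(cliques, representatives)))  # 42
```

The saving comes at a price: hosts in different cliques are only measured indirectly, through their representatives.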