An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems

Iosif Legrand
California Institute of Technology

June 2007

The MonALISA Framework

MonALISA is a dynamic, distributed service system capable of collecting any type of information from different systems, analyzing it in near real time, and providing support for automated control decisions and global optimization of workflows in complex grid systems. The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing agent-based subsystems which are registered as dynamic services and are able to collaborate and cooperate in performing a wide range of monitoring tasks. These agents can analyze and process the information in a distributed way and provide optimization decisions in large-scale distributed applications. An agent-based architecture provides the ability to invest the system with increasing degrees of intelligence, to reduce complexity, and to make global systems manageable in real time. For an effective use of distributed resources, these services provide adaptability and self-organization.

The MonALISA Architecture

Regional or Global High Level Services, Repositories & Clients

• Secure and reliable communication
• Dynamic load balancing
• Scalability & replication
• AAA for clients

Distributed Dynamic Registration and Discovery, based on a lease mechanism and remote events

Distributed system for gathering and analyzing information based on mobile agents: customized aggregation, triggers, actions

[Diagram labels: Network of JINI-Lookup Services (Secure & Public); MonALISA services; Proxies; HL services; Agents]

Fully Distributed System with no Single Point of Failure

MonALISA Service & Data Handling

[Diagram: MonALISA service internals. Labels: Monitoring Modules (collect any type of information; dynamic loading; push and pull); Agents; Filters / Triggers; Data Cache Service & DB; Data Store; Postgres; Registration and Discovery with the Lookup Services; Configuration Control (SSL); Predicates & Agents; Data (via ML Proxy); Applications, Clients or Higher Level Services; WS Clients and Services; Web Service (WSDL, SOAP)]

Monitoring Grid Sites, Running Jobs, Network Traffic, and Connectivity

[Figure: client screenshots with panels TOPOLOGY, JOBS, ACCOUNTING, and Running Jobs]


ApMon – Application Monitoring

[Plot: MonALISA CPU usage (%), axis 0 to 70, versus messages per second, axis 0 to 6000]

[Diagram: applications embed ApMon, which sends monitoring data over UDP/XDR to MonALISA services. Labels: Time; IP; procID. Application monitoring examples: Mbps_out: 0.52, Status: reading, MB_inout: 562.4. System monitoring examples: load1: 0.24, processes: 97, pages_in: 83. ApMonConfig: MonALISA hosts, parameter1: value, parameter2: value, ... Config Servlet with dynamic reloading.]

ApMon configuration generated automatically by a servlet / CGI script

No lost packets

Lightweight library of APIs (C, C++, Java, Perl, Python) that can be used to send any information to MonALISA services

• High communication performance
• Flexible
• Complete system monitoring
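The slide states only that ApMon sends monitoring data as UDP/XDR datagrams. The Python sketch below illustrates that general idea using the standard XDR encodings for strings and doubles. The message layout (cluster name, node name, parameter count, then name/value pairs) and all function names are assumptions made for illustration; this is not the actual ApMon wire format or API.

```python
import socket
import struct


def xdr_string(text: str) -> bytes:
    """Encode a string the XDR way: 4-byte big-endian length, bytes, zero padding to a multiple of 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_double(value: float) -> bytes:
    """Encode a float as an XDR double (8 bytes, big-endian IEEE 754)."""
    return struct.pack(">d", value)


def encode_sample(cluster: str, node: str, params: dict) -> bytes:
    """Build one datagram payload. The field order here is a hypothetical layout."""
    payload = xdr_string(cluster) + xdr_string(node) + struct.pack(">I", len(params))
    for name, value in params.items():
        payload += xdr_string(name) + xdr_double(float(value))
    return payload


def send_sample(payload: bytes, host: str, port: int) -> None:
    """Send the payload as a single UDP datagram (fire and forget, no acknowledgement)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A caller might build a payload with `encode_sample("MyCluster", "node01", {"load1": 0.24, "processes": 97, "pages_in": 83})` and pass it to `send_sample` together with the host and port of a listening service; the cluster and node names here are placeholders.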

Monitoring the Execution of Jobs and the Time Evolution

[Figure: lifelines for jobs and split jobs. Labels: Submit a Job; DAG; Job; Job1, Job2, Job3; Job31, Job32]

End User / Client Agent
LISA – Localhost Information Service Agent

• Authorization
• Service discovery
• Local detection of the hardware and software configuration
• Complete end-system monitoring: per-process load, I/O and network throughput, end-to-end performance measurements
• Will act as an active listener for all events related to the requests generated by its local applications
• Can execute agents to re-configure the system and roll back


LISA – Network Monitoring

Network monitoring module: monitors the network performance of the local workstation (network interfaces, traffic patterns, parameters of the running TCP/IP stack) and can also configure TCP/IP parameters to optimize network performance. It can use IPERF, Web100 and other network monitoring tools.

LISA – Provides an Efficient Integration for Distributed Systems and Applications

[Diagram: LISA performs discovery against the Lookup Services, where the MonALISA and application services are registered, and selects the best service.]

• Uses external services to identify the real IP of the end system, its network ID and AS
• Discovers MonALISA services and can select, based on service attributes, different applications and their parameters (location, AS, functionality, load, …)
• Based on information such as AS number or location, determines a list of the best possible services
• Registers as a listener for other service attributes (e.g. number of connected clients)
• Continuously monitors the network connection with several selected services and provides the best one to be used from the client's perspective
• Measures network quality, detects faults and informs upper layer services so they can take appropriate decisions
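The two-stage selection described above (shortlist by static attributes, then choose by measured network quality) can be sketched in Python as follows. The `Service` fields, the ranking rule, and the use of round-trip time as the quality metric are assumptions for illustration, not the LISA implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Service:
    name: str
    as_number: int   # autonomous system the service sits in
    clients: int     # number of currently connected clients


def shortlist(services: List[Service], client_as: int, size: int = 3) -> List[Service]:
    """Stage 1: prefer services in the client's AS, then the least loaded ones."""
    ranked = sorted(services, key=lambda s: (s.as_number != client_as, s.clients))
    return ranked[:size]


def best_service(candidates: List[Service], rtt_ms: Dict[str, float]) -> Service:
    """Stage 2: among the shortlisted services, pick the lowest measured RTT."""
    return min(candidates, key=lambda s: rtt_ms[s.name])


services = [
    Service("A", as_number=1, clients=10),
    Service("B", as_number=1, clients=2),
    Service("C", as_number=2, clients=0),
    Service("D", as_number=2, clients=5),
]
candidates = shortlist(services, client_as=1)
# Hypothetical measurements; in practice these would be refreshed continuously.
chosen = best_service(candidates, {"A": 12.0, "B": 30.0, "C": 8.0})
print([s.name for s in candidates], chosen.name)
```

Re-running `best_service` whenever new measurements arrive gives the continuous re-selection behaviour the slide describes.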

Monitoring Internet2 Backbone Network

Test for a Land Speed Record: ~ 7 Gb/s in a single TCP stream from Geneva to Caltech

Monitoring USLHCnet

• Operations & management assisted by agent-based software
• Used on the new CIENA equipment used for network management


USLHCNet Integrated Traffic


The UltraLight Network

BNL ESnet IN /OUT


Monitoring The GLORIAD Ring

Monitoring Network Topology: Latency, Routers

[Figure: topology views with panels NETWORKS, AS, and ROUTERS]

Available Bandwidth Measurements
Embedded Pathload module.

Monitoring and Controlling Optical Switches

Port power monitoring

Controlling


Monitoring Optical Switches

ALICE: Job status & traffic, real-time map


Local and Global Decision Framework

Based on monitoring information, actions can be taken in:
• ML Service
• ML Global Services / Repository

Actions can be triggered by:
• Values above/below given thresholds
• Absence/presence of values
• Correlation between multiple values

Decision types:
• Alerts
  o e-mail
  o Instant messaging
  o RSS feeds
• External commands
• Event logging
• Global Optimization Services

[Diagram: ML Services take local decisions (actions based on local information); Global ML Services take global decisions (actions based on global information). Monitored inputs: traffic, connectivity, jobs, hosts, apps. Sensor inputs: temperature, humidity, A/C power, …]

Alerts and Actions

MySQL daemon is automatically restarted when it runs out of memory
Trigger: threshold on VSZ memory usage

ALICE production jobs queue is automatically kept full by automatic resubmission
Trigger: threshold on the number of aliprod waiting jobs

Administrators are kept up-to-date on the services’ status
Trigger: presence/absence of monitored information
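The three examples above pair a trigger condition with an action. A minimal Python sketch of that pattern follows; the parameter names, the threshold values, and the action strings are all hypothetical, and the actions are returned as labels rather than executed.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]
Condition = Callable[[Sample], bool]


def above(name: str, limit: float) -> Condition:
    """Fires when the parameter is present and exceeds the limit."""
    return lambda sample: name in sample and sample[name] > limit


def below(name: str, limit: float) -> Condition:
    """Fires when the parameter is present and falls under the limit."""
    return lambda sample: name in sample and sample[name] < limit


def absent(name: str) -> Condition:
    """Fires when the parameter is missing from the latest sample."""
    return lambda sample: name not in sample


def evaluate(sample: Sample, rules: List[Tuple[Condition, str]]) -> List[str]:
    """Return the actions whose trigger condition holds for this sample."""
    return [action for condition, action in rules if condition(sample)]


# Hypothetical rules mirroring the three slide examples.
rules = [
    (above("mysqld_vsz_mb", 2048), "restart mysqld"),
    (below("aliprod_waiting_jobs", 100), "resubmit jobs"),
    (absent("service_status"), "notify administrators"),
]
actions = evaluate({"mysqld_vsz_mb": 3000, "aliprod_waiting_jobs": 500}, rules)
print(actions)
```

Keeping conditions as small composable functions makes it straightforward to add a correlation trigger that inspects several parameters of the same sample.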


Monitoring Video Conference System: Reflectors and Communication Topology


Creating a Dynamic, Global, Minimum Spanning Tree to optimize the connectivity

$$ w(T) = \sum_{(u,v) \in T} w((u,v)) $$

A weighted connected graph G = (V, E) with n vertices and m edges. The quality of connectivity between any two reflectors is measured every 2 s. A minimum spanning tree T is built in near real time.
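The slide does not say which algorithm builds the tree. As one standard option, the Python sketch below uses Kruskal's algorithm with a union-find structure. It assumes reflectors are numbered 0 to n-1 and that a lower edge weight means better connectivity; the sample weights are invented.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v)


def minimum_spanning_tree(n: int, edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm: scan edges by increasing weight, keep those joining two components."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent links to the component root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the edge connects two separate components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Four reflectors with hypothetical link costs.
edges = [(1.0, 0, 1), (4.0, 0, 2), (3.0, 1, 2), (2.0, 2, 3), (5.0, 1, 3), (7.0, 0, 3)]
tree = minimum_spanning_tree(4, edges)
total = sum(weight for weight, _, _ in tree)
print(tree, total)
```

Because link quality is re-measured every 2 s, the tree would be recomputed (or incrementally repaired) whenever the weights change; sorting m edges costs O(m log m), which is small for a reflector network.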


EVO: LISA Detects the Best Reflector for each Client and MonALISA Agents keep the reflectors connected in a MST

• Dynamic discovery of reflectors
• Creates and maintains, in real time, the optimal connectivity between reflectors (MST) based on periodic network measurements
• Detects and monitors the user configuration, its hardware, the connectivity and its performance
• Dynamically connects the client to the best reflector
• Provides secure administration
• Uses alarm triggers to notify of unexpected events

“On-Demand”, Dynamic Path Allocation

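[Diagram: end hosts A and B are connected by the regular IP path through the Internet and by an active light path through an optical switch. The command ">FDT A/fileX B/path/" is followed by the steps: OS path available, configuring interfaces, starting data transfer. A MonALISA service with an OS agent monitors and controls the optical switch over TL1 within the MonALISA Distributed Service System, with real-time monitoring. On each end host the LISA agent sets up network interfaces, the TCP stack, kernel parameters and routes, and tells the application which interface to use (“use eth1.2, …”).]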

CREATES AN END TO END PATH < 1s

Detects errors and automatically recreates the path in less than the TCP timeout


FDT – Fast Data Transfer

FDT is an application for efficient data transfers. It is easy to use, written in Java, and runs on all major platforms. It is based on an asynchronous, multithreaded system that uses the NIO library and is able to:

• stream a list of files continuously
• use independent threads to read and write on each physical device
• transfer data in parallel on multiple TCP streams, when necessary
• use appropriately sized buffers for disk I/O and networking
• resume a file transfer session
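FDT itself is written in Java on top of NIO; the Python sketch below only illustrates the general pattern named in the list above, one reader thread per device feeding a bounded pool of buffers that a single consumer drains. In-memory byte strings stand in for disks and for the network, and every name and size here is an assumption.

```python
import queue
import threading
from typing import Dict


def transfer(sources: Dict[str, bytes], block_size: int = 4, pool_size: int = 8) -> Dict[str, bytes]:
    """Copy each source through a bounded block queue using one reader thread per source."""
    blocks: queue.Queue = queue.Queue(maxsize=pool_size)  # readers block when the pool is full

    def reader(name: str, data: bytes) -> None:
        for offset in range(0, len(data), block_size):
            blocks.put((name, offset, data[offset:offset + block_size]))
        blocks.put((name, None, b""))  # end-of-stream marker for this source

    threads = [threading.Thread(target=reader, args=item) for item in sources.items()]
    for thread in threads:
        thread.start()

    # Consumer: collect blocks by offset until every reader has signalled completion.
    parts: Dict[str, Dict[int, bytes]] = {name: {} for name in sources}
    remaining = len(threads)
    while remaining:
        name, offset, chunk = blocks.get()
        if offset is None:
            remaining -= 1
        else:
            parts[name][offset] = chunk

    for thread in threads:
        thread.join()
    # Reassemble each stream in offset order.
    return {name: b"".join(chunk for _, chunk in sorted(p.items())) for name, p in parts.items()}


result = transfer({"disk0": b"abcdefghij", "disk1": b"0123456"})
print(result)
```

The bounded queue is the point of the design: a slow consumer applies back-pressure to the readers instead of letting buffered data grow without limit.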


FDT – Fast Data Transfer

Pool of buffers Kernel Space

Pool of buffers Kernel Space

Data Transfer Sockets / Channels

Independent threads per device

Restore the files from buffers

Control connection / authorization


FDT – Memory to Memory Tests in WAN

CPUs: dual-core Intel Xeon @ 3.00 GHz, 4 GB RAM, 4 x 320 GB SATA disks, connected with 10 Gb/s Myricom


FDT – Memory to Memory Tests in WAN

C1-NY -> C1-GVA


FDT – Memory to Memory Tests in WAN

C1-NY <-> C1-GVA

Memory-to-memory transfers in WAN in both directions with two pairs of servers:

C1-NY -> C1-GVA

C2-NY <- C2-GVA

Disk-to-Disk Transfers in WAN

[Plots: Page_IN rate in MB/s]

C1-NY -> C1-GVA
Reads and writes on 4 SATA disks in parallel on each server
Mean traffic ~ 210 MB/s, ~ 0.75 TB per hour

CERN -> CALTECH
Reads and writes on 2 RAID controllers in parallel on each server
Mean traffic ~ 545 MB/s, ~ 2 TB per hour
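The hourly volumes quoted above follow from the mean rates by a unit conversion. A small Python check, assuming decimal units (1 TB = 1,000,000 MB); the function name is illustrative:

```python
def tb_per_hour(mb_per_second: float) -> float:
    """Convert a sustained rate in MB/s to TB per hour, with 1 TB = 1,000,000 MB."""
    return mb_per_second * 3600 / 1_000_000


sata = tb_per_hour(210)   # 4 SATA disks per server
raid = tb_per_hour(545)   # 2 RAID controllers per server
print(f"{sata:.2f} TB/h, {raid:.2f} TB/h")  # prints "0.76 TB/h, 1.96 TB/h"
```

Both results agree with the rounded figures on the slide (~ 0.75 and ~ 2 TB per hour).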

Controlling Optical Planes: Automatic Path Recovery

[Map: CERN (Geneva), Starlight, Manlan, CALTECH (Pasadena); networks USLHCnet and Internet2]

• “Fiber cut” simulations
• The traffic moves from one transatlantic line to the other one
• FDT transfer (CERN – CALTECH) continues uninterrupted
• TCP fully recovers in ~ 20 s

[Plot: FDT transfer during 4 fiber cut simulations]

Bandwidth Challenge at SC2005

151 Gb/s

~ 500 TB Total in 4h

FDT Used at SC 2006
Entire management was done with LISA & MonALISA

SC2006

[Plots: Official BWC; Hyper BWC]


Data Collection and Interfacing with Other Tools

MonALISA is interfaced with many monitoring tools and is capable of collecting information from different applications:

• Computing Nodes / Farms (system information, network traffic, …)
  o SNMP, Ganglia, dedicated scripts
• Routers, Switches, Optical Switches
  o SNMP, NetFlow, SFlow, TL1, MRTG, WS
• End-to-End Network Performance
  o Pathload, IPERF, Pipes, Abing, ABping, …
• Batch Queuing Systems
  o LSF, PBS, Condor, NQS, Grid Job Manager
• Applications
  o Root, Xrootd, CRAB, RRD, VRVS / EVO, …

Communities using MonALISA

Major Communities: ALICE, OSG, CMS, STAR, VRVS, LGC RUSSIA, SE Europe GRID, APAC Grid, UNAM Grid (Mx), ITU, ABILENE, ULTRALIGHT, GLORIAD, LHC Net, RoEduNET, Enlightened

Demonstrated at:

Telecom World

WSIS 2003

SC 2004

Internet2 2005

TERENA 2005

IGrid 2005

SC 2005

CHEP 2006

CENIC 2006 Innovation Award for High-Performance Applications

MonALISA Today

• Running 24 x 7 at ~340 sites
• Collecting ~ 1,000,000 parameters in near real time
• Update rate of 20,000 parameter updates per second
• Monitoring 12,000 computers
• > 100 WAN links
• Thousands of Grid jobs running concurrently


The MonALISA Architecture Provides:

• Distributed Registration and Discovery for services and applications.
• Monitoring all aspects of complex systems:
  o System information for computer nodes and clusters
  o Network monitoring, topology, end-to-end performance
  o Monitoring the performance of applications, jobs or services
  o The end user systems and their performance
  o Environment; video streaming
• Can interact with any other services to provide, in near real time, customized information based on monitoring data.
• Secure, remote administration for services and applications.
• Agents to supervise applications, trigger alarms, restart or reconfigure them, and notify other services when certain conditions are detected.
• The MonALISA framework is used to develop higher level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks.
• Graphical User Interfaces to visualize complex information.

http://monalisa.caltech.edu