PARMON: A Comprehensive Cluster Monitoring System
PARMON Team
Centre for Development of Advanced Computing, Bangalore, India
Contact: Rajkumar Buyya ([email protected])
Topics of Discussion
  PARMON System Model & Architecture
  PARMON Server
  PARMON Client
  PARMON Features and Services
  PARMON Installation and its Usage
  Monitoring with PARMON
  PARMON Integration with other products
  Conclusions and Future Directions
Motivations
  Workstation clusters have of late become a cost-effective solution for HPC.
  C-DAC's PARAM OpenFrame is a large cluster of more than 40 Ultra-4 workstations interconnected through low-latency, high-bandwidth communication networks.
  Monitoring such large systems is a tedious and challenging task, since typical workstations are designed to work as standalone systems rather than as part of a workstation cluster.
  System administrators require tools to monitor such large systems effectively. PARMON provides a solution to this challenging problem.
C-DAC HPCC Software Architecture
[Figure: layered architecture diagram. Labels: Cluster Hardware; Solaris; Light Weight Protocols; Message Passing Interfaces (C-MPI, PVM); System Management Tools; Parallel File System (C-PFS); Languages (C, F77, F90); Development Tools (F90 IDE, DIVIA); Applications.]
PARMON - Salient Features
  Online creation of the node and group database
  Monitoring of system activities at the component, node, group, or entire cluster level
  Designed using state-of-the-art Java technology
  Monitoring of system components: CPU, memory, disk, and network
  Monitoring of multiple instances of the same component
  Facility for defining events, with automatic notification
  Miscellaneous facilities: message broadcast, invocation of system management commands (halt, reboot, etc.), system information and configuration
  A GUI for initiating activities and requests, with results presented graphically
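The event facility listed above can be pictured with a small sketch. The slides do not say how PARMON defines events, so the `EventRule` and `EventMonitor` names, the threshold semantics, and the message format are all hypothetical; the sketch only illustrates a rule over a monitored component that triggers a notification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical rule: fire when a component's utilization exceeds a threshold.
final class EventRule {
    final String component;   // e.g. "cpu", "memory", "disk", "network"
    final double threshold;   // percentage, 0..100

    EventRule(String component, double threshold) {
        this.component = component;
        this.threshold = threshold;
    }

    boolean matches(String sampledComponent, double value) {
        return component.equals(sampledComponent) && value > threshold;
    }
}

// Hypothetical monitor: checks each sample against the defined rules
// and passes a message to the notifier for every rule that matches.
final class EventMonitor {
    private final List<EventRule> rules = new ArrayList<>();
    private final Consumer<String> notifier;

    EventMonitor(Consumer<String> notifier) {
        this.notifier = notifier;
    }

    void define(EventRule rule) {
        rules.add(rule);
    }

    // Returns the number of notifications sent for this sample.
    int onSample(String node, String component, double value) {
        int sent = 0;
        for (EventRule rule : rules) {
            if (rule.matches(component, value)) {
                notifier.accept(node + ": " + component + " at " + value
                        + "% exceeds " + rule.threshold + "%");
                sent++;
            }
        }
        return sent;
    }
}
```

A monitor built with `new EventMonitor(System.out::println)` and a rule such as `new EventRule("cpu", 90.0)` would print one line whenever a CPU sample above 90% is passed to `onSample`.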
PARMON System Model
[Figure: system model diagram. Labels: PARMON; High-Speed Switch; parmond; parmon; PARMON Server on Solaris Node; PARMON Client on JVM.]
PARMON Implementation
  Server
    Multithreaded, using POSIX and Solaris threads
    Developed in C, as it needs to access system internals
    Stateless
  Client
    Developed in Java; Java features are used extensively
    A new window is created for each client request and interacts with the server
    Threads are used extensively when creating online resource utilization meters
    Reconfigures dynamically when the node database changes
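Because the server is stateless, each request from the client has to be self-contained, and a meter can simply poll at its own pace on its own thread. The sketch below illustrates that pattern only. The `UtilizationMeter` class is hypothetical, and the `DoubleSupplier` stands in for one request to parmond, whose wire protocol is not described in the slides.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.DoubleSupplier;

// Hypothetical meter: intended to run on its own thread, it polls a source
// a fixed number of times and records every sample it receives.
final class UtilizationMeter implements Runnable {
    private final DoubleSupplier source;   // stands in for one request to parmond
    private final int samples;             // how many polls to perform
    private final long intervalMillis;     // pause between polls
    private final List<Double> history = new CopyOnWriteArrayList<>();

    UtilizationMeter(DoubleSupplier source, int samples, long intervalMillis) {
        this.source = source;
        this.samples = samples;
        this.intervalMillis = intervalMillis;
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < samples; i++) {
                // Each poll is independent; no session state is assumed.
                history.add(source.getAsDouble());
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop quietly if interrupted
        }
    }

    // Thread-safe view of the samples collected so far.
    List<Double> history() {
        return history;
    }
}
```

One meter per monitored component could then be started with `new Thread(meter, "cpu-meter").start()`, with the GUI reading `history()` to redraw the graph.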
Setting up of PARMON
  Server installation & invocation
    Binding to port
    Rights (requires root permission for full functionality)
    parmond or parmond <port-no> (either at boot time or on-line)
    Needs to be loaded on all nodes to be monitored
  Client installation & invocation
    Java-based client (client machine can be a PC/workstation supporting a JVM)
    CLASSPATH (pointing to classes.zip, parmon.jar)
    jar file (parmon.jar)
    java parmon or java parmon <port-no>
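Both programs accept an optional port number on the command line. A minimal sketch of how such an argument might be resolved is shown below. The `PortArgs` helper is hypothetical, and `DEFAULT_PORT` is a placeholder, since the slides do not state which port is used when none is given.

```java
// Hypothetical launcher helper; not taken from the PARMON sources.
final class PortArgs {
    // Placeholder value: the real default port is not given in the slides.
    static final int DEFAULT_PORT = 5000;

    // Returns the port from args[0] if present, otherwise the default.
    static int resolvePort(String[] args) {
        if (args.length == 0) {
            return DEFAULT_PORT;
        }
        int port;
        try {
            port = Integer.parseInt(args[0]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port must be a number: " + args[0]);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }
}
```

Whatever port is chosen, the client and the servers on all monitored nodes have to agree on it.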
Monitoring System Activities and Resource Utilization
PARMON Launcher
Creation of Node Database
Node Deletion
Group Creation
Group Modification/Deletion
Resource Utilization at a Glance
Selection of Nodes/Group
CPU Usage Monitoring
Memory Usage Monitoring
Disk/Network Usage Monitoring
Message Viewer (System logs)
Process activities
Kernel Data Catalog - CPU
Kernel Data Catalog - Memory
Kernel Data Catalog - Disk
Kernel Data Catalog - Network
Catalog of CPU Parameters
Component View - Physical
Component View - Logical
Message Broadcast
System Configuration
System Information
Issuing Commands : halt, shutdown, etc.
Node Diagnostics - Online (SunVTS)
Online Help
PARMON Integration with other Products
PARMON can send resource utilization information to any other product, provided that product's protocol is made available.
[Figure: integration diagram. Labels: PARAM online bulletin board; parmond; Node 1; Node N.]
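One way to picture this kind of integration is an adapter per external product. The slides do not describe the actual mechanism, so the `UtilizationSink` interface, the `TextLineSink` class, and the text format below are hypothetical. They only show how utilization data could be handed to a product-specific protocol.

```java
import java.util.Map;

// Hypothetical adapter: one implementation per external product's protocol.
interface UtilizationSink {
    void publish(String node, Map<String, Double> utilization);
}

// Example sink that formats one text line per node, such as a
// bulletin-board style product might accept. The format is invented.
final class TextLineSink implements UtilizationSink {
    final StringBuilder out = new StringBuilder();

    @Override
    public void publish(String node, Map<String, Double> utilization) {
        out.append(node);
        for (Map.Entry<String, Double> e : utilization.entrySet()) {
            out.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        out.append('\n');
    }
}
```

Supporting another product would then mean writing one more `UtilizationSink` implementation, leaving the monitoring side unchanged.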
Conclusions and Future Directions
  PARMON has been used successfully to monitor the PARAM OpenFrame supercomputer, a cluster of 48 Ultra-4 workstations running the Sun Solaris operating system.
  Portable across platforms supporting Java
  Comprehensive monitoring support and GUI
  PARMON supports Solaris and Linux clusters; support for NT clusters is planned.
  Can easily be extended to support web-based monitoring of clusters by creating an interface server (running on a web server) between the client and the PARMON servers running on the cluster nodes.
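The interface server mentioned above could take many forms. As a rough sketch, the code below uses the JDK's built-in `com.sun.net.httpserver.HttpServer` to expose utilization values over HTTP. The `WebGateway` class, the `/utilization` path, and the plain-text format are hypothetical, and the `Supplier` stands in for a query to the PARMON servers, whose protocol the slides do not describe.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical web gateway sitting between a browser and the PARMON servers.
final class WebGateway {
    // Renders utilization values as plain text, one "name value" pair per line.
    static String render(Map<String, Double> utilization) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : utilization.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Starts an HTTP server on the given port; 'source' stands in for a
    // query to parmond and is called once per incoming request.
    static HttpServer start(int port, Supplier<Map<String, Double>> source)
            throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/utilization", exchange -> {
            byte[] body = render(source.get()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Calling `WebGateway.start(8080, source)` would serve the current values at `/utilization`, and the returned server can be shut down with `stop(0)`.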
Thank You
?