monitoring system basics, and adding support for a new...
TRANSCRIPT
Mitg
lied
derH
elm
holtz
-Gem
eins
chaf
t
Monitoring system basics,and adding support fora new batch system
September 19, 2012 Carsten Karbach and Wolfgang Frings
Content
1 PTP System Monitoring2 Architecture3 Large-scale system Markup Language (LML)4 Implementation5 Scalability6 Add new target system7 Conclusion
September 19, 2012 Carsten Karbach and Wolfgang Frings 2 40
Mitg
lied
derH
elm
holtz
-Gem
eins
chaf
t
Part I: PTP System Monitoring
September 19, 2012 Carsten Karbach and Wolfgang Frings
1. PTP System Monitoring
PTP Monitoring Scope
job and system monitoring of large-scale supercomputersmonitoring of multiple target systems in one perspectivesupport for many batch systems (Grid Engine,LoadLeveler, Open MPI, PBS, Slurm, Torque)overview of the system on a single screenbased on monitoring application LLview
September 19, 2012 Carsten Karbach and Wolfgang Frings 4 40
1. PTP System Monitoring
PTP Monitoring Perspective
September 19, 2012 Carsten Karbach and Wolfgang Frings 5 40
1. PTP System Monitoring
PTP Monitoring Perspective
September 19, 2012 Carsten Karbach and Wolfgang Frings 5 40
1. PTP System Monitoring
PTP Monitoring Perspective
September 19, 2012 Carsten Karbach and Wolfgang Frings 5 40
1. PTP System Monitoring
PTP Monitoring Perspective
September 19, 2012 Carsten Karbach and Wolfgang Frings 5 40
1. PTP System Monitoring
PTP Monitoring Perspective
September 19, 2012 Carsten Karbach and Wolfgang Frings 5 40
1. PTP System Monitoring
Monitoring Views
Nodes View renders target system architecture,maps jobs to compute resourcesActive Jobs View lists running jobsInactive Jobs View lists queued jobsMonitoring View selects active target system,starts/stops monitoringMessage View shows message of the day
September 19, 2012 Carsten Karbach and Wolfgang Frings 6 40
Mitg
lied
derH
elm
holtz
-Gem
eins
chaf
t
Part II: Architecture
September 19, 2012 Carsten Karbach and Wolfgang Frings
2. Architecture
Monitoring Architecture I
September 19, 2012 Carsten Karbach and Wolfgang Frings 8 40
2. Architecture
Monitoring Architecture II
LML da gathers status information,calls target system’s remote commands, written in PerlLML is a data format for status informationof supercomputersLML request: contains table filtering information,visible/hidden columnsLML response: contains the request and status informationclient stores current layout requestfor successive Eclipse sessions
September 19, 2012 Carsten Karbach and Wolfgang Frings 9 40
2. Architecture
Communications Protocol
September 19, 2012 Carsten Karbach and Wolfgang Frings 10 40
Mitg
lied
derH
elm
holtz
-Gem
eins
chaf
t
Part III: Large-scale system MarkupLanguage (LML)
September 19, 2012 Carsten Karbach and Wolfgang Frings
3. Large-scale system Markup Language (LML)
Large-scale system Markup Language (LML)
markup language for description of supercomputer’s statusinterface between LML da and visualization clientsdescribes all graphical components available in LLview(job-list, node display, charts ...)logical and system independent descriptionof current statusclose to graphical representation→ thin visualization clientsimplemented in XML, validation against XML-Schema
September 19, 2012 Carsten Karbach and Wolfgang Frings 12 40
3. Large-scale system Markup Language (LML)
LML Structure
Global data: intermediate data format, scheduling objectsGraphical components:data for visualization componentsLayout: hints for visualization, node display hierarchy
September 19, 2012 Carsten Karbach and Wolfgang Frings 13 40
3. Large-scale system Markup Language (LML)
Example node display
LML-description of a node display
<nodedisplay title="Cluster" id="1">
<scheme >
<el1 tagname="Node" min="1" max="4">
<el2 tagname="CPU" min="1" max="3"/>
</el1>
</scheme >
<data>
<el1 min="1" max="2" oid="job1">
<el2 min="3" oid="job2"/>
</el1>
<el1 min="3" max="4" oid="job2"/>
</data>
</nodedisplay >
September 19, 2012 Carsten Karbach and Wolfgang Frings 14 40
Mitg
lied
derH
elm
holtz
-Gem
eins
chaf
t
Part IV: Implementation
September 19, 2012 Carsten Karbach and Wolfgang Frings
4. Implementation
Plug-in Overview
September 19, 2012 Carsten Karbach and Wolfgang Frings 16 40
4. Implementation
Plug-in Details I
lml.core
uses JAXB to manage LML filesprovides helper classes for comfortable access on LMLdefines events and listeners for all UI actions
lml.ui
implements a ViewPart for Nodes / Jobs and Info Viewsrenders LML components, considers LML layout hintsUI implementation:
Jobs View → JFace TreeViewerNodes View → nests Composites, PaintListenerInfo View → SWT Text / Table
September 19, 2012 Carsten Karbach and Wolfgang Frings 17 40
4. Implementation
Plug-in Details II
lml.da
set of Perl scripts → simple installationsplit into independent modules:1 Driver calls system specific commands to get status infos2 Combiner merges LML files generated by the Driver3 AddColor assigns unique color to each job4 LML2LML converts intermediate format to final LML
fully configurable workflowonly Driver scripts and workflow description aretarget system specific
September 19, 2012 Carsten Karbach and Wolfgang Frings 18 40
Mitg
lied
derH
elm
holtz
-Gem
eins
chaf
t
Part V: Scalability
September 19, 2012 Carsten Karbach and Wolfgang Frings
5. Scalability
Node Display – Compression
September 19, 2012 Carsten Karbach and Wolfgang Frings 20 40
5. Scalability
Node Display – CompressionAttribute Inheritance
September 19, 2012 Carsten Karbach and Wolfgang Frings 21 40
5. Scalability
Node Display – Compression
Schemedefines architecture of empty system
September 19, 2012 Carsten Karbach and Wolfgang Frings 22 40
5. Scalability
Node Display – CompressionRanges
Result→ 3 objects instead of 16
September 19, 2012 Carsten Karbach and Wolfgang Frings 23 40
5. Scalability
Scalable Visualization – Row Level
September 19, 2012 Carsten Karbach and Wolfgang Frings 24 40
5. Scalability
Scalable Visualization – Nodeboard Level
September 19, 2012 Carsten Karbach and Wolfgang Frings 25 40
5. Scalability
Scalable Visualization – CPU Node Level
September 19, 2012 Carsten Karbach and Wolfgang Frings 26 40
5. Scalability
Table Filtering
scalability of job data displayfiltering by specification ofattribute values / rangesserver- and client-side filtering
September 19, 2012 Carsten Karbach and Wolfgang Frings 27 40
Mitg
lied
derH
elm
holtz
-Gem
eins
chaf
t
Part VI: Add new target system
September 19, 2012 Carsten Karbach and Wolfgang Frings
6. Add new target system
LML da – Data Extraction
Which status data is required?
jobs: owner, queue, dispatch date, state ...nodes: memory, cores, state, GPUs ...system: hostname, date, high messages ...
How is status data extracted?
one Perl script per data typecall batch system commands(e.g. qstat and pbsnodes for TORQUE)convert output into intermediate LML files
September 19, 2012 Carsten Karbach and Wolfgang Frings 29 40
6. Add new target system
LML da – add new target system
1 write scripts for extracting status information e.g.da jobs info LML.pl gathers jobsda nodes info LML.pl gathers nodesda system info LML.pl gets global system information
2 write functions for checking remote commands and forcomposing the workflow → da check info LML.pl
3 put all scripts into a folder in lml.da/rms,folder named after target system
4 add lml.monitor.ui.monitors extensionfor the new system type
September 19, 2012 Carsten Karbach and Wolfgang Frings 30 40
6. Add new target system
Client – add new target system
LML da generates system independent LML filesclient visualizes new target systems without any changesbut a system specific LML layout can be configuredcurrently this layout configuration is tricky
September 19, 2012 Carsten Karbach and Wolfgang Frings 31 40
6. Add new target system
LML Layout Configuration I
the following procedure is only meant for testingclient side configuration of LML layout is plannedlong-term target:configure entire layout via GUI, not via XML filethe layout could be saved together withthe target system configuration
September 19, 2012 Carsten Karbach and Wolfgang Frings 32 40
6. Add new target system
LML Layout Configuration II
1 write your own LML layout file, or adjust an existing(examples in lml.da plug-in)
2 start the monitoring connection to the target system3 log-in on the remote machine and switch
to the .eclipsesettings folder in your home directory4 create the file .LML da options with content
nocheckrequest=15 replace samples/layout default.xml in .eclipsesettings
with your own layout6 refresh the monitoring perspective on your client
September 19, 2012 Carsten Karbach and Wolfgang Frings 33 40
6. Add new target system
Layout Example – Default
September 19, 2012 Carsten Karbach and Wolfgang Frings 34 40
6. Add new target system
Layout Example – Custom I
September 19, 2012 Carsten Karbach and Wolfgang Frings 35 40
6. Add new target system
Layout Example – Custom II
September 19, 2012 Carsten Karbach and Wolfgang Frings 36 40
Mitg
lied
derH
elm
holtz
-Gem
eins
chaf
t
Part VII: Conclusion
September 19, 2012 Carsten Karbach and Wolfgang Frings
7. Conclusion
Conclusion
5 interacting views form the monitoring perspectiveLML as data interface between LML da and clientLML is divided into global data,graphical components and layoutcomposition of 6 Eclipse plug-ins(lml.core/ui, monitor.core/ui, lml.da/lml.da.server)new target system:write data extraction scripts + configure layoutScalability
scalable data acquisitionscalable data formatscalable presentation
September 19, 2012 Carsten Karbach and Wolfgang Frings 38 40
7. Conclusion
Questions?
September 19, 2012 Carsten Karbach and Wolfgang Frings 39 40
7. Conclusion
Contact
E-mail:[email protected], [email protected] → http://llview.zam.kfa-juelich.de/LML
LLview → http://www.fz-juelich.de/jsc/llview
September 19, 2012 Carsten Karbach and Wolfgang Frings 40 40