Integrated Workload Management for Beowulf Clusters
Bill DeSalvo – April 14, 2004
© Platform Computing Inc. 2003
What We’ll Cover
Platform LSF Family of Products
What is Platform LSF HPC
Key Features & Benefits
How it Works
Q&A
What is the Platform LSF Family of Products?
What Problems Are We Solving?
Solve large, complex, grand-challenge problems by optimizing the placement of workload in High Performance Computing environments
Platform LSF HPC
Intelligent, policy-driven high performance computing (HPC) workload processing
Parallel & sequential batch workload management for High Performance Computing (HPC)
Includes patent-pending topology-based scheduling
Intelligently schedules parallel batch jobs
Virtualizes resources
Prioritizes service levels based on policies
Based on Platform LSF:
Standards-based, OGSI-compliant, grid-enabled solution
Commercial production quality product
Platform Customers
Platform LSF HPC
Platform LSF HPC AlphaServer SC
Platform LSF HPC for IBM
Platform LSF HPC for Linux
Platform LSF HPC for SGI
Platform LSF HPC for Cray
Extensive Hardware Support
HP
HP AlphaServer SC
HP XC
HP Superdome
HP-UX 11i
SGI
SGI IRIX
SGI TRIX
SGI Altix, SGI Propack
IBM
IBM RS/6000 AIX
IBM SP2/SP3
Linux
IA-64 systems with Red Hat
Intel, AMD 32-bit systems with Linux kernel
Sun
Sun Solaris
High Performance Interconnects
Myrinet with GM
Quadrics QsNet
SGI NUMAflex, SGI NUMAlink
IBM SP Switch
Platform LSF HPC – Linux Support
HP
HP XC Systems running Unlimited Linux
HP Itanium 2 systems running Linux 2.4.x kernel, glibc 2.2 with RMS on Quadrics QsNet/Elan3
HP Alpha/AXP systems running Linux 2.4.x kernel, glibc 2.2.x with RMS on Quadrics QsNet/Elan3
Linux
IA-64 systems, Kernel 2.4.x, compiled with glibc 2.2.x, tested on RedHat 7.3
x86 systems:
Kernel 2.2.x, compiled with glibc 2.1.x, tested on Debian 2.2, OpenLinux 2.4, RedHat 6.2 and 7.0, SuSE 6.4 and 7.0, TurboLinux 6.1
Kernel 2.4.x, compiled with glibc 2.1.x, tested on RedHat 7.x and 8.0, and SuSE 7.0, and RedHat Linux Advanced Server 2.1
Clustermatic Linux 3.0, Kernel 2.4.x, compiled with glibc 2.2.x, tested on RedHat 8.0
Scyld Linux, Kernel 2.4.x, compiled with glibc 2.2.x.
SGI
SGI Altix systems running Linux Kernel 2.4.x compiled with glibc 2.2.x and SGI Propack 2.2 and higher
Key Features and Benefits: Platform LSF HPC
Key Features
Optimized Application, System and Hardware Performance
Enhanced Accounting, Auditing & Control
Commercial Grade System Scalability & Reliability
Extensive Hardware Support
Comprehensive, Extensible and Standards-based Security
Key Features – Platform LSF HPC
Optimized Application, System and Hardware Performance
Enhanced Accounting, Auditing & Control
Commercial Grade System Scalability & Reliability
Comprehensive, Extensible and Standards-based Security
Adaptive Interconnect Performance Optimization
Scheduling that takes advantage of unique interconnect properties
IBM SP Switch at the POE software level
RMS on AlphaServer SC (Quadrics)
SGI topology hardware graph
Out-of-the-box functionality without any customization required
Generic Parallel Job Launcher
Generic support for all different types of Parallel Job Launchers
LAM/MPI, MPICH-GM, MPICH-P4, POE, Scali, ChaMPIon/Pro, etc.
Customizable for any vendor or publicly available parallel solution
Control - ensuring no jobs can escape the workload management system
Integrated out-of-the-box Parallel Launcher Support
Full integration with IRIX MPI and array session daemon
Full integration with SGI MPI for Linux
Full integration with Sun HPC ClusterTools, providing full MPI control, accounting and integration with Sun's Prism debugger
Vendor MPI libraries provide better performance than open source libraries
Vendor MPI library full support
Vendor integration supported by Platform
Seamless control and accounting
HPC Workload Scheduling
Dynamic load balancing supporting heterogeneous workloads
IBM SP switch aware scheduling
Scheduling of parallel jobs
Number of CPUs, min/max, node span
Backfill on processor & memory
Processor & memory reservation
Topology aware scheduling
Exclusive scheduling
Advance Reservation
Fairshare, Preemption
Accounting
High Performing, Open, Scalable Architecture
Scalable scheduler architecture
Modularized, support for over 500,000 active jobs per cluster
More than 2,000 hosts per cluster, with multiple processors in each host
Process 5x more work & achieve 100% utilization
Scale with business growth
External executable support
Collect information from multiple external resources to track site specific local and global resources
Extends out-of-the-box capabilities to manage additional resources and customer application execution
Differentiation
Multiple vs single external resource collector
Job Groups
Organize jobs into higher level work units - hierarchical tree
Easy to manage and control work to increase user productivity by reducing complexity
OGSI compliance
Future-proof & protect grid investment using standards-based solutions, interoperate with third-party systems
Intelligent Scheduling Policies
Fairshare (User & Project-based)
Ensure job resources are used for the right work
Guarantees that resource allocations among users and projects are met
Co-ordinate access to the right number of resources for different users and projects according to pre-defined shares
Differentiation
Hierarchical & guaranteed
Policy-based Preemption
Maximizes throughput of high priority critical work based on priority and load conditions
Prevents starvation of lower priority work
Differentiation
Platform LSF supports multiple preemption policies
Goal-oriented SLA driven policies
Based on customer SLA driven goals: Deadline, Velocity, Throughput
Guarantees projects are completed on time
Reduces projects and administration costs
Provides visibility into the progress of projects
Allows the admin to focus on “what work and when” needs to be done, not “how” the resources are to be allocated
[Diagram: the Intelligent Scheduler and its scheduling modules: Fairshare, Preemption, Resource Reservation, Advance Reservation, SLA (Service Level Agreement) Scheduling, MultiCluster, License Scheduling, Plugin Schedulers, and Other Scheduling Modules]
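The user-based fairshare policy on this slide can be illustrated with a small sketch. It is not Platform LSF's actual dynamic-priority calculation, which the slides do not give. It assumes a hypothetical priority of configured shares divided by (1 + jobs already running), and it assumes every user always has pending work. The names `next_user` and `dispatch` are invented for the example.

```python
def next_user(shares, running):
    """Pick the user with the highest illustrative fairshare priority."""
    def priority(user):
        # Hypothetical rule: configured shares divided by (1 + jobs already running).
        return shares[user] / (1 + running.get(user, 0))
    # sorted() makes ties deterministic: the alphabetically first user wins.
    return max(sorted(shares), key=priority)


def dispatch(shares, slots):
    """Hand out `slots` job slots one at a time according to the shares."""
    running = {}
    for _ in range(slots):
        user = next_user(shares, running)
        running[user] = running.get(user, 0) + 1
    return running


if __name__ == "__main__":
    # A user with three times the shares ends up with three of four slots.
    print(dispatch({"alice": 3, "bob": 1}, slots=4))
```

Under these assumptions the slots converge on the configured ratio, which is the behaviour the "pre-defined shares" bullet describes.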
Advanced Self-Management
Flexible, Comprehensive Resource Definitions
Resources defined on a node basis across an entire cluster or subset of the nodes in a cluster
Auto-detectable or user defined resources
Adaptive membership – nodes join and leave Platform LSF clusters dynamically and automatically without administration effort
Dynamic or static resources
Job Level Exception Management
Exception-based error detection to take automatic, configurable, corrective actions
Increased job reliability & predictability
Improved visibility on job and system errors & reduced administration overhead and costs
Automatic Job Migration and Requeue
Automatically migrate and requeue jobs based on policies in the event of host or network failures
Reduce user and administrator overhead in managing failures & reduce risk of running critical workloads
Master Scheduler Failover
Automatically fail over to another host if the master host is unavailable
Continuous scheduling service and execution of jobs & eliminate manual intervention
Backfill
Policy configured at the queue level and applies to all jobs in a queue
Smaller sequential jobs are ‘backfilled’ behind larger parallel jobs
Improves hardware utilization
Users are provided with an accurate estimate of when their job will start
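A minimal sketch of the backfill idea follows, assuming every job declares a run limit and that the large parallel job already holds a reservation that begins a known number of minutes from now. The `Job` class and the `backfill` function are hypothetical names for illustration, not part of Platform LSF.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    slots: int       # job slots the job needs
    run_limit: int   # declared run limit, in minutes


def backfill(free_slots, minutes_until_reserved_start, pending):
    """Return the names of pending jobs that can start now without delaying
    the large job whose reservation begins in `minutes_until_reserved_start`."""
    started = []
    for job in pending:
        fits_in_slots = job.slots <= free_slots
        finishes_in_time = job.run_limit <= minutes_until_reserved_start
        if fits_in_slots and finishes_in_time:
            started.append(job.name)
            free_slots -= job.slots
    return started


if __name__ == "__main__":
    queue = [Job("a", 1, 20), Job("b", 2, 45), Job("c", 3, 10), Job("d", 4, 5)]
    print(backfill(free_slots=4, minutes_until_reserved_start=30, pending=queue))
```

The sketch also shows why run limits matter for backfill: without a declared limit there is no way to know that a small job will be out of the way before the reserved start.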
Key New Features & Benefits: Platform LSF V6.0
Feature Overview
OGSI Compliance
Goal-Oriented SLA-Driven Scheduling
License-Aware Scheduling
Job-Level Exception Management (Self Management Enhancement)
Job Group Support
Other Scheduling Enhancements
Queue-Based Fairshare
User Fairshare by Queue Priority
Job Starvation Prevention plug-in
Feature Overview (Cont.)
HPC Enhancements
Dynamic ptile Enforcement
Resource Requirement Specification for Advance Reservation
Thread Limit Enforcement
General Parallel Support
Parallel Job Size Scheduling
Job Limit Enhancements
Non-normalized Job Run Limit
Resource Allocation Limit Display
Administration and Diagnostics
Scheduler Dynamic Debug
Administrator Action Messages
Goal-Oriented SLA-Driven Scheduling
What is it?
A new scheduling policy.
Unlike current scheduling policies based on configured shares or limits, SLA-driven scheduling is based on customer provided goals:
Deadline based goal: Specify the deadline for a group of jobs.
Velocity based goal: Specify the number of jobs running at any one time.
Throughput based goal: Specify the number of finished jobs per hour.
This scheduling policy works on top of queues and host partitions.
Benefits
Guarantees projects are completed on time according to explicit SLA definitions.
Provides visibility into the progress of projects to see how well projects are tracking to SLAs
Allows the admin to focus on “what work and when” needs to be done, not “how” the resources are to be allocated.
Guarantees service level deliveries to the user community, reduces the risks of projects and administration cost.
Use case
Problem: we need to finish all simulation jobs before 15:00.
Solution: Configure a deadline service class in the lsb.serviceclasses file.
Begin ServiceClass
NAME=simulation
PRIORITY=100
GOALS = [deadline timeWindow (13:00-15:00)]
DESCRIPTION = A simple deadline demo
End ServiceClass
Submitting and monitoring jobs
$ bsub -sla simulation -W 10 -J "A[1-50]" mySimulation
$ date; bsla
Wed Aug 20 14:00:16 EDT 2003
SERVICE_CLASS_NAME: simulation
GOAL: DEADLINE ACTIVE_WINDOW: (13:00-15:00)
STATUS: Active:Ontime
DEAD_LINE: (Wed Aug 20 15:00)
ESTIMATED_FINISH_TIME: (Wed Aug 20 14:30)
Optimum Number of Running Jobs: 5
NJOBS   PEND   RUN   SSUSP   USUSP   FINISH
50      25     5     0       0       20
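The "Optimum Number of Running Jobs" figure in the output above can be approximated with simple arithmetic. The sketch below is an assumption about how such an estimate might be made, not Platform LSF's documented algorithm. It treats every unfinished job as needing its full run limit (`-W 10`) and divides the remaining work by the time left before the deadline. The function name is hypothetical.

```python
import math


def optimum_running_jobs(unfinished_jobs, run_limit_min, minutes_to_deadline):
    """Estimate how many jobs must run concurrently to meet the deadline,
    assuming each unfinished job still needs its full run limit."""
    if minutes_to_deadline <= 0:
        raise ValueError("the deadline has already passed")
    remaining_work = unfinished_jobs * run_limit_min  # total job-minutes left
    return math.ceil(remaining_work / minutes_to_deadline)


if __name__ == "__main__":
    # At 14:00 there are 25 pending + 5 running jobs, 10 minutes each,
    # and 60 minutes remain until the 15:00 deadline.
    print(optimum_running_jobs(30, 10, 60))
```

With the numbers from the slide (30 unfinished jobs, a 10-minute run limit, 60 minutes remaining) this estimate agrees with the value 5 that bsla reports.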
Job-Level Exception Management (Self Management Enhancement)
What is it?
Platform LSF can monitor jobs and hosts for exceptional behavior and take action accordingly.
Benefits
Increased reliability of job execution
Improved visibility on job and system errors
Reduced administration overhead and costs
How it works
Platform LSF V6 handles the following exceptions:
“Job eating” machine (or “black-hole” machine): for some reason, jobs keep exiting abnormally on a machine (e.g. no processes, mount daemon dies, etc.)
Job underrun (job run time less than configured minimum time)
Job overrun (job run time more than configured maximum time)
Job idle (job runs without its CPU usage increasing).
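The three job-level exceptions listed above can be expressed as simple checks. This is an illustrative sketch only. The function name, its parameters, and the use of a CPU-time delta between two samples to detect an idle job are assumptions, not Platform LSF internals.

```python
def job_exceptions(run_minutes, finished, cpu_delta, min_run=None, max_run=None):
    """Classify a job against illustrative exception thresholds.

    run_minutes: how long the job has run so far (or ran in total, if finished)
    finished:    True once the job has exited
    cpu_delta:   CPU seconds consumed since the previous sample
    min_run / max_run: configured thresholds in minutes (None disables a check)
    """
    found = []
    if finished and min_run is not None and run_minutes < min_run:
        found.append("underrun")   # exited sooner than the configured minimum
    if max_run is not None and run_minutes > max_run:
        found.append("overrun")    # ran longer than the configured maximum
    if not finished and cpu_delta == 0:
        found.append("idle")       # still running but using no CPU
    return found


if __name__ == "__main__":
    print(job_exceptions(2, True, 0.0, min_run=5))
    print(job_exceptions(200, False, 1.5, max_run=180))
    print(job_exceptions(30, False, 0.0, max_run=180))
```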
Job-Level Exception Management (Self Management Enhancement) (Cont.)
Use Case 1:
Requirement: If more than 30 jobs have exited on a host in the past 5 minutes, I want LSF to close that machine, notify me, and tell me the machine name.
Solution:
Configure host exceptions (EXIT_RATE in lsb.hosts).
Begin Host
HOST_NAME MXJ EXIT_RATE # Keywords
Default ! 6
End Host
Set JOB_EXIT_RATE_DURATION = 5 in lsb.params (the default value is 10 minutes)
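A hedged sketch of the check this configuration implies: it assumes EXIT_RATE is a per-minute rate averaged over JOB_EXIT_RATE_DURATION, which is how the numbers in this use case line up (6 per minute over 5 minutes is 30 jobs). The function and its data layout are hypothetical and do not reflect how LSF tracks exits internally.

```python
def hosts_over_exit_rate(exit_times, now, duration_min=5, exit_rate=6):
    """Return hosts whose average job exit rate over the last `duration_min`
    minutes exceeds `exit_rate` jobs per minute.

    exit_times maps a host name to the times (in minutes) at which jobs exited.
    """
    flagged = []
    for host, times in sorted(exit_times.items()):
        recent = [t for t in times if now - duration_min < t <= now]
        if len(recent) / duration_min > exit_rate:
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    exits = {"hostA": [9.0] * 31, "hostB": [9.0] * 30}
    # hostA had 31 exits in the window (6.2 per minute); hostB had exactly 30.
    print(hosts_over_exit_rate(exits, now=10))
```

A site would then close and report each flagged host, as the requirement asks.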
Job-Level Exception Management (Self Management Enhancement) (Cont.)
Use Case 2:
Requirement: If any job runs for more than 3 hours, I want LSF to notify me and tell me the job ID.
Solution:
Configure job exceptions (lsb.queues)
Begin Queue
…
JOB_OVERRUN = 3*60 # run time in minutes
End Queue
Job Starvation Prevention Plug-in
What is it?
An external scheduler plug-in that allows users to define their own equation for job priority
Benefits
Low priority work is guaranteed to run after ‘waiting’ for a specified time ensuring that the job does not wait forever (i.e. starvation).
How it works
By default, the scheduler provides the following calculation
Job priority = A * q_priority * MIN(1, int(wait_time / T0)) * (B * requested_processors + MAX(C * wait_time * (1 + 1/run_time), D) + E * requested_memory)
where A, B, C, D, and E are coefficients and T0 is the grace period. By default, run_time is infinite.
Admin can define different coefficients for each queue with the following format:
MANDATORY_EXTSCHED=JOBWEIGHT[A=val1; B=val2; …]
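The default calculation above translates directly into a short function. This is a sketch for experimenting with coefficients, not the plug-in's source. The function name is invented, and the time unit is an assumption: hours are used here, and wait_time and T0 only need to share the same unit.

```python
import math


def job_weight(q_priority, wait_time, requested_processors, requested_memory,
               A, B, C, D, E, T0, run_time=math.inf):
    """Evaluate the default job-priority formula from the slide.

    wait_time and T0 must use the same time unit (hours assumed here).
    run_time defaults to infinity, so the 1/run_time term is then zero.
    """
    # Zero until the job has waited at least the grace period T0, then one.
    grace = min(1, int(wait_time / T0))
    return A * q_priority * grace * (
        B * requested_processors
        + max(C * wait_time * (1 + 1 / run_time), D)
        + E * requested_memory
    )


if __name__ == "__main__":
    # Queue priority 20, job has waited 10 time units, A=1 B=0 C=10 D=1 E=0 T0=0.1
    print(job_weight(20, 10, 1, 0, A=1, B=0, C=10, D=1, E=0, T0=0.1))
```

Because the wait-time term grows without bound while q_priority stays fixed, a long-waiting job in a low-priority queue eventually outweighs newer jobs, which is the starvation-prevention effect.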
Job Starvation Prevention Plug-in
Use Case:
Requirement: Lowest priority queue can wait no more than 10 hours.
Solution: Suppose the highest-priority queue has PRIORITY = 100 and the lowest-priority queue has PRIORITY = 20. Configure the following in the lowest-priority queue:
MANDATORY_EXTSCHED=JOBWEIGHT[A=1;B=0;C=10;D=1;E=0;T0=0.1]
After waiting 10 hours, a job in the lowest-priority queue will have a higher priority than jobs in the highest-priority queue.
Note: The formula for calculating job weight is open source and customers can customize it.
Resource Requirement Specification For Advance Reservation
What is it?
Enables users to select hosts for an advance reservation based on resource requirements.
Benefit
More flexibility in reserving host slots for mission-critical jobs.
How it works
The brsvadd command supports a select string:
brsvadd -R "select[type==LINUX]" -n 4 -u xwei -b 10:00 -e 12:00
Key Features – Platform LSF HPC
Enhanced Accounting, Auditing & Control
Optimized Application, System and Hardware Performance
Commercial Grade System Scalability & Reliability
Comprehensive, Extensible and Standards-based Security
Job Termination Reasons
Accounting log with detailed audit & error information for every job in the system
Indicates why a job was terminated
Distinguishes between an abnormal termination and a termination caused by Platform LSF HPC
Key Features – Platform LSF HPC
Optimized Application, System and Hardware Performance
Enhanced Accounting, Auditing & Control
Comprehensive, Extensible and Standards-based Security
Commercial Grade System Scalability & Reliability
Enterprise Proven
Running on several of the top 10 supercomputers in the world on the “TOP500” (#2,4,5,6)
More than 250,000 licenses in use spanning 1,500 customer sites
Scales to over 100 clusters, 200,000 CPUs and 500,000 active jobs per cluster
11+ years experience in distributed & grid computing
Risk free investment – proven solution
Commercial production quality
Key Features – Platform LSF HPC
Optimized Application, System and Hardware Performance
Enhanced Accounting, Auditing & Control
Commercial Grade System Scalability & Reliability
Comprehensive, Extensible and Standards-based Security
Comprehensive, Extensible, Standards-based Security
Scalable scheduler architecture
Multiple scheduler plug-in API support
External executable support
Web GUI
Open source components
Risk free investment – proven solution
Commercial grade
Scalability and flexibility as a business grows
How It Works: Platform LSF HPC
Fault Tolerance via Master Election
[Diagram: Host 1 runs the master LIM, sbd, mbd, and mbsched; Hosts i through N each run a slave LIM and sbd. Labels on the links: “Am I master?”, “master announcement”, “exchange load info”.]
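The slide does not spell out the election rule, so the following is only a sketch under a stated assumption: hosts are kept in a configured order, and the first one that is currently reachable acts as master. The function name and the way liveness is represented are hypothetical.

```python
def elect_master(candidates, alive):
    """Return the first host in the configured candidate order that is alive,
    or None if no candidate is reachable (assumed rule, for illustration)."""
    for host in candidates:
        if host in alive:
            return host
    return None


if __name__ == "__main__":
    order = ["host1", "host2", "host3"]
    print(elect_master(order, alive={"host1", "host2", "host3"}))  # normal case
    print(elect_master(order, alive={"host2", "host3"}))           # host1 is down
```

Under this rule every host can answer "Am I master?" locally from the same ordered list and the same view of which hosts are up, which is why a failed master is replaced without manual intervention.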
Virtual Server Technology
LIM: Collects & centralizes status of all resources in the cluster
RES: Transparent remote task execution
[Diagram: slave LIMs and an ELIM feed load information (free memory, idle time, disk I/O rate, free swap space, number of CPUs, host status, custom status) to the master LIM; RES runs on each host; the cluster APIs serve the system monitor, workload management, and admin tools.]
Executing Work
Chooses best, available resource to process the job
[Diagram: clients submit jobs (Gaussian Distribution Job, Computational Chemistry Job, Protein Modeling Job, BLAST Sequence Job) to MBD, which draws on the master LIM, the slave LIMs, and an ELIM, and dispatches work to the SBDs on the hosts.]
Grid-enabled, Scalable Architecture
Open, modular plug-in schedulers scale with the growth of your business
Scheduler Framework
The framework hides the complexity of interacting with core services.
The Resource Broker is responsible for collecting resource information from other core services.
Minimize the inter-dependencies between scheduling policies
Maximize extensibility through the plug-in scheduler module stack
[Diagram: the Scheduler Framework, containing the Scheduler Modules and the Resource Broker]
The Four Scheduling Phases
Input: pre-selected jobs
1. Pre-Processing: localized setup
2. Matching / Limits: match eligible resources to nodes
3. Order / Allocation: prioritize jobs and allocate resources
4. Post-Processing: allocation adjustments
Output: scheduling decisions / job control decisions
Multiple Scheduling Modules
[Diagram: the internal module and add-on modules 1 through N each implement the same four phases: Pre-Processing, Matching / Limits, Order / Allocation, and Post-Processing]
• Vendor-specific matching policies (without changing the existing scheduler)
• Support for an external scheduler
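The module stack can be pictured as a set of plug-ins that each implement the four phases, with the framework calling every module in turn for each phase. The sketch below is illustrative only. The class names, hook signatures, and the dictionary-based job and host records are all hypothetical and do not reflect the real plug-in API.

```python
class Module:
    """Base plug-in: every phase defaults to passing its input through."""
    def pre_process(self, jobs, hosts):
        pass
    def match(self, job, hosts):
        return hosts
    def order(self, jobs):
        return jobs
    def post_process(self, decisions):
        return decisions


class MemoryMatch(Module):
    """Add-on matching policy: keep only hosts with enough free memory."""
    def match(self, job, hosts):
        return [h for h in hosts if h["free_mem"] >= job["mem"]]


class PriorityOrder(Module):
    """Add-on ordering policy: highest priority first."""
    def order(self, jobs):
        return sorted(jobs, key=lambda j: -j["priority"])


def schedule(jobs, hosts, modules):
    for m in modules:                      # 1. Pre-Processing
        m.pre_process(jobs, hosts)
    candidates = {}
    for job in jobs:                       # 2. Matching / Limits
        eligible = hosts
        for m in modules:
            eligible = m.match(job, eligible)
        candidates[job["name"]] = eligible
    ordered = jobs
    for m in modules:                      # 3. Order / Allocation
        ordered = m.order(ordered)
    decisions, used = [], set()
    for job in ordered:
        for host in candidates[job["name"]]:
            if host["name"] not in used:   # one job per host in this toy model
                used.add(host["name"])
                decisions.append((job["name"], host["name"]))
                break
    for m in modules:                      # 4. Post-Processing
        decisions = m.post_process(decisions)
    return decisions


if __name__ == "__main__":
    hosts = [{"name": "n1", "free_mem": 2}, {"name": "n2", "free_mem": 8}]
    jobs = [{"name": "small", "mem": 1, "priority": 10},
            {"name": "big", "mem": 4, "priority": 50}]
    print(schedule(jobs, hosts, [MemoryMatch(), PriorityOrder()]))
```

Adding a policy means adding a module to the list; the framework loop itself does not change, which is the point of the "without changing the existing scheduler" bullet.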
Maui Integration
[Diagram: Maui integration. Components: MBD; SCH_FM; the MAUI plugin event handler (wait until GO event); the MAUI scheduler with RMGetInfo, pre-processing, order jobs (QueueScheduleSJobs, QueueScheduleRJobs, QueueScheduleIJobs, QueueBackFill), UIProcessClients, and post-processing. Flows: job, host, and resource info; sync; decisions and ack.]
Linux-specific Solutions
Controlling an MPI job
On a distributed system (Linux cluster) there are many problems to address:
1. Job launch across multiple nodes
2. Gather resource usage while job executes
3. Propagate signals
4. Job “clean-up” to eliminate “dangling” MPI processes
5. Comprehensive job accounting
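Problems 1 through 4 above can be sketched on a single Unix host with local processes standing in for remote MPI tasks. This is an analogy under stated assumptions, not how Platform LSF launches or tracks tasks across nodes: each stand-in task runs in its own process group, signals are forwarded to every group, clean-up escalates to SIGKILL, and resource usage is read from the operating system afterwards. All function names are hypothetical.

```python
import os
import resource
import signal
import subprocess
import sys


def launch_tasks(count):
    """Start `count` stand-in tasks, each in its own session and process group."""
    cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    return [subprocess.Popen(cmd, start_new_session=True) for _ in range(count)]


def propagate(tasks, sig):
    """Forward a signal to the whole process group of every task still running."""
    for task in tasks:
        if task.poll() is None:
            os.killpg(task.pid, sig)


def clean_up(tasks, grace_seconds=5.0):
    """Terminate every task, escalate to SIGKILL if one lingers, and return
    the accumulated resource usage of the reaped children."""
    propagate(tasks, signal.SIGTERM)
    for task in tasks:
        try:
            task.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            os.killpg(task.pid, signal.SIGKILL)
            task.wait()
    return resource.getrusage(resource.RUSAGE_CHILDREN)


if __name__ == "__main__":
    tasks = launch_tasks(2)
    usage = clean_up(tasks)
    print([t.returncode for t in tasks], usage.ru_utime)
```

Signalling the process group rather than the single pid is what prevents "dangling" processes here: anything a task forked is caught by the same signal.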
“Traditional” MPI sequence
[Diagram: submit → resource manager → job script → mpirun → job launcher → a.out processes]
Platform LSF HPC for Linux - MPICH-GM
[Diagram: bsub, mbatchd, sbatchd, job script, mpirun, pam, and gmmpirun_wrapper on the launch path; on each execution node, res starts a TaskStarter (TS) that runs a.out, alongside PIM.]
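Platform LSF HPC for Linux/Myrinet - Generic PJL
[Diagram: bsub on the submission host submits through lsblib to MBD on the master host, alongside MBSCHD, the master LIM, and the queues (high, med, hpc_queue). On execution host H1, SBD starts an SBD child that runs pam, which invokes the PJL wrapper and the PJL. On H1 and H2, root res starts a TaskStarter for each a.out process (process 1, process 2). Each TaskStarter reports its hostname and pid; signals and rusage collection pass between pam and the tasks. LIM and PIM run on the hosts.]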
Platform LSF HPC for Linux/Myrinet - MPICH_GM
[Diagram: same layout as the generic PJL case, with queues high, med, and hpc_queue. esub sets LSF_PJL_TYPE to mpich_gm; elims report resource availability to the LIMs. On execution host H1, the SBD child runs pam; mpirun.lsf, gmmpirun_wrapper, and mpirun.ch_gm (using rsh) form the launch path. On H1 and H2, root res starts a TaskStarter for each a.out process, and each TaskStarter reports its hostname and pid. Signals and rusage collection pass between pam and the tasks.]
Platform LSF HPC for Linux/Myrinet - LAM/MPI
[Diagram: same layout as the generic PJL case, with queues high, med, and hpc_linux. esub sets LSF_PJL_TYPE to lammpi; elims report resource availability to the LIMs. On execution host H1, the SBD child runs pam; mpirun.lsf, lammpirun_wrapper, and mpirun form the launch path, with lamd running on each host. On H1 and H2, root res starts a TaskStarter for each a.out process, and each TaskStarter reports its hostname and pid. Signals and rusage collection pass between pam and the tasks.]
Platform LSF HPC for Linux/Myrinet - Scali MPI
[Diagram: same layout as the generic PJL case, with queues high, med, and low. On execution host H1, the SBD child runs pam; the Scali MPI wrapper and mpimon form the launch path, with mpid and mpisubmon running on each host. On H1 and H2, root res starts a TaskStarter for each a.out process, and each TaskStarter reports its hostname and pid. Signals and rusage collection pass between pam and the tasks.]
Platform LSF HPC for Linux/QsNet
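[Diagram: bsub on the submission host submits through lsblib to MBD on the master host, alongside MBSCHD with its RMS plugin, the master LIM, and the queues (high, med, low). RLA supplies the job's allocation. On the LSF execution host / RMS node n0, SBD starts an SBD child that exec()s res, and res calls rms_run() to start the user job on nodes n1 and n2. LIM and PIM run on the hosts.]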
Scyld Beowulf Integration
• Scyld Beowulf handles the systems management challenge effectively
  • No OS to distribute / synchronize
  • Central point of control from master
  • Single process space makes it appear as a large SMP
• Platform integrates with Scyld, treating the cluster as an SMP and allocating resources
  • Integrates with mpirun, mpprun or bpsh to start tasks
  • Collects resource usage from BProc
  • Collects load information via BProc APIs
  • Single user interface across Scyld and non-Scyld environments
Platform LSF HPC for Linux/BProc
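[Diagram: numbered steps 1A through 6C. bsub on the submission host, with esub modifying submission options, submits the job file through lsblib to MBD on the master host, alongside MBSCHD, the master LIM, and the queues (high, med, low). On the BProc front-end node (labelled H3), SBD starts an SBD child that exec()s res; res uses bpsh/mpirun to start the user job processes on the allocated computing nodes. LIM and PIM run on the hosts.]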
More info at:
• www.platform.com/customers
• www.platform.com/barriers
Q & A