Scheduler Basics

Juli Rew
Scientific Computing Division
CISL User Forum, May 19, 2005

Overview

IBM Scheduling

• Life of a Job

• Submit Filter

• Batch Priority Scheduler

• Factors Affecting BPS Job Scheduling

• LoadLeveler

Load Sharing Facility Scheduling

• LSF Scheduling on Linux Systems

• Differences from IBM Scheduling

IBM Scheduling: Life of a Job

Flowchart, main path:
llsubmit job → Submit Filter (Requirements Processing) → BPS (Job Ordering) → Build Ordered List of Jobs → LoadLeveler (Job Execution) → Job Starts → Job Completes → Done

Rejection paths:

• Requirements Not Met → Reject Job → Done

• Requirements Problem → Staff Rejects Job → Done
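The flowchart can be read as a small state machine. The Python sketch below encodes its boxes as states and checks whether a sequence of states is a legal path through the chart. The state names are hypothetical, and attaching the automatic rejection to the submit filter and the staff rejection to the BPS ordering stage is an assumption, since the extracted flowchart does not make the branch points unambiguous.

```python
# Hypothetical state names for the boxes in the flowchart.
TRANSITIONS = {
    "submitted": {"submit_filter"},                      # llsubmit job
    "submit_filter": {"bps_ordering", "rejected"},       # requirements processing
    "bps_ordering": {"ordered_list", "staff_rejected"},  # assumed staff-rejection point
    "ordered_list": {"loadleveler"},                     # ordered list handed over
    "loadleveler": {"started"},                          # job execution
    "started": {"completed"},
    "completed": {"done"},
    "rejected": {"done"},
    "staff_rejected": {"done"},
    "done": set(),
}


def is_valid_path(path):
    """True if path starts at 'submitted', ends at 'done', and every step is allowed."""
    if not path or path[0] != "submitted" or path[-1] != "done":
        return False
    return all(nxt in TRANSITIONS.get(cur, set()) for cur, nxt in zip(path, path[1:]))


normal = ["submitted", "submit_filter", "bps_ordering", "ordered_list",
          "loadleveler", "started", "completed", "done"]
print(is_valid_path(normal))                                              # True
print(is_valid_path(["submitted", "submit_filter", "rejected", "done"]))  # True
print(is_valid_path(["submitted", "loadleveler", "started", "completed", "done"]))  # False
```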

Submit Filter Features

• Checks the LoadLeveler job script for:

- valid parameters

- valid queue name

- consistent combinations of features, e.g., shared/not_shared, tasks_per_node/node options

• Moves jobs with allocation holds to hold queues

• Moves jobs with cutoff projects to standby queue
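The checks above can be sketched as a small filter over a LoadLeveler job command file, whose directives are lines of the form `# @ keyword = value`. This is a hypothetical Python illustration, not NCAR's filter: the set of valid queues is a sample of the Bluesky queue names from this talk, only the two `node_usage` values named on the slide are accepted, the "tasks_per_node requires node" rule stands in for the real consistency checks, and the `hold` and `standby` destinations and the project number are placeholders.

```python
import re

# Sample of valid queue ("class") names from this talk; a real filter would load the site list.
VALID_QUEUES = {"com_pr32", "com_sb32", "csl_pr8", "csl_sb8", "share", "debug"}

# Matches LoadLeveler directives such as "# @ class = com_pr32" or a bare "# @ queue".
DIRECTIVE = re.compile(r"^#\s*@\s*(\w+)\s*(?:=\s*(.*?))?\s*$")


def parse_directives(script):
    """Map each directive keyword to its value (None for bare keywords)."""
    opts = {}
    for line in script.splitlines():
        m = DIRECTIVE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def submit_filter(script, held_projects=(), cutoff_projects=()):
    """Return (destination_queue, errors); destination is None if the job is rejected."""
    opts = parse_directives(script)
    errors = []
    queue = opts.get("class")
    if queue not in VALID_QUEUES:
        errors.append(f"invalid queue name: {queue}")
    usage = opts.get("node_usage")
    if usage is not None and usage not in ("shared", "not_shared"):
        errors.append(f"invalid node_usage: {usage}")
    # Illustrative consistency rule: tasks_per_node needs an explicit node count.
    if "tasks_per_node" in opts and "node" not in opts:
        errors.append("tasks_per_node requires node")
    if errors:
        return None, errors
    project = opts.get("account_no")
    if project in held_projects:
        return "hold", errors      # placeholder hold-queue name
    if project in cutoff_projects:
        return "standby", errors   # placeholder standby-queue name
    return queue, errors


job = """#!/bin/csh
# @ class = com_pr32
# @ node = 2
# @ tasks_per_node = 8
# @ node_usage = not_shared
# @ account_no = 12345678
# @ queue
"""
print(submit_filter(job))                                # ('com_pr32', [])
print(submit_filter(job, cutoff_projects={"12345678"}))  # ('standby', [])
```

Validation errors are collected before any rerouting, so a malformed job is rejected outright rather than parked in a hold or standby queue.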

Batch Priority Job Scheduler Features

• Written at NCAR

• Orders jobs based on policy

• Creates separate facilities (Community, Climate System Laboratory)

• Further separates jobs into proposal groups (NCAR/UNIV, CCSM/oCSL)

• Hands the final ordered list to LoadLeveler

• Allows backfilling of jobs to avoid idle resources

Bluesky Queue Priorities

all_spec
all_sp32, all_sp8
CSL (NCAR/UNIV): csl_sp32, csl_pr32, .., csl_sb32
COM (NCAR/UNIV): com_sp32, com_pr32, .., com_sb32
CSL (CCSM/oCSL): csl_sp8, csl_pr8, .., csl_sb8
COM (CCSM/oCSL): com_sp8, com_pr8, .., com_sb8
interactive, debug, share, test

Prioritization of Jobs by BPS

• all_spec jobs run with the highest priority and can access all nodes

• Below that, all com and csl jobs are divided equally between the two facilities

• Round robin by group/user

• The 50-50 split is not a hard limit

Diagram: all_spec at the top; below it, the com and csl streams merge into the single top job.
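The ordering rules above can be sketched in Python as follows. This is a hypothetical illustration rather than the BPS code: jobs are plain dicts with assumed `queue`, `group`, and `user` keys, "round robin by group/user" is interpreted as rotating over (group, user) pairs, the priority differences between queues within a facility (csl_pr32 versus csl_sb32, for instance) are ignored, and queues other than all_spec, com_* and csl_* are left out.

```python
from collections import deque
from itertools import zip_longest


def round_robin(jobs, key):
    """Rotate across key(job) values, keeping submission order within each key."""
    buckets = {}  # dicts preserve insertion order (Python 3.7+)
    for job in jobs:
        buckets.setdefault(key(job), deque()).append(job)
    order = []
    while buckets:
        for k in list(buckets):
            order.append(buckets[k].popleft())
            if not buckets[k]:
                del buckets[k]
    return order


_GAP = object()


def interleave(a, b):
    """Alternate between a and b; when one runs out, the other fills the remaining slots."""
    return [x for pair in zip_longest(a, b, fillvalue=_GAP) for x in pair if x is not _GAP]


def group_user(job):
    return (job["group"], job["user"])


def bps_order(jobs):
    """jobs: dicts in submission order. all_spec first, then com and csl alternately."""
    spec = [j for j in jobs if j["queue"] == "all_spec"]
    com = round_robin([j for j in jobs if j["queue"].startswith("com_")], group_user)
    csl = round_robin([j for j in jobs if j["queue"].startswith("csl_")], group_user)
    return spec + interleave(com, csl)


jobs = [
    {"id": 1, "queue": "com_pr32", "group": "NCAR", "user": "a"},
    {"id": 2, "queue": "com_pr32", "group": "NCAR", "user": "a"},
    {"id": 3, "queue": "com_pr32", "group": "UNIV", "user": "b"},
    {"id": 4, "queue": "csl_pr8", "group": "CCSM", "user": "c"},
    {"id": 5, "queue": "all_spec", "group": "NCAR", "user": "d"},
]
print([j["id"] for j in bps_order(jobs)])  # [5, 1, 4, 3, 2]
```

The soft 50-50 split falls out of `interleave`: the two facilities alternate while both have work, and whichever still has jobs takes the remaining slots.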

Other Factors Affecting Job Scheduling

• Backfilling - jobs that will not delay the start of the highest-priority job are allowed to slip in

- Sweet spot: under 3 hours and a small node count

• Allocation Holds - a job is flagged if its project/division exceeds its 30-day or 90-day allocation threshold

- H1 and H2 jobs are reordered at a priority above standby but below non-flagged jobs

• Special Initiatives - Nodes reserved for real-time or other special runs
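The backfilling and allocation-hold rules above can be illustrated with a few small helpers. This is a simplified sketch, not BPS's actual logic: `can_backfill` uses only the conservative rule that a job must fit in the nodes idle right now and finish before the top job's reserved start, and `hold_tier` assumes each job record carries hypothetical `hold` ("H1"/"H2") and `standby` fields. The slides do not say which threshold (30-day or 90-day) produces H1 versus H2, so the sketch treats them alike.

```python
def can_backfill(job_nodes, job_hours, idle_nodes, hours_until_top_job):
    """Simplified backfill test: the job fits in the nodes idle right now and its
    wall-clock limit expires before the highest-priority job's reserved start."""
    return job_nodes <= idle_nodes and job_hours <= hours_until_top_job


def exceeds_allocation(used_30d, limit_30d, used_90d, limit_90d):
    """A project/division is flagged if it is over either threshold."""
    return used_30d > limit_30d or used_90d > limit_90d


def hold_tier(job):
    """Sort key: unflagged jobs (0), then H1/H2 allocation-hold jobs (1), then standby (2).
    'hold' and 'standby' are hypothetical field names."""
    if job.get("standby"):
        return 2
    if job.get("hold") in ("H1", "H2"):
        return 1
    return 0


# A 2-hour, 4-node job can slip in ahead of a top job that starts in 3 hours.
print(can_backfill(job_nodes=4, job_hours=2, idle_nodes=8, hours_until_top_job=3))  # True
queue = [{"id": 1, "standby": True}, {"id": 2, "hold": "H1"}, {"id": 3}, {"id": 4, "hold": "H2"}]
print([j["id"] for j in sorted(queue, key=hold_tier)])  # [3, 2, 4, 1]
```

Because Python's `sorted` is stable, sorting an already ordered list with `hold_tier` as the key demotes flagged jobs without disturbing the order within each tier. Short, small jobs pass `can_backfill` in more situations, which is consistent with the "sweet spot" on the slide.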

Documentation and Utilities

• The batchview command gives a snapshot of the current job ordering

• Basic information on scheduling is given at http://www.scd.ucar.edu/docs/ibm/ref/llsched.html

LoadLeveler

• IBM's batch job control system

• Allows jobs to be started, stopped, or cancelled

• Controls allocation of resources (CPU, memory)

• Allows custom scheduler plug-in (e.g., BPS)

• Two mutually exclusive options: the LoadLeveler scheduler or a custom scheduler
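The plug-in arrangement can be pictured as a batch system that holds exactly one ordering component at a time. In this hypothetical Python sketch, `FifoScheduler` is only a placeholder for a built-in scheduler (it does not describe how LoadLeveler's own scheduler orders jobs), `CustomScheduler` stands in for a site plug-in such as BPS, and none of the class names correspond to LoadLeveler's actual external-scheduler interface.

```python
from typing import Callable, List, Protocol


class Scheduler(Protocol):
    def order(self, jobs: List[dict]) -> List[dict]: ...


class FifoScheduler:
    """Placeholder for a built-in policy: earliest submission first."""

    def order(self, jobs):
        return sorted(jobs, key=lambda j: j["submitted"])


class CustomScheduler:
    """Placeholder for a site plug-in such as BPS: ordering comes from a site policy."""

    def __init__(self, policy: Callable[[List[dict]], List[dict]]):
        self.policy = policy

    def order(self, jobs):
        return self.policy(jobs)


class BatchSystem:
    """Configured with exactly one scheduler; the two options are not combined."""

    def __init__(self, scheduler: Scheduler):
        self.scheduler = scheduler

    def dispatch(self, jobs):
        # Only the ordering decision is delegated; starting jobs is not modeled here.
        return [j["id"] for j in self.scheduler.order(jobs)]


jobs = [{"id": "a", "submitted": 2}, {"id": "b", "submitted": 1}]
print(BatchSystem(FifoScheduler()).dispatch(jobs))  # ['b', 'a']
by_id = CustomScheduler(lambda js: sorted(js, key=lambda j: j["id"]))
print(BatchSystem(by_id).dispatch(jobs))  # ['a', 'b']
```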

Load Sharing Facility

• Commercial product from Platform Computing

• Currently used on the major Linux platforms

• Also available for the IBM systems, but still under evaluation

• Can do hierarchical fair-share scheduling with backfill, based on the same facility scheme used in BPS

• The Community/CSL facility division is implemented implicitly within the scheduler rather than explicitly by queue name

• Can schedule among multiple platforms ("Grid")
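Hierarchical fair-share can be sketched as a walk down a share tree (here: facility, then proposal group, then user), choosing at each level the branch with the highest priority relative to its recent usage. Everything in this Python sketch is hypothetical: `ShareNode`, `pick_next`, the example tree, and the shares / (1 + usage) priority are illustrative, and LSF computes fair-share priority with its own configurable formula. Backfill is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShareNode:
    name: str
    shares: float
    usage: float = 0.0                                 # recent usage charged to this node (any unit)
    children: List["ShareNode"] = field(default_factory=list)
    jobs: List[str] = field(default_factory=list)      # pending jobs, held by leaves only

    def priority(self) -> float:
        # Hypothetical formula: more shares or less recent usage means higher priority.
        return self.shares / (1.0 + self.usage)

    def has_work(self) -> bool:
        return bool(self.jobs) or any(c.has_work() for c in self.children)


def pick_next(root: ShareNode) -> Optional[str]:
    """Descend the tree, taking the highest-priority child that has pending work,
    and return the oldest pending job of the leaf reached (None if nothing is pending)."""
    node = root
    while node.children:
        candidates = [c for c in node.children if c.has_work()]
        if not candidates:
            return None
        node = max(candidates, key=ShareNode.priority)
    return node.jobs[0] if node.jobs else None


# Hypothetical tree: facility -> proposal group -> user.
root = ShareNode("root", 1, children=[
    ShareNode("Community", 50, usage=10, children=[
        ShareNode("NCAR", 1, children=[ShareNode("alice", 1, jobs=["j1"])]),
        ShareNode("UNIV", 1, usage=5, children=[ShareNode("bob", 1, jobs=["j2"])]),
    ]),
    ShareNode("CSL", 50, usage=40, children=[
        ShareNode("CCSM", 1, children=[ShareNode("carol", 1, jobs=["j3"])]),
    ]),
])
print(pick_next(root))          # j1: Community (50/11) beats CSL (50/41)
root.children[0].usage = 100    # Community has now used far more than CSL
print(pick_next(root))          # j3
```

Because the facility split is just the top level of the tree, it is enforced by the share values rather than by queue names, which matches the implicit Community/CSL division described on the slide.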

Questions?