grid(lab) resource management system

Grid(Lab) Resource Management System …and general Grid Resource Management Jarek Nabrzyski et al. [email protected] Poznan Supercomputing And Networking Center

Page 1: Grid(Lab) Resource Management System

Grid(Lab) Resource Management System

…and general Grid Resource Management

Jarek Nabrzyski et al.

[email protected]

Poznan Supercomputing And Networking Center

Page 2: Grid(Lab) Resource Management System

CGW 2003

GridLab

EU-funded project involving 11 European and 3 American partners (Globus and Condor teams)

January 2002 – December 2004

Main goal: to develop a Grid Application Toolkit (GAT) and set of grid services and tools...

resource management (GRMS), data management, monitoring, adaptive components, mobile user support, security services, portals,

... and test them on a real testbed with real applications

Page 3: Grid(Lab) Resource Management System

GridLab Members

PSNC (Poznan) - coordination
AEI (Potsdam)
ZIB (Berlin)
Univ. of Lecce
Cardiff University
Vrije Univ. (Amsterdam)
SZTAKI (Budapest)
Masaryk Univ. (Brno)
NTUA (Athens)

Sun Microsystems
Compaq (HP)

ANL (Chicago, I. Foster)
ISI (LA, C. Kesselman)
UoWisconsin (M. Livny)

Collaborating with:

Users!
- EU Astrophysics Network
- DFN TiKSL/GriKSL
- NSF ASC Project

Other Grid projects
- Globus, Condor
- GrADS
- PROGRESS
- GriPhyn/iVDGL
- CrossGrid and all the other European Grid Projects (GRIDSTART)

other...

Page 4: Grid(Lab) Resource Management System

It’s Easy to Forget How Different 2003 is From 1993

Ubiquitous Internet: 100+ million hosts
- Collaboration & resource sharing the norm

Ultra-high-speed networks: 10+ Gb/s
- Global optical networks

Enormous quantities of data: Petabytes
- For an increasing number of communities, gating step is not collection but analysis

Huge quantities of computing: 100+ Top/s
- Ubiquitous computing via clusters

Moore’s law everywhere: 1000x/decade
- Instruments, detectors, sensors, scanners

Courtesy of Ian Foster

Page 5: Grid(Lab) Resource Management System

And Thus, The Grid “Problem” (or Opportunity)

Dynamically link resources/services
- From collaborators, customers, eUtilities, … (members of evolving “virtual organization”)

Into a “virtual computing system”
- Dynamic, multi-faceted system spanning institutions and industries
- Configured to meet instantaneous needs, for:

Multi-faceted QoX for demanding workloads
- Security, performance, reliability, …

Courtesy of Ian Foster

Page 6: Grid(Lab) Resource Management System

Integration as a Fundamental Challenge

[Figure: registries (R), resource managers (RM), security services and policy services arranged around data sources, services and computation. Annotations:]

- Discovery: many sources of data, services, computation
- Registries organize services of interest to a community
- Access: data integration activities may require access to, & exploration of, data at many locations
- Exploration & analysis may involve complex, multi-step workflows
- Resource management is needed to ensure progress & arbitrate competing demands
- Security & policy must underlie access & management decisions

Courtesy of Ian Foster

Page 7: Grid(Lab) Resource Management System

Grid Scheduling

Current approach:
- Extension of job scheduling for parallel computers
- Resource discovery and load-distribution to a remote resource
- Usually batch job scheduling model on remote machine

But actually required for Grid scheduling is:
- Co-allocation and coordination of different resource allocations for a Grid job
- Instantaneous ad-hoc allocation not always suitable

This complex task involves:
- “Cooperation” between different resource providers
- Interaction with local resource management systems (interfaces and functions of many LRM differ from each other)
- Support for reservations and service level agreements
- Orchestration of coordinated resource allocations

Page 8: Grid(Lab) Resource Management System

Allocation for Grid Job Example

[Figure: allocation timeline for one Grid job. Each row pairs an activity with the resource it occupies over time: Data Transfer on Network 1; Loading Data, Parallel Computation and Providing Data on Computer 1; Parallel Computation on Computer 2; Communication for Computation on Network 3; Visualization on VR-Cave; Data Access and Storing Data (Data Storage) on Storage; Communication for Visualization on Network 2; Software Usage on Software License.]

Task of the Grid Resource Management!

Page 9: Grid(Lab) Resource Management System

Local Scheduling Systems

Observation:

Local resource management (LRM) systems exist
- require extension for Grids by additional software, or
- will directly support Grids in the future

DRMAA is available today for some LRM systems

Different LRM systems will be part of the Grid and will perform a lower-level scheduling

In addition the Grid will require some higher-level scheduling for coordinating the user’s jobs.

Multi-level scheduling model

Page 10: Grid(Lab) Resource Management System

Multi-level Grid Scheduling Architecture

[Figure: Grid scheduling architecture. The Grid-User and the Grid-Scheduler form the higher-level Grid scheduling layer. Below it, Resource 1, Resource 2, ..., Resource n each have their own Scheduler, a schedule over time, and local job queues; together they form the lower-level scheduling layer.]

Courtesy of Ramin Yahyapour

Page 11: Grid(Lab) Resource Management System

User Objective

Local computing typically has:

A given scheduling objective, such as minimization of response time

Use of batch queuing strategies

Simple scheduling algorithms: FCFS, Backfilling

Grid Computing requires:

Individual scheduling objective
- better resources
- faster execution
- cheaper execution

More complex objective functions apply for individual Grid jobs!

Page 12: Grid(Lab) Resource Management System

Provider/Owner Objective

Local computing typically has:

Single scheduling objective for the whole system: e.g. minimization of average weighted response time or high utilization/job throughput

In Grid Computing:

Individual policies must be considered:
- access policy,
- priority policy,
- accounting policy,
- and other

More complex objective functions apply for individual resource allocations!

User and owner policies/objectives may be subject to privacy considerations!

Page 13: Grid(Lab) Resource Management System

Grid Economics – Different Business Models

Cost model
- Use of a resource
- Reservation of a resource

Individual scheduling objective functions
- User and owner objective functions
- Formulation of an objective function
- Integration of the function in a scheduling algorithm

Market-economic approaches
- Application of computational intelligence

Resource selection
- The scheduling instances act as broker
- Collection and evaluation of resource offers

Page 14: Grid(Lab) Resource Management System

Scheduling Model

Using a Brokerage/Trading strategy:

[Figure: brokerage flow between the two scheduling levels.]

Higher-level scheduling
- Submit Grid Job Description
- Discover Resources
- Query for Allocation Offers
- Collect Offers
- Select Offers
- Coordinate Allocations

Lower-level scheduling
- Analyze Query
- Generate Allocation Offer

Policies taken into account
- Consider individual owner policies
- Consider individual user policies
- Consider community policies
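The brokerage idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the `Offer` fields, the provider functions `cluster_a` and `cluster_b`, and the request format are all hypothetical and do not come from the talk. Each provider applies its own owner policy when generating an offer, and the broker selects among the collected offers using a utility function supplied by the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    resource: str
    start_hour: float  # hours until the allocation could start (made up)
    cost: float        # price quoted by the owner (made up)


def collect_offers(providers, request):
    """Query every lower-level scheduler; None means 'no offer'."""
    offers = []
    for make_offer in providers:
        offer = make_offer(request)  # owner policy is applied inside
        if offer is not None:
            offers.append(offer)
    return offers


def select_offer(offers, utility):
    """Pick the offer that maximises the user's own utility function."""
    return max(offers, key=utility) if offers else None


# Hypothetical providers: each function encodes one owner's policy.
def cluster_a(request):
    return Offer("cluster-a", start_hour=4.0, cost=10.0)


def cluster_b(request):
    if request["cpus"] > 64:  # owner policy: refuse very large jobs
        return None
    return Offer("cluster-b", start_hour=1.0, cost=25.0)


request = {"cpus": 32}
offers = collect_offers([cluster_a, cluster_b], request)
cheapest = select_offer(offers, utility=lambda o: -o.cost)
fastest = select_offer(offers, utility=lambda o: -o.start_hour)
print(cheapest.resource, fastest.resource)
```

Two users with different objectives (cheaper versus faster execution) receive different allocations from the same set of offers, which is the point of letting the higher level select while the lower level generates offers.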

Page 15: Grid(Lab) Resource Management System

Properties of Multi-Level Scheduling Model

Multi-level scheduling must support different RM systems and strategies.

Provider can enforce individual policies in generating resource offers

User receives resource allocation optimized to the individual objective

Different higher-level scheduling strategies can be applied

Multiple levels of scheduling instances are possible

Support for fault-tolerant and load-balanced services

Page 16: Grid(Lab) Resource Management System

Negotiation in Grids

Multilevel Grid scheduling architecture
- Lower level local scheduling instance
  - Implementation of owner policies
- Higher level Grid scheduling instance
  - Resource selection and coordination

(Static) Interface definition between both instances
- Different types of resources
- Different local scheduling systems with different properties
- Different owner policies

(Dynamic) Communication between both instances
- Resource discovery
- Job monitoring

Page 17: Grid(Lab) Resource Management System

GGF WG Scheduling Attributes

Define the attributes of a lower-level scheduling instance that can be exploited by a higher-level scheduling instance.

Attributes of allocation properties
- Guaranteed completion time of allocation,
- Allocations run-to-completion, …

Attributes of available information
- Access to tentative schedule,
- Exclusive control, …

Attributes for manipulating allocation execution
- Preemption,
- Migration, …

Attributes for requesting resources
- Allocation Offers,
- Advance Reservation, …

Page 18: Grid(Lab) Resource Management System

Towards Grid Scheduling

Grid Scheduling Methods:
- Support for individual scheduling objectives and policies
- Economic scheduling methods to Grids
- Multi-criteria scheduling models (most general)

Architectural requirements:
- Generic job description
- Negotiation interface between higher- and lower-level scheduler
- Economic management services
- Workflow management
- Integration of data and network management

Interoperability is a key!

Page 19: Grid(Lab) Resource Management System

Data and Network Scheduling

Most new resource types can be included via individual lower-level resource management systems.

Additional considerations for

Data management
- Select resources according to data availability
- But data can be moved if necessary!

Network management
- Consider advance reservation of bandwidth or SLA
- Network resources usually depend on the selection of other resources!

Coordinate data transfers and storage allocation

User empowered/owned lambdas!

Page 20: Grid(Lab) Resource Management System

Example of a Scheduling Process

Scheduling Service:
1. receives job description
2. queries Information Service for static resource information
3. prioritizes and pre-selects resources
4. queries for dynamic information about resource availability
5. queries Data and Network Management Services
6. generates schedule for job
7. reserves allocation if possible, otherwise selects another allocation
8. delegates job monitoring to Job Manager

Job Manager/Network and Data Management: service, monitor and initiate allocation

Example:
- 40 resources of requested type are found.
- 12 resources are selected.
- 8 resources are available.
- Network and data dependencies are detected.
- Utility function is evaluated.
- 6th tried allocation is confirmed.
- Data/network provided and job is started.
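The eight steps can be traced with a toy pipeline. This is a sketch under assumptions, not the scheduling service described in the talk: the function `schedule`, the dictionary fields, and the stand-in callbacks for availability, utility and reservation are all hypothetical. The numbers are chosen only so that the run reproduces the counts in the example (40 found, 12 selected, 8 available, 6th allocation confirmed).

```python
def schedule(job, info_service, availability, utility, try_reserve):
    # Steps 1-2: take the job description, query static resource information.
    candidates = [r for r in info_service if r["type"] == job["type"]]
    # Step 3: prioritise and pre-select (here: keep the largest machines).
    candidates.sort(key=lambda r: r["cpus"], reverse=True)
    shortlist = candidates[: job["shortlist"]]
    # Step 4: keep only the resources that are currently available.
    free = [r for r in shortlist if availability(r)]
    # Steps 5-6: data and network considerations are assumed to be folded
    # into the utility function, which ranks the possible allocations.
    ranked = sorted(free, key=utility, reverse=True)
    # Step 7: reserve the first allocation that can be confirmed.
    for attempt, resource in enumerate(ranked, start=1):
        if try_reserve(resource):
            # Step 8 (hand-over to a job manager) would happen here.
            return resource["name"], attempt
    return None, len(ranked)


# 40 hypothetical resources of the requested type.
resources = [{"name": f"r{i}", "type": "cluster", "cpus": i} for i in range(1, 41)]
name, attempt = schedule(
    {"type": "cluster", "shortlist": 12},
    resources,
    availability=lambda r: r["cpus"] % 3 != 0,  # stand-in for dynamic info
    utility=lambda r: r["cpus"],                # stand-in utility function
    try_reserve=lambda r: r["cpus"] <= 32,      # stand-in for reservation
)
print(name, attempt)
```

With these stand-ins the twelve largest machines are shortlisted, eight of them are available, and the reservation succeeds on the sixth attempt.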

Page 21: Grid(Lab) Resource Management System

Conclusions for Grid Scheduling

Grids ultimately require coordinated scheduling services.

Support for different scheduling instances
- different local management systems
- different scheduling algorithms/strategies

For arbitrary resources
- not only computing resources, also data, storage, network, software etc.

Support for co-allocation and reservation
- necessary for coordinated grid usage (see data, network, software, storage)

Different scheduling objectives
- cost, quality, other

Grid resource management services are a key to success of the Grid vision!
…so are the applications that could show the Grid benefit!

Page 22: Grid(Lab) Resource Management System

Integration of a Grid Scheduling System

Globus as de-facto standard
- but no higher-level scheduling services available

Many projects include scheduling requirements
- Focus on a running implementation for a specific problem
- No attempt to generate a general solution

Grid scheduling cannot be developed by single groups
- Requirements for several other services
- Community effort is key!

Requires open Grid standards that enable Grid scheduling
- Support for different implementations while being interoperable

Page 23: Grid(Lab) Resource Management System

Activities

Core service infrastructure
- OGSA/OGSI

GGF hosts several groups in the area of Grid scheduling and resource management.

Examples:
- WG Scheduling Attributes (finished)
- WG Grid Resource Allocation Agreement Protocol (active)
- WG Grid Economic Services Architecture (active)
- WG Scheduling Ontology (proposed)
- RG Grid Scheduling Architecture (proposed)

Network of Excellence “CoreGRID” (proposed)
- define the software infrastructure for Next Generation Grids

Page 24: Grid(Lab) Resource Management System

What are Basic Blocks for a Grid Scheduling Architecture?

[Figure: candidate building blocks of a Grid scheduling architecture. Services: Scheduling Service, Information Service, Data Management Service, Network Management Service, Reservation, Accounting and Billing Service, Job Supervisor Service. Resource side: Compute/Storage/Visualization etc. resources with a Management System and Compute Manager; Data-Resources with a Data Manager; Network-Resources with a Network Management System and Network Manager. Arrow labels: Query for resources; Maintain information (static & scheduled/forecasted).]

Basic Blocks and Requirements are still to be defined!

Courtesy of Ramin Yahyapour

Page 25: Grid(Lab) Resource Management System

Conclusion

Resource management and scheduling is a key service in a Next Generation Grid
- In a large Grid the user cannot handle this task
- Nor is the orchestration of resources a provider task

System integration is complex but vital
- Individual results may be of limited benefit without being embedded in a larger project

Basic research is required in this area.
- No ready-to-implement solution is available (although EDG, CrossGrid, GridLab etc. work on it)
- New concepts are necessary

A significant joint effort is needed to support Grid Scheduling!
Also research is still necessary!

Page 26: Grid(Lab) Resource Management System

RM in GridLab: What our users want...

Two primary applications: Cactus (simulations) and Triana (data mining/analysis)
- other application communities are also being engaged

Application oriented environment

Resources (grid) on demand

Adaptive applications/adaptive scenarios – adaptive grid environment
- job checkpoint, migration, spawn off a new job when needed

Open, pervasive, not even restricted to a single Virtual Organization

The ability to work in a disconnected environment
- start my job on a disconnected laptop; migrate it to grid when it becomes available
- from laptops to fully deployed Virtual Organizations

Mobile working

Security on all levels

Page 27: Grid(Lab) Resource Management System

What our users want... (cont.)

The infrastructure must provide capabilities to customise choice of service implementation (e.g. using efficiency, reliability, first succeeding, all)

Advance reservation of resources

To be able to express their preferences regarding their jobs on one hand and to understand the resource and VO policies on the other hand

Policy information and negotiation mechanisms
- what is a policy of usage of the remote resources?

Prediction-based information
- How long will my job run on a particular resource?
- What resources do I need to complete the job before deadline?

Page 28: Grid(Lab) Resource Management System

Coalescing Binary Scenario

[Figure: Coalescing Binary scenario on the GridLab test-bed. Components shown: GW Data, Distributed Storage, Logical File Name, CB Search, Controller, GAT (GRMS, Adaptive), GAT (Data Management). Annotations: Submit Job, Optimised Mapping; Email, SMS notification.]

Page 29: Grid(Lab) Resource Management System

GridLab RMS approach

Grid resources are not only the machines, but also databases, files, users, administrators, instruments, mobile devices, jobs/applications ...

Many metrics for scheduling: throughput, cost, latency, deadline, other time and cost metrics...

Grid resource management consists of job/resource scheduling, security (authorization services,...), local policies, negotiations, accounting, ...

GRM is a negotiation process driven by both users and resource owners, and thus a multicriteria decision making process

WS-Agreement is badly needed

Two ongoing implementations: production keep-up and future full-feature

Page 30: Grid(Lab) Resource Management System

GRMS - the VO RMS

GRMS is a VO (community) resource management system

Component and pluggable architecture allows it to be used as a framework for many different VOs

Components include:
- Resource discovery (now MDS-based; other solutions are easy to add: now adding the GridLab testbed information system, see Ludek’s talk tomorrow)
- Scheduling (multicriteria, economy, workflow, co-scheduling, SLA-based scheduling) - work in progress
- Job Management
- Workflow Management
- Resource Reservation

Page 31: Grid(Lab) Resource Management System

GRMS - the plan

[Figure: planned GRMS architecture. Components shown: Job Receiver, Jobs Queue, Broker, Execution Unit, Resource Discovery, Scheduler, Resource Reservation, Prediction Unit, File Transfer Unit, Workflow Manager, SLA Negotiation, Information Services, Data Management, Authorization System, Adaptive, Monitoring. Underneath GRMS: GLOBUS, other; Local Resources (Managers).]

Page 32: Grid(Lab) Resource Management System

Current implementation

submitJob - submits new job,

migrateJob - migrates existing job,

getMyJobsList - returns list of jobs belonging to the user,

registerApplicationAccess - registers application access,

getJobStatus - returns GRMS status of the job,

getHostName - returns host name, on which the job is/was running

getJobInfo - returns a structure describing the job,

findResources - returns resources matching user's requirements,

cancelJob - cancels the job,

getServiceDescription - returns description of a service.
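To make the shape of this interface concrete, here is an in-memory stand-in for a subset of the operations. It is not the GRMS web service and does not talk to one: only the operation names come from the slide, while the class `FakeGrms`, every argument list, the job-id format and the status strings are assumptions made for illustration.

```python
import itertools


class FakeGrms:
    """In-memory stand-in for a few listed operations (signatures assumed)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submitJob(self, user, description):
        job_id = f"job-{next(self._ids)}"  # hypothetical id format
        self._jobs[job_id] = {
            "user": user,
            "description": description,
            "status": "QUEUED",            # hypothetical status value
            "host": None,
        }
        return job_id

    def getJobStatus(self, job_id):
        return self._jobs[job_id]["status"]

    def getMyJobsList(self, user):
        return [j for j, rec in self._jobs.items() if rec["user"] == user]

    def migrateJob(self, job_id, new_host):
        self._jobs[job_id]["host"] = new_host
        self._jobs[job_id]["status"] = "MIGRATED"

    def getHostName(self, job_id):
        return self._jobs[job_id]["host"]

    def cancelJob(self, job_id):
        self._jobs[job_id]["status"] = "CANCELED"


grms = FakeGrms()
first = grms.submitJob("alice", "<grmsjob/>")
second = grms.submitJob("bob", "<grmsjob/>")
grms.migrateJob(first, "host-b.example.org")
print(grms.getMyJobsList("alice"), grms.getJobStatus(first), grms.getHostName(first))
```

A stand-in like this is mainly useful for exercising client code (for example a portal) without a running service; the real operations are exposed through the GSI-enabled web service interface described later in the talk.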

Page 33: Grid(Lab) Resource Management System

GRMS - overview

[Figure: GRMS overview. User Access Layer: (GAT) Application, GridLab Portal. Resource Management System: Resource Discovery, Broker, Job Manager. GridLab Services: Data Management, Adaptive Components, GridLab Authorization Service. Globus Infrastructure: MDS, GRAM, GridFTP, GASS. All of this sits on the Grid Environment.]

Page 34: Grid(Lab) Resource Management System

GRMS –detailed view

[Figure: GRMS detailed view. Components shown: Resource Discovery, Broker, Job Manager, Web Service Interface, Job Queue, Workflow Mgmt., Task Registry, DB. Layers shown: User Access Layer, System Layer Services, GridLab Services.]

Page 35: Grid(Lab) Resource Management System

GRMS –detailed view

[Figure: GRMS detailed view (animation step). Components shown: Resource Discovery, Broker, Job Manager, Job Queue, Workflow Mgmt., Task Registry, DB. Layers shown: User Access Layer, System Layer Services, GridLab Services. The Web Service Interface label appears eight times in this step.]

Page 36: Grid(Lab) Resource Management System

GRMS functionality

Ability to choose the best resources for the job execution, according to Job Description and chosen mapping algorithm;

Ability to submit the GRMS Task according to provided Job Description;

Ability to migrate the GRMS Task to better resource, according to provided Job Description;

Ability to cancel the Task;

Provides information about the Task status;

Provides other information about Tasks (name of host where the Task is/was running, start time, finish time);

Page 37: Grid(Lab) Resource Management System

GRMS functionality (cont.)

Provides list of candidate resources for the Task execution (according to provided Job Description);

Provides a list of Tasks submitted by given user;

Ability to transfer input and output files (GridFTP, GASS, WP8 Data Management System);

Ability to contact Adaptive Components Services to get additional information about resources

Ability to register a GAT Application callback information

Ability to submit a set of tasks with precedence constraints (work-flow of tasks and input/output files)

Page 38: Grid(Lab) Resource Management System

GRMS modules

Broker Module
- Steers the process of job submission
- Chooses the best resources for job execution (scheduling algorithm)
- Transfers input and output files for the job's executable

Resource Discovery Module
- Finds resources that fulfill the requirements described in the Job Description
- Provides information about resources, required for job scheduling

Page 39: Grid(Lab) Resource Management System

GRMS modules (cont.)

Job Manager Module
- Ability to check current status of job
- Ability to cancel running job
- Monitors for status changes of running job

Workflow Management Module
- Creates workflow graph of tasks from Job Description
- Puts tasks into the Job Queue
- Controls the tasks execution according to precedence constraints
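How a workflow module might release tasks to a FIFO queue only when their parents have finished can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The function `run_workflow`, the task names and the `execute` callback are hypothetical; GRMS itself is not implemented this way as far as the slides say.

```python
from collections import deque
from graphlib import TopologicalSorter


def run_workflow(parents, execute):
    """Run tasks in an order that respects precedence constraints.

    `parents` maps each task to the tasks that must finish before it.
    """
    sorter = TopologicalSorter(parents)
    sorter.prepare()                 # raises CycleError on cyclic constraints
    queue = deque()                  # FIFO job queue
    order = []
    while sorter.is_active():
        # Tasks whose parents are all done become ready and are queued.
        queue.extend(sorted(sorter.get_ready()))
        while queue:
            task = queue.popleft()
            execute(task)
            order.append(task)
            sorter.done(task)        # may unlock children for the next round
    return order


# Hypothetical four-task workflow.
parents = {
    "stage_in": [],
    "simulate": ["stage_in"],
    "analyse": ["stage_in"],
    "stage_out": ["simulate", "analyse"],
}
order = run_workflow(parents, execute=lambda task: None)
print(order)
```

Sorting the ready set is only there to make the order deterministic; a real queue would apply whatever algorithm it is configured with.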

Page 40: Grid(Lab) Resource Management System

GRMS modules (cont.)

Web Service Interface
- Provides GSI enabled web service interface for Clients (GAT Application, GridLab Portal)

Job Queue
- Allows tasks to be put into the queue
- Provides a way of getting tasks from the queue according to the configured algorithm (FIFO)

Task Registry
- Stores information about the task execution (start time, finish time, machine where executed, current status, Job Description)

Page 41: Grid(Lab) Resource Management System

Job Description

Task executable
- file location
- arguments
- file argument (files which have to be present in working directory of running executable)
- environment variables
- standard input
- standard output
- standard error
- checkpoint files

Page 42: Grid(Lab) Resource Management System

Job Description (cont.)

Resource requirements of executable
- name of host for job execution (if provided, no scheduling algorithm is used)
- operating system
- required local resource management system
- minimum memory required
- minimum number of cpus required
- minimum speed of cpu
- other parameters passed directly to Globus GRAM

Page 43: Grid(Lab) Resource Management System

Job Description – new elements

Job Description consists of one or more Task descriptions

Each Task can have a section which denotes parent tasks

Page 44: Grid(Lab) Resource Management System

Job Description - example

<grmsjob appid="MyApplication">
  <task id="1">
    <resource>
      <osname>Linux</osname>
      <memory>128</memory>
      <cpucount>2</cpucount>
    </resource>
    <executable type="single" count="1">
      <file name="String" type="in">
        <url>gsiftp://rage.man.poznan.pl/~/Apps/MyApp</url>
      </file>
      <arguments>
        <value>12</value>
        <value>abc</value>
      </arguments>
      <stdin>
        <url>gsiftp://rage.man.poznan.pl/~/Apps/appstdin.txt</url>
      </stdin>
      <stdout>
        <url>gsiftp://rage.man.poznan.pl/~/Apps/appstdout.txt</url>
      </stdout>
    </executable>
  </task>
</grmsjob>
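A job description like this one can be read with a few lines of standard-library Python. The sketch below is an assumption-laden illustration: it embeds a trimmed, well-formed copy of the example (attribute values quoted, stdin and stdout omitted), and the helper `summarize` and the keys of the dictionaries it returns are made up here rather than taken from GRMS.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the slide's example (quoting added).
JOB_XML = """
<grmsjob appid="MyApplication">
  <task id="1">
    <resource>
      <osname> Linux </osname>
      <memory> 128 </memory>
      <cpucount> 2 </cpucount>
    </resource>
    <executable type="single" count="1">
      <file name="String" type="in">
        <url> gsiftp://rage.man.poznan.pl/~/Apps/MyApp </url>
      </file>
      <arguments>
        <value> 12 </value>
        <value> abc </value>
      </arguments>
    </executable>
  </task>
</grmsjob>
"""


def summarize(xml_text):
    """Return the application id and one summary dict per task."""
    root = ET.fromstring(xml_text)
    tasks = []
    for task in root.findall("task"):
        res = task.find("resource")
        exe = task.find("executable")
        tasks.append({
            "id": task.get("id"),
            "os": res.findtext("osname").strip(),
            "memory": int(res.findtext("memory")),
            "cpus": int(res.findtext("cpucount")),
            "executable": exe.findtext("file/url").strip(),
            "arguments": [v.text.strip() for v in exe.findall("arguments/value")],
        })
    return root.get("appid"), tasks


appid, tasks = summarize(JOB_XML)
print(appid, tasks[0]["os"], tasks[0]["cpus"], tasks[0]["arguments"])
```

The values are stripped because the slide pads element text with spaces; the unit of the memory value is not stated on the slide, so the sketch keeps it as a bare integer.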

Page 45: Grid(Lab) Resource Management System

Job Description – example 2

<grmsjob appid="MyApplication">
  <task id="task1">
    <resource>
      ...
    </resource>
    <executable type="single" count="1">
      ...
    </executable>
  </task>
  <task id="task2">
    <resource>
      ...
    </resource>
    <executable type="single" count="1">
      ...
    </executable>
    <workflow>
      <parent>task1</parent>
    </workflow>
  </task>
</grmsjob>
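The parent references in such a description can be turned into a dependency map and checked before anything is queued. This is a hedged sketch: the embedded XML is a reduced copy of the example in which the elided resource and executable sections are left out entirely, and `parent_map` is a hypothetical helper, not a GRMS function.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the example: elided sections are simply omitted.
WORKFLOW_XML = """
<grmsjob appid="MyApplication">
  <task id="task1"/>
  <task id="task2">
    <workflow><parent>task1</parent></workflow>
  </task>
</grmsjob>
"""


def parent_map(xml_text):
    """Map each task id to the ids of its parent tasks, validating references."""
    root = ET.fromstring(xml_text)
    parents = {
        task.get("id"): [p.text.strip() for p in task.findall("workflow/parent")]
        for task in root.findall("task")
    }
    known = set(parents)
    for task_id, deps in parents.items():
        missing = [d for d in deps if d not in known]
        if missing:
            raise ValueError(f"{task_id} names unknown parent(s): {missing}")
    return parents


print(parent_map(WORKFLOW_XML))
```

Rejecting a reference to an undeclared parent at parse time keeps a typo in a task id from leaving a task waiting forever for a parent that will never run.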

Page 46: Grid(Lab) Resource Management System

Research focus of GRMS

Focus on the infrastructure is not enough for the efficient GRM

Focus on policies

Focus on multicriteria aspects of the GRM
- users, their preferences and applications
- resource owners’ preferences
- preference models, multicriteria decision making, knowledge will be crucial for efficient resource management

Focus on AI techniques for GRM

Focus on business models, economy grids
- Cost negotiation mechanisms could be part of the SLA negotiation process (WS-Agreement)

contradictory in nature

Page 47: Grid(Lab) Resource Management System

GRMS and SLA

Page 48: Grid(Lab) Resource Management System

GRMS and SLA (cont.)

Page 49: Grid(Lab) Resource Management System

End-users (consumers)
- have requirements concerning their applications (e.g. expect a good performance of their applications, expect a good response time)
- have requirements concerning resources (e.g. prefer machines with a big storage, machines with certain configurations)

Resource Administrators and Owners (providers)
- share resources to achieve some benefits

VO Administrator (criteria and preferences must be secure)
- requires robust and reliable provisioning of resources
- manages and controls VO by making global policies (community policies)

STAKEHOLDERS OF THE GRID RESOURCE MANAGEMENT PROCESS

Page 50: Grid(Lab) Resource Management System

Multicriteria RM in GridLab

Gathering of information
- apps requirements (resource requirements, environment, etc.)
- user preferences (which criteria and how important)
- user support, preference modeling tools

Selection phase
- choose the best resources (schedule) based on the information provided and on the resource availability (estimates, predictions)
- from simple matchmaking to multiple optimisation techniques

Execution phase
- file staging, execution control, job monitoring, migration, usually re-selection of resources, application adaptation (application managers, adaptive services from GridLab)

Page 51: Grid(Lab) Resource Management System

Many different research fields, e.g. Multicriteria Optimization, Project Scheduling, Artificial Intelligence, Decision Support Systems

We consider a resource management problem as a multicriteria decision making process with various stakeholders

Various stakeholders have different points of view and criteria (along with preferences)

We need to aggregate somehow (negotiation and agreement processes are required) various criteria and stakeholders’ preferences

We focus on a compromise solution rather than the optimal one (does it exist?)

We want to satisfy many stakeholders rather than the particular one

Multicriteria approach in GRMS

Page 52: Grid(Lab) Resource Management System

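MULTICRITERIA (1)

[Figure: two plots with Memory and Storage axes, one for End user 1 running Application 1 (e.g. data analysis) and one for End user 2 running Application 2 (e.g. data mining). Each plot shows the candidate resources R1, R2, R3 and R4.]

Hard constraints (e.g. RSL): <Mem = 100MB>, <Storage = 1G>

With hard constraints alone, the choice of resource is left open for both end users (marked ??? in both plots).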

Page 53: Grid(Lab) Resource Management System

MULTICRITERIA (2)

[Figure: the same two Memory/Storage plots for End user 1 running Application 1 (e.g. data analysis) and End user 2 running Application 2 (e.g. data mining), with candidate resources R1 to R4, now annotated with one objective function per plot.]

MAX Z = 1*Mem + 2*Storage (where Z is the objective function)

MAX Z = 2*Mem + 1*Storage (where Z is the objective function)
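The effect of the two objective functions can be checked numerically. In the sketch below the (memory, storage) coordinates of R1 to R4 are invented, since the slide's plots did not survive extraction; only the two weightings come from the slide.

```python
# (memory, storage) in arbitrary, made-up units.
RESOURCES = {
    "R1": (1, 4),
    "R2": (2, 3),
    "R3": (3, 2),
    "R4": (4, 0.5),
}


def best(weights):
    """Return the resource that maximises Z = w_mem*Mem + w_sto*Storage."""
    w_mem, w_sto = weights

    def z(name):
        mem, storage = RESOURCES[name]
        return w_mem * mem + w_sto * storage

    return max(RESOURCES, key=z)


print(best((1, 2)))  # Z = 1*Mem + 2*Storage
print(best((2, 1)))  # Z = 2*Mem + 1*Storage
```

With these made-up coordinates the storage-heavy weighting selects R1 and the memory-heavy weighting selects R4, so the same set of feasible resources yields a different choice for each end user.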

Page 54: Grid(Lab) Resource Management System

We have added only two parameters to hard constraints: priority and min/max (optimization direction)

NEW : <Mem = 100MB><Priority = 2><Opt = Max> <Storage = 1G><Priority = 1> <Opt = Max>

End users are able to express their preferences in a very simple way

End users’ preferences are taken into account (compromise solutions are chosen)

PROPOSALS
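One possible reading of the extended constraints is sketched below. Several things are assumptions rather than statements from the talk: the value in each constraint is treated as a minimum (hard constraint), the priority is used as a weight, each value is divided by its threshold so that MB and GB become comparable, and the resource figures are invented.

```python
# <Mem = 100MB><Priority = 2><Opt = Max> <Storage = 1G><Priority = 1><Opt = Max>
CONSTRAINTS = [
    {"name": "mem", "minimum": 100, "priority": 2, "opt": "max"},    # MB
    {"name": "storage", "minimum": 1, "priority": 1, "opt": "max"},  # GB
]


def feasible(resource, constraints):
    """Hard-constraint check: every value must reach its threshold."""
    return all(resource[c["name"]] >= c["minimum"] for c in constraints)


def score(resource, constraints):
    """Priority-weighted sum; 'min' criteria count negatively."""
    total = 0.0
    for c in constraints:
        sign = 1 if c["opt"] == "max" else -1
        total += sign * c["priority"] * resource[c["name"]] / c["minimum"]
    return total


def choose(resources, constraints):
    ok = {n: r for n, r in resources.items() if feasible(r, constraints)}
    return max(ok, key=lambda n: score(ok[n], constraints)) if ok else None


# Hypothetical resources.
resources = {
    "R1": {"mem": 64, "storage": 10},   # fails the memory constraint
    "R2": {"mem": 400, "storage": 2},
    "R3": {"mem": 120, "storage": 4},
}
print(choose(resources, CONSTRAINTS))
```

Swapping the two priorities changes the outcome from R2 to R3 with these figures, which is the kind of user-controlled trade-off the two added parameters are meant to express.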

Page 55: Grid(Lab) Resource Management System

Concerning particular resources (e.g. memory, flops) or schedules (e.g. estimated processing time, maximum lateness)

Specific for end-users (e.g. mean response time, mean tardiness, cost of computations), resource owners (e.g. machine idleness) and administrators (e.g. throughput, makespan)

Time criteria (e.g. mean response time, makespan, mean tardiness), cost criteria (e.g. weighted resource consumption, cost of computations) and resource utilization criteria (e.g. load balancing, machine idleness)

But... (in practice :-( ):
- Lack or limited set of low level mechanisms which support e.g. advance reservation,
- Negotiation protocols and agreements
- Prediction tools which provide advanced analysis and estimations (e.g. execution time, queue wait time),
- Reliable information sources which describe behaviors of applications and resources, etc.

MULTIPLE CRITERIA

Page 56: Grid(Lab) Resource Management System

As a function: by means of parameters used in a utility function in order to model the importance of criteria, expressed using the resource specification language (e.g. JSDL) or gathered from stakeholders during an interactive process (e.g. WS-Agreement)

As a relation: input parameters such as weights and thresholds are provided by stakeholders

As logic statements (decision rules):
- if conditions on criteria then decision
- future directions we need to follow (machine learning methods, learning from examples and previous actions)...

More details in the Kluwer’s book...

PREFERENCE GATHERING

Page 57: Grid(Lab) Resource Management System

Our approach extends and is based on many scheduling strategies...

Gathering of information
- applications requirements (resource requirements, environment, etc.)
- users’ preferences (which criteria and how important)
- user support, preference modeling tools

Selection
- choice of the best resources or schedules based on the information provided and on the resource availability (estimates, predictions)
- from simple matchmaking to multiple optimization techniques

Execution
- file staging, execution control, job monitoring, migration, usually re-selection of resources, application adaptation

STEPS OF MULTICRITERIA RESOURCE MANAGEMENT

Page 58: Grid(Lab) Resource Management System

One central scheduler exists for multiple applications (e.g. LSF, PBS, Sun Grid Engine)

The goal is to match a set of application requirements to all available resources on the basis of various criteria

Usually multicriteria techniques are used for the evaluation of resource co-allocations (external mechanisms )

MC in JOB SCHEDULING

Page 59: Grid(Lab) Resource Management System

• Each application is scheduled by an internal scheduler and forms a self-contained system

• The goal is to match particular application requirements with one (or some) good resource(s) based on various criteria

• Multicriteria techniques are used for the evaluation of resources as well as resource co-allocations (internal mechanisms)

MC in APPLICATION LEVEL SCHEDULING

Page 60: Grid(Lab) Resource Management System

Mathematical models

Assumptions:
- R = 7 resources
- NEU = 3 end-users (eu1, eu2, eu3)
- tasks t11, t21, t22, t31
- resources are maintained by NRO = 3 resource owners (ro1, ro2, ro3)
- each task has to be assigned to one resource
- stakeholders’ preferences are expressed in the form of utility functions

SIMPLE EXAMPLE (THEORY)

More details in the Kluwer’s book...

Page 61: Grid(Lab) Resource Management System

The average execution time (T) - an average execution time of all tasks submitted by a given end-user

The cost of resource usage (C) - the sum of costs of the resources’ usage by end-user's tasks

The income (I) - the total income for a given resource owner

SELECTED CRITERIA

More details in the Kluwer’s book...

Page 62: Grid(Lab) Resource Management System

Due to many hard constraints appearing in practice (specified by means of resource description language, policy rules in a scheduler, etc.), the number of feasible solutions can be decreased significantly.

In our case, due to specific requirements of tasks (t11, t21, t22, and t31) the set of available resources is limited to r1, r2, r3 and r4. In addition, the manager's task t11 is assigned automatically to the dedicated resource r3 (by invoking the appropriate administration rules), so only r1, r2, and r4 are considered.

Once all hard constraints are met, we can examine the soft constraints and multicriteria analysis.

HARD CONSTRAINTS

More details in the Kluwer’s book...
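The pruning described on this slide can be written as a small filter. The task and resource names come from the slide; the function `candidates`, the set `meets_requirements` and the `pinned` mapping are hypothetical ways of encoding what the slide states in prose.

```python
resources = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
tasks = ["t11", "t21", "t22", "t31"]

# Outcome of checking the task requirements, as stated on the slide.
meets_requirements = {"r1", "r2", "r3", "r4"}
# Administration rule from the slide: t11 runs on the dedicated resource r3.
pinned = {"t11": "r3"}


def candidates(tasks, resources, meets_requirements, pinned):
    """Return the resources still open to each task after hard constraints."""
    reserved = set(pinned.values())
    result = {}
    for task in tasks:
        if task in pinned:
            result[task] = [pinned[task]]
        else:
            result[task] = [
                r for r in resources
                if r in meets_requirements and r not in reserved
            ]
    return result


print(candidates(tasks, resources, meets_requirements, pinned))
```

Seven resources shrink to three open choices per remaining task, so the multicriteria step only has to compare 3 x 3 x 3 = 27 schedules in this example.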

Page 63: Grid(Lab) Resource Management System

Gathering preferences from all stakeholders

• Evaluation of solutions (schedules) for each stakeholder using local utility functions

• Aggregation of local evaluations (in the example using global utility function)

• The best compromise solution is chosen (in the example with the best value of utility function)

SOFT CONSTRAINTS AND MULTICRITERIA ANALYSIS

More details in the Kluwer’s book...
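The four steps can be illustrated by brute force over the schedules that survive the hard constraints. Everything numeric here is invented: the per-resource times and costs, the owner of each resource, the stakeholder weights, and the choice of a weighted sum as the global utility. Task t11 and its end user are left out because that task is pinned to r3, and capacity limits are ignored for brevity.

```python
from itertools import product

TASK_OWNER = {"t21": "eu2", "t22": "eu2", "t31": "eu3"}  # task -> end user
RESOURCES = {                                            # made-up figures
    "r1": {"owner": "ro1", "time": 4.0, "cost": 1.0},
    "r2": {"owner": "ro2", "time": 2.0, "cost": 2.0},
    "r4": {"owner": "ro3", "time": 1.0, "cost": 6.0},
}
WEIGHTS = {"eu2": 1.0, "eu3": 1.0, "ro1": 0.5, "ro2": 0.5, "ro3": 0.5}


def local_utilities(schedule):
    """One utility per stakeholder for a task -> resource schedule."""
    utils = {name: 0.0 for name in WEIGHTS}
    for user in set(TASK_OWNER.values()):
        used = [RESOURCES[schedule[t]] for t, u in TASK_OWNER.items() if u == user]
        avg_time = sum(r["time"] for r in used) / len(used)  # criterion T
        cost = sum(r["cost"] for r in used)                  # criterion C
        utils[user] = -(avg_time + cost)   # users dislike time and cost
    for task in TASK_OWNER:
        res = RESOURCES[schedule[task]]
        utils[res["owner"]] += res["cost"]                   # criterion I
    return utils


def global_utility(schedule):
    """Aggregate local evaluations with a weighted sum (an assumption)."""
    local = local_utilities(schedule)
    return sum(WEIGHTS[name] * value for name, value in local.items())


def best_schedule():
    tasks = sorted(TASK_OWNER)
    schedules = (
        dict(zip(tasks, combo)) for combo in product(RESOURCES, repeat=len(tasks))
    )
    return max(schedules, key=global_utility)


print(best_schedule(), global_utility(best_schedule()))
```

With these figures the compromise lands on r2 for every task: it is neither the cheapest nor the fastest resource, but it gives the best aggregate once users' and owners' evaluations are combined.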

Page 64: Grid(Lab) Resource Management System

GRID SERVICES AND OGSI-AGREEMENT

OGSA defines Grid as a service-oriented environment

OGSA model represents all entities of Grid as a Grid service (resources, brokers, applications)

Interoperability between high-level community schedulers and local resource managers is supported,

There must be agreement on how these entities will interact with each other (e.g. many different resource managers could provide different functionalities)

Page 65: Grid(Lab) Resource Management System

The goal is to negotiate the best contracts for a set of applications based on various additional criteria e.g.

level of service capability, quality, deployment and dynamic reconfiguration

level of commitment, service agreement and satisfaction

Multicriteria techniques are used for the evaluation of resources/services and brokering negotiations

MULTICRITERIA AND WS-AGREEMENT

End Users

VO Administrator

ResourceOwners

Page 66: Grid(Lab) Resource Management System

In order to implement (simple) multicriteria scenarios we have to add some new tags (describing various criteria and preferences) to the existing job specifications. JSDL example:

<jsdl:nameOfTerm wsp:Usage="wsp:Required">
  <jsdl:criterion jsdl:type=type>
    <jsdl:preference=value/> {or jsdl:preferences = a>b, b<d, c>d, a=c}
  </jsdl:criterion>
</jsdl:nameOfTerm>

where:
- jsdl:type is one of two types: jsdl:minimization or jsdl:maximization
- jsdl:priority expresses an end user’s preference

(More complicated and advanced MC scenarios could be implemented; new tags and mechanisms are required)

SIMPLE EXAMPLE (MC in JSDL)

Page 67: Grid(Lab) Resource Management System

Use of prediction techniques to support ‘high level criteria’:

Prediction of:
- a state of resources
- application requirements (e.g. memory, storage space)
- application performance (e.g. execution time)

Dealing with prediction errors (e.g. AI methods)

Active participation in GRAAP and JSDL Working Groups under GGF

Implement our new ideas in existing resource management systems (e.g. GridLab – GRMS, Progress, SGI Grid, Clusterix)

FUTURE RESEARCH WORK

Page 68: Grid(Lab) Resource Management System

Summary

GridLab’s RM is focused on the multicriteria aspects of resource management (local vs. global policies, user vs. resource owner etc.)

GRMS is currently deployed on the GridLab testbed

It supports the “migration because of bad performance” scenario

new emerging scenarios we have in mind

Other deployments include: SGI Grid, Progress, HPC Europa, Clusterix (future), GriPhyN (workflow execution on Globus)

more info: www.gridlab.org -> WP9

Page 69: Grid(Lab) Resource Management System

The GRM book

Published in October 2003 by Kluwer

www.kap.nl

Page 70: Grid(Lab) Resource Management System

Thank you!