Resource management system for distributed environment

B4. Nguyen Tuan Duc

Background

Emerging need for resource management systems for clusters / grids
Several systems exist, but have problems…
  Portable Batch System
  Sun Grid Engine
  ….

Goal

Flexible resource management system
  Support clusters, grids
  Fair-share scheduling
  Maximize utilization of resources
  Support parallel applications
  Reduce load aggregation

Agenda

Background
Goal
Related works
Proposed method
Problems

Related works

Portable Batch System (MRJ, 1990s)
  Batch queuing system
  Automatic load-balancing
  Parallel jobs support
  Job accounting

Portable Batch System (PBS)

Sun Grid Engine

Batch queuing system by Sun Microsystems
Same features as PBS, plus job checkpointing
Several add-ons

Problems of batch queuing systems

Resource utilization
Load aggregation
  Server accepts too many requests from clients
Limits of the execution model
  Cannot fork, since a process created with fork() does not go into the queue
…

Saito Dai’s system (STDS)

Flexible Resource Management System for Widely Distributed Environment (2006)
  No load aggregation
  Job scheduling on each node
  Independent from execution model (fork, … OK)
  Support parallel jobs

STDS structure

Two main components
  Node searching system (graph searching)
  Scheduler (on each node)
Scheduler
  Daemon on each node
  CPU fair-sharing by ‘nice’
Node searching system
  Create graph from links
  Node search is done by graph search
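The "node search as graph search" idea can be illustrated with a breadth-first traversal. This is only a minimal sketch, not the STDS implementation: the adjacency-list `links`, the `is_free` predicate, and the `wanted` count are all hypothetical names chosen for illustration.

```python
from collections import deque

def find_nodes(links, start, is_free, wanted):
    """Breadth-first search over the link graph, collecting up to `wanted` free nodes."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue and len(found) < wanted:
        node = queue.popleft()
        if is_free(node):
            found.append(node)
        # Follow links to neighbors that have not been visited yet.
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return found

# Hypothetical four-node cluster in which node "b" is busy.
links = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
busy = {"b"}
result = find_nodes(links, "a", lambda n: n not in busy, wanted=2)
```

The search stops as soon as enough free nodes are found, so the cost grows with how far the free nodes are from the starting node.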

STDS node searching system

Our approach

Similar to the STDS system
  Node searching system
  Scheduler on each node
But different in …
  Node search: no graph searching
  Scheduler: kernel scheduler with user accounting (budget scheduler)

Scheduler: Budget scheduling

Budget scheduling
  Normal queue & budget queue
  Normal queue for interactive processes
    Linux 2.6 default scheduler
  Budget queue for CPU-hogging processes
    Automatic detection of CPU-intensive processes
http://www.logos.ic.i.u-tokyo.ac.jp/~duc/pre/1107.ppt
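The split between the normal queue and the budget queue could be sketched as below. The slides do not say how CPU-intensive processes are detected, so this user-space illustration assumes a simple rule: a process whose CPU time is at least a fixed fraction of the sampling window counts as CPU-hogging. The threshold value and all names (`Proc`, `assign_queues`, `CPU_HOG_THRESHOLD`) are hypothetical, and the real scheduler runs in the kernel.

```python
from dataclasses import dataclass

CPU_HOG_THRESHOLD = 0.8  # assumed cutoff, not taken from the slides

@dataclass
class Proc:
    pid: int
    cpu_time: float   # seconds spent running during the sampling window
    wall_time: float  # length of the sampling window in seconds

def assign_queues(procs, threshold=CPU_HOG_THRESHOLD):
    """Split process IDs into (normal, budget) queues by CPU usage ratio."""
    normal, budget = [], []
    for p in procs:
        ratio = p.cpu_time / p.wall_time if p.wall_time > 0 else 0.0
        # CPU-hogging processes go to the budget queue, the rest stay interactive.
        (budget if ratio >= threshold else normal).append(p.pid)
    return normal, budget

procs = [Proc(1, 0.1, 10.0), Proc(2, 9.5, 10.0), Proc(3, 9.0, 10.0)]
normal, budget = assign_queues(procs)
```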

Node searching system

Client-server model
  Daemon on each node
  Daemon reports CPU state (process number, CPU utilization, …) directly to user
  Reports maximum price
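A report sent from a node daemon to the user's client might look like the following. This is a sketch under assumptions: the slides list the reported quantities but not a wire format, so the JSON encoding, the field names, and the `NodeReport` type are hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class NodeReport:
    host: str
    process_count: int      # "process number" in the slides
    cpu_utilization: float  # assumed to be a fraction between 0.0 and 1.0
    max_price: float        # maximum price reported by the node

def encode_report(report):
    """Serialize a report to bytes that a daemon could send to a client."""
    return json.dumps(asdict(report), sort_keys=True).encode("utf-8")

def decode_report(payload):
    """Rebuild a report on the client side from the received bytes."""
    return NodeReport(**json.loads(payload.decode("utf-8")))

sent = NodeReport(host="node01", process_count=42, cpu_utilization=0.75, max_price=3.0)
received = decode_report(encode_report(sent))
```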

From where can users submit jobs?

From everywhere on the cluster, grids
From their desktop, via the Internet
Need for a job submission system

Node searching system (NSS)

User

Who will determine nodes?

User!
Users choose nodes appropriate to their jobs
  Parallel jobs: idle CPUs or CPUs with low-price jobs
  Long-lasting jobs: idle CPU, set low price
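The selection rules above could be applied on the client roughly as follows. This is a sketch only: it assumes that "CPUs with low-price jobs" means nodes whose reported maximum price is below what the user is willing to pay, and the report fields, the `idle_below` cutoff, and the `pick_nodes` helper are hypothetical.

```python
def pick_nodes(reports, job_kind, my_price, idle_below=0.05):
    """Return hosts suited to the job: idle nodes, plus outbid nodes for parallel jobs."""
    chosen = []
    for r in reports:
        idle = r["cpu_utilization"] < idle_below
        outbid = r["max_price"] < my_price
        # Long-lasting jobs only take idle CPUs; parallel jobs may also outbid cheap jobs.
        if idle or (job_kind == "parallel" and outbid):
            chosen.append(r["host"])
    return chosen

# Hypothetical reports collected from three node daemons.
reports = [
    {"host": "n1", "cpu_utilization": 0.01, "max_price": 0.0},
    {"host": "n2", "cpu_utilization": 0.90, "max_price": 2.0},
    {"host": "n3", "cpu_utilization": 0.95, "max_price": 10.0},
]
parallel_hosts = pick_nodes(reports, "parallel", my_price=5.0)
long_hosts = pick_nodes(reports, "long", my_price=1.0)
```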

Node searching system (NSS)

NSS should report to users:
  CPU utilization
  Maximum price
  Load (process number, ..)
  …
Daemon on each node sends information about the node to the client.
  Client is on the user’s machine
  No heavy load aggregation

Problems!!!

May cause heavy load on the user's client
NAT, firewall
  How can the client connect to the server?
What information is needed?
  Only CPU utilization, maximum price, load, average price?
