Improving Quality of Service in Grids
Through Meta-Scheduling in Advance
A DISSERTATION FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER
SCIENCE TO BE PRESENTED WITH DUE PERMISSION OF THE COMPUTING SYSTEMS
DEPARTMENT FROM THE UNIVERSITY OF CASTILLA–LA MANCHA, FOR PUBLIC
EXAMINATION AND DEBATE
Author: Luis Tomás Bolívar
Advisors: Dr. María del Carmen Carrión Espinosa
Dr. María Blanca Caminero Herráez
Albacete, December 2011
Acknowledgements
Even though these things are not my strong suit, I would like to use these lines to thank everyone who has contributed their grain of sand to making this Thesis possible.
First of all, I would like to thank Carmen and Blanca for all their support, dedication and the time they have invested in this work. Without them this Thesis would not have been possible, since they were the ones who convinced me to take this path and who made sure it was a smooth one. I would also like to thank Agustín for his help throughout the whole process of preparing this Thesis, especially in my early days.
I would also like to make special mention of my family: my father, my mother, my sister and Ainhoa. All of them have helped shape me, not only as a future doctor, but as a person. Once again, without them I would not be who I am, nor would I have come this far. My greatest achievement is having you by my side. Thank you for the unconditional support you have always given me.
I also want to take this opportunity to remember all the new friends I made during my stay in Umeå. I would like to thank all of them for their hospitality and help. I keep fond memories of them all, since they made me feel at home.
Finally, I would also like to thank all my colleagues, friends and workmates for all those moments of laughter and distraction, which are more than necessary to disconnect and recharge one's batteries.
Luis Tomás Bolívar
December 2011, Albacete, Spain.
Abstract
Grids allow the coordinated use of heterogeneous computing resources within large-scale parallel applications in science, engineering and commerce [1]. Since organizations sharing their resources in such a context still keep their independence and autonomy [2], Grids are highly variable systems in which resources may join or leave the system at any time. This variability makes Quality of Service (QoS) highly desirable, though often very difficult to achieve in practice. One reason for this limitation is the lack of a central entity that orchestrates the entire system. This is especially true in the case of the network that connects the various components of a Grid system. Thus, without resource reservations, any QoS guarantee is often hard to achieve. However, in a real Grid system, reservations may not always be feasible, as not all Local Resource Management Systems (LRMSs) permit them. There are also other types of resources, such as the network, which may lack a global management entity, thereby making their reservation impossible. Because of this, the provision of QoS in Grid environments is still an open issue that needs attention from the research community.
One way of contributing to the provision of QoS in Grids is to perform meta-scheduling of jobs in advance, which means that jobs are scheduled some time before they are actually executed. In this way, it becomes more likely that the appropriate resources are available to run each job when needed, so that the QoS requirements of jobs are met (i.e., jobs finish within their deadline).
The main aim of this Thesis is to develop a system capable of managing QoS in real Grid environments. To this end, this Thesis investigates QoS provision to Grid users by means of efficient meta-scheduling. New scheduling metrics have been added to an existing Grid meta-scheduler, and a Grid meta-scheduling layer, named the Scheduling in Advance Layer (SA-Layer), has been developed. The SA-Layer provides the meta-scheduling in advance functionality in a real Grid environment, making it possible to deal with users' QoS requirements.
This software has been developed in an incremental way: different modules have been added and modified to extend and improve the functionality of this layer. Initially, red-black trees were implemented as an efficient data structure to manage resource usage information. Then, different prediction techniques were developed to enable proper scheduling by producing sufficiently accurate estimates of the status of resources and the duration of jobs on them. After that, two rescheduling techniques (one preventive and one reactive) were implemented to deal with the low resource usage caused by the fragmentation generated during the allocation process and by unfavorable previous decisions. Finally, as a result of a research stay, the integration of the SA-Layer with a job prioritization system, named FSGrid, is addressed. This system provides the SA-Layer with the information needed to take into account different usage policies for users, projects and virtual organizations. In this way, different QoS levels may be provided depending on the established policies.
To sum up, this Thesis makes the following contributions:
• Use of network resources as first-level resources, together with computing resources, when scheduling jobs. To this end, network information has been included in an existing meta-scheduler in order to make it network-aware when scheduling jobs.
• Use of autonomic computing ideas to perform efficient meta-scheduling of jobs onto computing resources.
• Development of an architecture that schedules jobs in advance by making predictions about the future status of resources (network included) and about the real duration of jobs on them.
• Use of techniques to cope with the low resource usage caused by the fragmentation generated during the allocation process.
• Collaboration with another system to manage different QoS levels for the different users and virtual organizations, depending on the established policy.
These proposals have been developed, and their performance has been evaluated, in a real Grid environment based on the Globus Toolkit and the GridWay meta-scheduler. To this end, a testbed where the experiments are carried out has been built using non-dedicated machines across several administrative domains.
Summary
Grids enable the coordinated, shared and large-scale use of heterogeneous resources in parallel applications in science, engineering and commerce [1]. However, the organizations that share those resources keep their independence and autonomy [2], which makes these systems highly variable: resources may join or leave the system at any time. This variability makes the provision of Quality of Service (QoS) very difficult in practice. One reason for this limitation may lie in the lack of a central entity that manages the system. This is especially noticeable in the case of the network that interconnects the various components of a Grid environment. On the other hand, resource reservation in a Grid environment is not always possible, since not all resources provide this functionality or the rights to use it may not be available. In addition, other kinds of resources, such as the interconnection network, may not be managed by a single entity, which makes reserving them even harder. All of this further complicates QoS guarantees. This is why QoS provision in Grid environments is still an issue that the scientific community needs to study and solve.
One possible solution for QoS provision in Grid environments is to schedule jobs in advance. This means that a job is scheduled ahead of the time at which it will be executed. In this way, it becomes more likely that the appropriate resource for running the application is available when needed, and therefore that the QoS required by the job is met (the job finishes before the deadline set by the user).
The main aim of this Thesis is the development of a system able to provide a certain level of QoS in real Grid environments through efficient scheduling. To this end, new metrics have been included in an existing meta-scheduler, and a meta-scheduling layer, named SA-Layer, has been developed on top of it. This layer provides the scheduling in advance functionality, making it possible to handle the different QoS levels requested by users.
The SA-Layer has been implemented incrementally, so that different modules have been added and modified to extend and improve its functionality. First, red-black trees were implemented as the data structure used to manage resource usage information efficiently. Next, several prediction techniques were included, whose goal is to produce sufficiently accurate predictions of the status of resources and the duration of jobs on them. Once the scheduling process was accurate enough, two techniques for rescheduling already scheduled jobs (one reactive and one preventive) were implemented to address the fragmentation generated during the scheduling process, which can lead to low resource usage. Finally, as a result of the research stay carried out, this system was integrated with a job prioritization system, named FSGrid. This system provides the SA-Layer with the information needed to take into account the usage policies established for users, projects and virtual organizations.
More specifically, this Thesis makes the following contributions:
• Inclusion of network information in the GridWay meta-scheduler so that it is taken into account when choosing the resource on which to execute each job.
• Use of autonomic computing ideas to perform more efficient job scheduling.
• Development of an architecture for scheduling jobs in advance that makes predictions about the status of resources (network included) and about the duration of jobs on them.
• Implementation of techniques to alleviate fragmentation problems during the scheduling process.
• Collaboration with another system to manage different QoS levels, taking into account the different users, projects and virtual organizations according to the established policies.
This Thesis presents all these proposals, as well as their evaluation in a real Grid environment based on the Globus Toolkit and the GridWay meta-scheduler. This Grid environment comprises non-dedicated machines belonging to different administrative domains.
Contents
1 Introduction
  1.1 Grid computing
    1.1.1 Grid Architecture
    1.1.2 Middleware
  1.2 Motivations
  1.3 Objectives of this Thesis
  1.4 Methodology
    1.4.1 The Globus Toolkit
    1.4.2 The GridWay Meta-scheduler
    1.4.3 Extensions to GridWay
    1.4.4 Integration with other systems
  1.5 Structure of this Thesis
2 QoS Provision and Meta-Scheduling in Grids
  2.1 Introduction
  2.2 Models for addressing Quality of Service in Grids
    2.2.1 Best Effort Model
    2.2.2 QoS Model
    2.2.3 Economic Model
  2.3 Proposals for Addressing Quality of Service in Grids
    2.3.1 Scheduling Techniques
    2.3.2 Advance Reservation
    2.3.3 Data Structures
    2.3.4 Prediction Techniques
    2.3.5 Fragmentation Problems
    2.3.6 Co-allocation and Rescheduling Techniques
    2.3.7 Autonomic Computing
    2.3.8 Service Level Agreements
    2.3.9 Fairness Resource Usage
  2.4 Summary
3 Including Metrics to Improve QoS at the Meta-Scheduling Level
  3.1 Introduction
  3.2 Autonomic Network-aware Meta-scheduling (ANM)
  3.3 Implementation of ANM
    3.3.1 Extending GridWay to be network-aware
    3.3.2 Autonomic scheduler
    3.3.3 Predicting and tuning resource performance
  3.4 Experiments and results
    3.4.1 Experiment Testbed
    3.4.2 Workload
    3.4.3 Performance evaluation
  3.5 Summary
4 Adding Support for Meta-Scheduling in Advance: The SA-Layer
  4.1 Introduction
  4.2 Network-aware meta-scheduling in advance
  4.3 Meta-Scheduling in advance implementation
    4.3.1 Gap Management
    4.3.2 Data Structure
    4.3.3 Job Migration
    4.3.4 Predictor
  4.4 Prediction Techniques
    4.4.1 TCT Technique
    4.4.2 ETTS Technique
    4.4.3 RT Technique
    4.4.4 ExS Technique
    4.4.5 Exponential smoothing predictions
  4.5 Evaluation
    4.5.1 Testbed
    4.5.2 Workload
    4.5.3 Experiments and results
  4.6 Summary
5 Optimizing Resource Utilization through Rescheduling Techniques
  5.1 Introduction
  5.2 Scheduling Problems
  5.3 Tackling fragmentation
    5.3.1 Reactive techniques: Replanning Capacity (RC)
    5.3.2 Preventive techniques: Bag of Task Rescheduling (BoT-R)
  5.4 Fragmentation metrics
    5.4.1 Trigger Phase
    5.4.2 Filter Phase
  5.5 Evaluation
    5.5.1 Testbed
    5.5.2 Workload
    5.5.3 Rescheduling techniques
    5.5.4 Fragmentation metrics
  5.6 Summary
6 Improving Grid QoS by means of Adaptable Fair Share Scheduling
  6.1 Introduction
  6.2 Improving end-user QoS: Sample Scenario
  6.3 FSGrid
  6.4 Integrated Architecture
  6.5 Performance Evaluation
    6.5.1 Testbed
    6.5.2 Workload
    6.5.3 FSGrid Convergence Rate
    6.5.4 Quality of Service
  6.6 Summary
7 Conclusions, Contributions and Future Work
  7.1 Conclusions
  7.2 Contributions
  7.3 Publications
    7.3.1 Journal papers
    7.3.2 International conference papers
    7.3.3 National conference papers
    7.3.4 Technical reports
    7.3.5 Submitted works
    7.3.6 Related contributions
    7.3.7 Additional contributions
  7.4 Funds
    7.4.1 National projects
    7.4.2 Regional projects
  7.5 Collaborations with other research groups
  7.6 Future work
A Acronyms
Bibliography
List of Figures
1.1 Grid layers model.
1.2 Workflow sample.
1.3 Core Services of Globus.
1.4 GridWay usage model.
3.1 Example scenario.
3.2 Conceptual view of the extensions introduced to GridWay.
3.3 Autonomic control loop for adapting the TOLERANCE parameter. The "X" in tX_real and tX_estimate refers to the set {net, cpu}.
3.4 Grid testbed topology.
3.5 Visualization Pipeline (VP) test.
3.6 3node test Average Time.
3.7 3node test. QoS not fulfilled.
3.8 VP Test Average Completion Time.
3.9 Resource Usage.
4.1 Meta-Scheduling in Advance Process.
4.2 Scheduling Order.
4.3 The Scheduler in Advance Layer (SA-Layer).
4.4 Idle periods regions [3].
4.5 Example of a red-black tree.
4.6 Workload characteristic.
4.7 Comparison of the different estimation techniques from the Users' viewpoint.
4.8 QoS not fulfilled per accepted Job.
4.9 Comparison of the different estimation techniques from the System's viewpoint.
5.1 The Scheduler in Advance Layer (SA-Layer).
5.2 Grid testbed topology.
5.3 Comparison between the scheduling techniques for Workload 1.
5.4 Comparison between the scheduling techniques for Workload 2.
5.5 Percentage of Rejected Jobs for Workload 3.
5.6 Relationship among checked, submitted and canceled reschedulings for Workload 3.
5.7 Resources Usage without fragmentation metrics for Workload 3.
5.8 Resources Usage when using fragmentation metrics (BoT) for Workload 3.
6.1 Scheduling Process using SA-Layer.
6.2 Meta-Scheduling in Advance Process.
6.3 Scheduling Process with SA-Layer and FSGrid integrated.
6.4 FSGrid Architecture [4].
6.5 A FSGrid policy tree.
6.6 SA-Layer and FSGrid systems integrated.
6.7 Grid testbed topology.
6.8 FSGrid convergence rates for an isolated policy tree subgroup.
6.9 Failure rates.
List of Tables
3.1 Characteristics of the resources.
3.2 Percentage of resource usage by 3node tests.
3.3 Percentage of improvement by using the autonomic implementation with ExS (ANM-ExS).
5.1 Combination of Fragmentation Metrics.
List of Algorithms
1 Resource selection algorithm used in GridWay
2 CAC algorithm
3 Scheduling algorithm
4 Estimation of Execution Time (tricpu_estimated(j))
5 Total Completion Time (TCT)
6 Execution and Transfer Times Separately (ETTS)
7 Estimation of execution time (ExecT_Estimation)
8 Estimation of transfer time (TransT_Estimation)
9 ETTS extended with Resource Trust (RT)
10 Estimation of Execution Time (ExS Estimation)
11 Replanning Capacity Algorithm
12 BoT-R Trigger executed every L period
13 BoT-R Algorithm
Chapter 1
Introduction
This chapter introduces the topic of interest of this Thesis, the management of
Quality of Service (QoS) in Grids. It presents the objectives to be accomplished,
the methodology used, and finally, the structure of the Thesis.
1.1 Grid computing
The term Grid emerged to denote a highly distributed computing infrastructure for advanced science and engineering [5], which aims at providing computational power in the same way as the power grid does. Consequently, computational Grids try to provide users with reliable, pervasive and low-cost access to computational power.
The main difference between Grid computing and previous distributed computing techniques is that Grid computing focuses on resource sharing across different (worldwide) organizations. The resources that can be integrated within a Grid are, for example, clusters, storage or networks, but also other engineering equipment such as telescopes. Consequently, these resources are heterogeneous, dispersed, and may join or leave the Grid dynamically at any moment [2]. This heterogeneity enables the Grid to handle a large variety of different applications.
The first research motivation was to provide the technologies needed to enable such resource sharing, since their development was driven by the need of the scientific community to collaborate over the network. Scientists needed to share resources such as large data sets, computational resources, software, or scientific instruments (e.g., telescopes). Since then, a wide variety of scientific applications have been developed to leverage the aggregate capability of resources, and they have rapidly increased in complexity and size. These applications are characterized by high demands for data sharing and/or computing power. For this reason, Grid computing emerged as the next generation of parallel and distributed computing. A well-known example of such applications is the Grid-based worldwide data-processing infrastructure deployed for the Large Hadron Collider (LHC) experiments at CERN [6]. This experiment generates around 10 PB of data per year, which have to be processed (some of them several times) and stored in different data centers around the world.
Thanks to this adoption by the scientific community, Grids have become an essential infrastructure for resource-intensive scientific and commercial applications [2] [7], as they enable the sharing and dynamic allocation of distributed, high-performance computational resources. The associated ownership and operating costs are reduced, and hence flexibility and collaboration among diverse organizations are promoted.
More formally, we could say that a Grid is a flexible, secure and coordinated way of sharing resources amongst different organizations, institutions and companies, building what is known as a Virtual Organization (VO). This is where the specific, real problem behind the Grid concept lies: the coordinated sharing of resources across dynamic, multi-institutional VOs [5] in order to solve problems.
A Grid environment is created to control the needed resources. The usage of these resources (CPU cycles, data, software programs, etc.) is usually characterized by their availability outside their local administrative domain. This implies the creation of a new administrative domain, with different policies, which builds a VO. The efficient usage of these resources is the main aim of Grid technology. However, they are scattered and heterogeneous, and have to be operated together as a single system. Apart from that, they need to be available most of the time and to deliver the high performance that applications need.
Nowadays, apart from making possible the usage of heterogeneous resources dispersed worldwide, the Grid must provide some Quality of Service (QoS), as Grid users may require it. This QoS can be measured in terms of response time, number of jobs finished per unit of time [8], jobs finished within their deadlines, or fair-share resource usage. However, the variability of Grid systems makes QoS highly desirable, though very complex to achieve due to the large scale of the interconnected networks. Providing end-to-end QoS is really difficult without resource reservations. However, such reservations are not always possible in a Grid environment, since not all resources provide this functionality or permit it. There are even other types of resources, such as networks, which may be scattered across several administrative domains, making their reservation (e.g., regarding bandwidth capacity) infeasible.
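The QoS metrics named above can be sketched on a small job trace. This is a minimal illustration, not code from this Thesis; the job records and their values are made up for the example:

```python
# Hypothetical job trace: (submit_time, finish_time, deadline), in seconds.
jobs = [
    (0, 50, 60),
    (10, 90, 80),   # this job misses its deadline
    (20, 70, 100),
]

# Response time of each job: time elapsed from submission to completion.
response_times = [finish - submit for submit, finish, _ in jobs]

# Throughput: jobs finished per unit of time over the whole trace.
makespan = max(f for _, f, _ in jobs) - min(s for s, _, _ in jobs)
throughput = len(jobs) / makespan

# Fraction of jobs that met their deadline.
deadline_ok = sum(1 for _, f, d in jobs if f <= d) / len(jobs)

print(response_times)   # [50, 80, 50]
print(deadline_ok)      # 2 of 3 jobs met their deadline
```

Fair-share resource usage, the last metric named above, additionally requires per-user accounting and usage policies, which is the subject of Chapter 6.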
1.1.1 Grid Architecture
The architecture of Grid systems [5] is defined with the aim of providing a set of entities and a nomenclature that properly define each element in the system, clarifying the functionality of each one and the relationships amongst them. In order to define a comprehensible and coherent architecture, it is first necessary to identify the services that every Grid system will need, as well as their main properties and characteristics. Furthermore, the protocols that a Grid needs to enable communication among the different elements must be taken into account.
Interoperability [5] is a cornerstone of Grid environments. This means having common protocols; the Grid architecture is mainly a protocol architecture. These protocols define the basic mechanisms used by VO users to deal with the resources. Therefore, the architecture must also be defined using as much standardization as possible.
On the other hand, an open architecture based on standards facilitates interoperability, portability and code sharing. Thus, standard protocols make it easier to define standard services that provide better capabilities. For instance, Application Programming Interfaces (APIs) and Software Development Kits (SDKs) can be implemented to provide the programming abstractions needed to build a usable Grid. These technologies and architectures are called "middleware".
Figure 1.1. Grid layers model.
Hence, the global architecture of the system may be split into different pieces depending on the level at which each component works. This leads to a layered architecture model, as Figure 1.1 depicts, which may be compared with the Open System Interconnection (OSI) model. The main details of each layer are explained next.
Fabric
At the lowest level there are services for local resource control. This layer is called the Fabric (or infrastructure) layer and may be related to the data link layer of the OSI model. It provides shared access to resources through Grid protocols: for instance, computational resources, data centers, network resources, sensors, laboratory instruments, and so on. A resource may also be a logical entity, such as a distributed file system.
Connectivity
The next level is the Connectivity layer, whose main aim is to provide the communication methods and protocols among resources of the previous layer. Thus, this layer defines the core communication and authentication protocols required for Grid-specific network transactions. Communication protocols enable the exchange of data among Fabric layer resources. Authentication protocols are built on communication services to provide cryptographically secure mechanisms to verify the identity of users and resources.
Resource
On top of the Connectivity layer, the Resource layer builds on the communication and authentication protocols to define protocols (as well as APIs and SDKs) for the secure negotiation, initiation, monitoring, control, accounting, and payment of sharing operations on individual resources. In this way, the Resource layer calls functions of the Fabric layer to access and control local resources. It must be noted that the Resource layer protocols are concerned entirely with individual resources and hence ignore issues of global state and atomic actions across distributed collections. Low-level middleware, such as Globus [9], works at this layer. The next layer addresses the issues related to managing resources as a group.
Collective
The next layer is called the Collective layer and is focused on the coordination of
the multiple accessible resources from the lower layer rather than on interactions
with a single resource. In this way, the Collective layer contains the protocols
and services (APIs and SDKs) that are not associated with any specific resource
but rather are global in nature and capture interactions across collections of
resources. High-level middleware works in this layer, such as meta-schedulers
like GridWay [10].
Applications
On the other hand, another important aspect when working in Grid environments
is the applications themselves and their classification. For instance, Buyya [11]
makes an exhaustive classification of jobs, splitting them into simple jobs and
workflows. Simple jobs are those which perform a single task. Workflows are
made up of many dependent or independent jobs. Independent multijobs are
composed of several simple jobs that may be processed in parallel. Finally,
dependent multijobs are made up of several jobs where some of them have
dependencies on others. For instance, some jobs may need the data that other
jobs have to generate. One example of this kind of task is depicted in Figure 1.2,
where some jobs have to wait until others complete their executions.
Figure 1.2. Workflow sample.
1.1.2 Middleware
Grids are highly variable and heterogeneous systems in which searching for and
using resources is a hard task for users. This is because users are in different
domains and under different access policies from the resources where their jobs
are executed. Moreover, in large-scale Grids, with many potentially available
resources, this process is not feasible manually. Hence, the Grid infrastructure
must provide the services needed for automatic resource brokerage, which takes
care of the resource selection and negotiation process [1]. This infrastructure is
named a meta-scheduler [12] and hides this process from the user. So, the user’s
experience of the Grid is determined by the functionality and performance of
these meta-scheduler systems.
Grid middleware is connectivity software which provides a group of services
that make it possible to execute distributed applications over heterogeneous
platforms. Thus, middleware hides from the user the complexity of managing
heterogeneous and distributed resources over different communication networks.
An important management function is task scheduling [9], which may be
defined as the process that takes the decisions related to resources of multiple
administrative domains. In the case of Grid systems, task scheduling has two
objectives. One objective is the efficient use of resources, similar to the schedulers
found in traditional operating systems. The second objective, no less important,
is related to the VO concept and aims to respond to the requirements stated by
the users in a VO concerning the performance of task execution, such as the
response time. This is why the scheduling function has been split into two
components: one which is closer to the resources (the local scheduler), and a
second one (the meta-scheduler) closer to the application.
Scheduling in distributed systems has been significantly improved owing to
the innovations proposed in Grid systems and VO management. They try to
find a suitable computing resource where each job will be executed – this being
called meta-scheduling. Basically, a Grid middleware is the software layer
which provides access to the resources shared amongst the Grid constellation,
converting a radically heterogeneous system into a virtually homogeneous one.
The main characteristic of Grid meta-schedulers is that they do not own the
resources and, because of that, they cannot directly manage the resources of a
specific site.
Grid meta-schedulers have to make decisions based on the information they
actually have or can obtain. They may have information about the job to be
executed, its characteristics and requirements, as well as other information
helping the meta-scheduler to make better decisions, such as the users’
preferences. Information regarding the Grid itself – its current status, capacity,
and so on – is also needed, since it helps to perform efficient scheduling.
Usually, meta-schedulers obtain information from the Grid Information System
(GIS) [13], which is in charge of gathering information about the individual
resources.
Three different steps may be identified within the scheduling process [9]:
1. Resource discovery: obtains the list of available resources.
2. Information gathering: obtains information about the available resources and
chooses a suitable one (ideally, the best one).
3. Job execution: sends the files needed for the execution of the job, executes the
job, cleans up the temporary files, and retrieves the generated output files.
In a Grid environment, users usually send their jobs through a meta-scheduler
by providing a job template. From that point on, the meta-scheduler is in
charge of interacting with the Grid to make the execution of the job possible.
This means that this entity is the one which selects the resource(s) to execute
the application, sends the input data, deals with security issues (authentication
and authorization), and monitors the job execution and its possible migrations.
Thus, meta-schedulers are a very important component of Grid computing, as
they are in charge of providing QoS to end users. This QoS may be provided
through a good resource selection for each application, with the aim of reducing
the time needed to execute the job, improving the throughput, or meeting its
time constraints. Good scheduling not only makes a better QoS possible from
the users’ point of view, but also a better usage of resources, as well as allowing
a greater number of jobs to be executed. Therefore, a good meta-scheduler
should adjust its scheduling strategies depending on the changing status of the
system and the kind of jobs to be scheduled.
However, scheduling in Grid systems is very complicated. Resource heterogeneity,
the size and number of tasks, the variety of policies, and the high
number of constraints are some of the main characteristics that contribute to
this complexity. The design of scheduling algorithms for a heterogeneous
computing system interconnected with an arbitrary communication network is one
of the current concerns in distributed systems research. A large number of tools
are available for local scheduling: Portable Batch System (PBS) [14], Condor [15],
Sun Grid Engine [16], and Load Sharing Facility (LSF) [17]. These tools are
included in the category of centralized schedulers. Meta-schedulers, instead,
are the subject of projects under development, like GridWay [10] and Globus
CSF [18]. A problem that must be solved for this type of scheduling is scalability.
This aspect is even more important in the context of heterogeneous systems
(that require the simultaneous management of multiple clusters, amongst others)
and of the diversity of middleware tools.
Grid middleware may be classified depending on the services provided and
the layer it works on. Some of the well-known middleware used in Grid
environments are:
• Globus [9]: The Globus® Toolkit is an open source software toolkit used
for building Grids. It is being developed by the Globus Alliance and many
others all over the world. It is the de facto standard due to its adoption
by the scientific community.
• Legion [19]: An open specification and prototype for a worldwide virtual
computer. It is a research project based at the University of Virginia (USA).
• gLite [20]: gLite provides a framework for building grid applications tapping
into the power of distributed computing and storage resources across the
Internet. It was born from the collaborative efforts of more than 80 people in 12
different academic and industrial research centers as part of the Enabling
Grids for E-sciencE (EGEE) Project [21].
• Condor-G [22]: Condor-G provides the grid computing community with a
powerful, full-featured task broker. Used as a front-end to a computational
grid, Condor-G can manage thousands of jobs destined to run at
distributed sites. It provides job monitoring, logging, notification, policy
enforcement, fault tolerance, credential management, and it can handle
complex job interdependencies.
• GridWay [10]: A meta-scheduler that enables large-scale, reliable and
efficient sharing of computing resources, within a single organization or
scattered across several administrative domains. Moreover, GridWay supports
most of the existing Grid middleware. More details are given in Section 1.4.2.
• UNICORE [23]: Uniform Interface to Computing Resources offers a ready-to-run
Grid system including client and server software. UNICORE makes
distributed computing and data resources available in a seamless and secure
way in intranets and the Internet.
The optimization of the scheduling process for Grid systems tries to provide
better solutions for the selection and allocation of resources to current tasks.
Scheduling optimization is very important because it is a main building
block for making Grids more usable by user communities. Moreover, QoS is
a requirement for many Grid applications. The optimization methods for Grid
scheduling are the main subject of this thesis. As the scheduling problem is
NP-hard [24], approximation algorithms are considered, which are expected to
quickly offer a solution, even if it is only near-optimal.
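To illustrate what such an approximation algorithm looks like, the sketch below implements Graham’s classic greedy list scheduling (a textbook example, not one of the algorithms developed in this Thesis): each job is assigned to the currently least loaded machine, which is known to produce a makespan within a factor of two of the optimum:

```python
def list_schedule(job_lengths, n_machines):
    """Graham's greedy list scheduling: assign each job, in order, to the
    machine with the least accumulated load. Returns the job-to-machine
    assignment and the resulting makespan (maximum machine load)."""
    loads = [0.0] * n_machines
    assignment = []
    for length in job_lengths:
        m = loads.index(min(loads))   # least-loaded machine
        assignment.append(m)
        loads[m] += length
    return assignment, max(loads)

assignment, makespan = list_schedule([3, 3, 2, 2, 2], 2)
```

On this sample input the greedy schedule yields a makespan of 7 time units, while the optimum is 6 (grouping {3, 3} and {2, 2, 2}) – a near-optimal answer obtained in linear time.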
1.2 Motivations
There are several research projects whose main aim is to manage QoS in Grid
systems – the main ones are reviewed in Chapter 2. However, most of them have
a common drawback: they usually do not take into account the interconnection
network that links the computational resources when making scheduling
decisions. They focus on the reservation of resources, such as processing,
storage or even network resources, but not on the mapping of jobs to computing
resources taking the network status into account.
Since they usually make their meta-scheduling decisions focusing just on the
computing power (and utilization) of the available resources, the meta-scheduler
might decide that the most suitable computing resource to run a user’s job is the
most powerful one. However, if its interconnecting network is overloaded, then
another less powerful computing resource with a less loaded network connection
might be more suitable to run that job, and would thus be a better choice.
Hence, the network is a key parameter when managing QoS in Grids during the
meta-scheduling of jobs to computing resources – the process of finding a
suitable computing resource to execute each job. Inefficient meta-scheduling
can lead to poor performance. Thus, the aim of this Thesis is to improve QoS
in Grids by means of efficient meta-scheduling which considers the network as
a key parameter.
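The following toy example illustrates this point. The resource names, the figures and the scoring formula are illustrative assumptions, not the model developed in this Thesis: ranking purely by computing power picks the overloaded node, while discounting power by network load picks the better candidate:

```python
def effective_power(resource):
    # Illustrative score: raw computing power scaled by the fraction of
    # the network link that is not already in use.
    return resource["power_mflops"] * (1.0 - resource["link_load"])

resources = [
    {"name": "fast-but-congested", "power_mflops": 1000, "link_load": 0.9},
    {"name": "slower-idle-link",   "power_mflops": 600,  "link_load": 0.1},
]

best_by_power  = max(resources, key=lambda r: r["power_mflops"])
best_effective = max(resources, key=effective_power)
# The most powerful node is no longer the best choice once its
# overloaded link is taken into account (100 vs. 540 effective MFLOPS).
```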
On the other hand, and even more importantly, physical resource reservations
are not always feasible, as not all resources permit them or provide this
functionality. Apart from that, some resources cannot be reserved because they
belong to other administrative domains, and other types of resources, such as
the network, may be scattered across several administrative domains, making
the reservation of their capabilities (such as bandwidth) rather difficult, if not
impossible.
Hence, the proposed meta-scheduling system is based on meta-scheduling
in advance decisions instead of reservations in advance. This means that the
system selects the resource and the time period to execute a job some time
before the job is actually executed, but without making any physical reservation
of the resources. So, the system needs to estimate the future status of the Grid
resources (network included) and the duration of jobs on the resources at some
point in the future (making separate estimations for the network transfers and
the executions themselves), and to keep track of previous scheduling decisions
so that future executions on a resource do not overlap.
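A minimal sketch of the bookkeeping involved may look as follows. This is a simplified illustration, not the SA-Layer implementation (which uses more elaborate data structures): each scheduling decision is recorded as a (start, end) interval per resource, purely in the scheduler’s own state, and a new job is only accepted if its interval overlaps no previous decision:

```python
class AdvanceSchedule:
    def __init__(self):
        self.bookings = {}   # resource name -> list of (start, end) tuples

    def fits(self, resource, start, end):
        """True if [start, end) overlaps no previously scheduled job."""
        return all(end <= s or start >= e
                   for (s, e) in self.bookings.get(resource, []))

    def book(self, resource, start, end):
        """Record a scheduling decision; no physical reservation is made."""
        if not self.fits(resource, start, end):
            return False
        self.bookings.setdefault(resource, []).append((start, end))
        return True

sched = AdvanceSchedule()
a = sched.book("clusterA", 10, 20)   # accepted
b = sched.book("clusterA", 15, 25)   # rejected: overlaps the first decision
c = sched.book("clusterA", 20, 30)   # accepted: starts when the first ends
```

Since no physical reservation backs these records, the accuracy of the whole scheme rests on the quality of the transfer-time and execution-time estimations, which is precisely what the prediction techniques of this Thesis address.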
In a nutshell, this Thesis is based on improving QoS through meta-scheduling
in advance of jobs. To do that, all the challenges that appear when performing
this kind of scheduling are addressed.
1.3 Objectives of this Thesis
The main objective of this Thesis is to equip a Grid system with all the
functionality needed to deliver QoS to Grid users. To this end, it is important
that proposals are network-aware, since the features and behavior of the
underlying interconnection network should be taken into account when making
meta-scheduling decisions. Developing an entity in charge of dealing with users’
QoS requirements is also essential. Hence, this entity manages the time
constraints of jobs and, when making scheduling decisions, tries to ensure that
they will be fulfilled.
The study has been carried out by using a real Grid testbed. An additional
objective was hence setting up the environment and maintaining it. We have
chosen to carry out our research over real Grid environments in order to develop
an open-source middleware for the Grid community (the Scheduling in Advance
Layer (SA-Layer), http://www.i3a.uclm.es/raap/gridcloud/SA-Layer). In
this way, the evaluation results reflect the natural behavior, heterogeneity and
dynamism of Grid resources, which would otherwise be rather difficult to
emulate.
These global objectives can be divided into the following partial objectives:
• Setting up and maintaining a Grid environment by using Globus and Grid-
Way as low and high level middlewares, respectively.
• Compilation of literature approaches aimed at the provision of QoS in Grids,
as well as identification of their weak points that need to be solved. Gather-
ing some other approaches similar to our proposals or related to the tech-
niques used to overcome the challenges found.
• Modification of an existing meta-scheduler to make it network-aware. Study
of other metrics that could improve the scheduling decisions – the mapping
between jobs and resources.
• Development of meta-scheduling in advance proposals for improving QoS
provision, such as efficient algorithms to select a suitable resource to exe-
cute a job or implementing efficient data structures to store the information
about previous scheduling decisions.
• Development of prediction techniques to estimate the future status of re-
sources and interconnection networks.
• Development of heuristics to estimate the time needed to complete the ex-
ecution of a job in a resource at one specific time in the future.
• Development of techniques to deal with the problems found in the meta-scheduling
in advance process, mainly job rejections due to fragmentation
and/or unfavorable previous decisions.
• Addition of different levels of QoS depending on specific resource usage
policies for the users, projects and virtual organizations.
1.4 Methodology
Research on the architecture for addressing QoS in Grids has been carried out
on a real Grid environment. To build that Grid environment we have used two
different middlewares which work at different layers. Globus Toolkit 4 (GT4) [9]
has been used as the low-level middleware, as it is the de facto standard at
this layer. On top of that, we have installed the GridWay meta-scheduler [10],
which works in the Collective layer as high-level middleware.
Once the Grid environment was built based on those middlewares, we decided
to extend the GridWay meta-scheduler to make it network-aware, with
the aim of improving its functionality and QoS provision. Then, we implemented
another layer (named SA-Layer) on top of GridWay to deal with QoS
issues – job time constraints. This layer provides the functionality needed to
make scheduling in advance decisions without physical reservations of resources.
Finally, we have increased the QoS provided by the SA-Layer through
communication with another existing system, named FSGrid [4]. This system
provides the SA-Layer with the information needed to manage different levels of QoS
for the different users, depending on the established policy.
Next, a more in-depth explanation of the Globus and GridWay middlewares
is presented, as they are the software over which this Thesis has been
developed. A brief explanation of the methodology used for developing
these proposals is also presented, namely, the addition of new functionality into
and on top of GridWay, and the interconnection with the FSGrid scheduler.
1.4.1 The Globus Toolkit
In 1995, during the SuperComputing’95 conference, it was demonstrated that it
is possible to execute several distributed applications from different fields among
17 United States centers connected through a high speed network of 155 Mbps.
That experiment was called I-Way and it became the starting point of several
projects whose main aim was the sharing of distributed computational
resources [5]. From that moment on, the book “The Grid: Blueprint for a New Computing
Infrastructure”, written by Ian Foster and Carl Kesselman, was the first step
towards establishing the main ideas of how this new technology should be carried out.
The Globus Toolkit [9] emerged from these ideas. It is an open source project
developed at the Argonne National Laboratory and led by Ian Foster with the
collaboration of Carl Kesselman’s group from the University of Southern California.
Globus is base software for constructing a computational Grid. Thanks to
its evolution and adoption by the scientific community, Globus has become the
de facto standard in Grid technologies.
This toolkit is basically a group of services and software libraries which deal
with the fundamental issues related to security, resource access, resource
management, data movement, resource discovery, and so forth. The Globus
Toolkit was built to remove barriers that prevent collaboration among different
organizations or institutions. Its core services, interfaces and protocols let
users access remote resources as if they were located in their own building,
while maintaining local control over who can use those resources and when.
Figure 1.3. Core Services of Globus.
In a nutshell, Globus is neither a resource intermediary or broker, nor a user
or application tool; it is a group of libraries, services, commands and APIs
which build a low-level middleware that makes it possible to share resources
located in different administrative domains and under different security policies.
The core services offered by Globus are (see Figure 1.3):
• Security: Grid Security Infrastructure (GSI [25]).
• Resource Management: Grid Resource Allocation Management (GRAM and
WS-GRAM [9]).
• Information Services: Grid Resource Information Protocol (GRIP [13]).
• Data Transfers: Grid File Transfer Protocol (GridFTP [26]).
1.4.2 The GridWay Meta-scheduler
The GridWay project [27] started in September 2002. It is a high-level middleware,
which may use Globus or gLite [20], among others, as the low-level
middleware. Due to that fact, GridWay can be used in every infrastructure based
on Globus. The GridWay project is being developed by the Distributed Systems
Architecture Research Group at the Complutense University of Madrid [28].
The GridWay meta-scheduler [10] enables large-scale, reliable and efficient
sharing of computing resources: clusters, supercomputers, stand-alone servers,
etc. It supports different Local Resource Management Systems (LRMSs) (e.g., PBS,
SGE, LSF, Condor, etc.) within a single organization or scattered across several
administrative domains. It also provides a single point of access to all resources
in an organization, from in-house systems to Grid infrastructures and Cloud
providers. As a result, GridWay can be used on the main production Grid
infrastructures and it can dynamically access Cloud resources.
The first version of this meta-scheduler was developed for research on adaptive
and dynamic schedulers, and it was only distributed upon request and in binary
format. The first open source version (GridWay 4.0) and the project web site were
presented in February 2005. The latest version, GridWay 5.8, is the result of the
knowledge and experience gained through years of research and development,
as well as of the feedback from its community of users.
Nowadays, there is a large number of commercial and open source broker
systems, each of them with different computational infrastructures underneath
and with different execution profiles. However, GridWay stands out over the
rest due to the fact that it has been designed to work over Globus services,
providing high functionality and reliability in this kind of infrastructure. As
Figure 1.4 depicts, GridWay over Globus provides a decoupling between applications
and the bottom layer of local management systems. The figure shows that
users send their jobs through GridWay, which is in charge of managing and
mapping them onto the computational resources by using the Globus middleware.
Finally, GridWay returns the results of their executions. Consequently,
the GridWay framework is a component for meta-scheduling in a Grid
environment, addressed to end users and Grid application developers.
GridWay carries out all the scheduling and execution steps in a transparent
way, and it adapts the execution to the changing behavior of Grids. To do
that, GridWay provides mechanisms for failure recovery, dynamic scheduling,
and both on-demand and opportunistic job migration.
As far as job submission is concerned, users who want to send jobs to the
Grid by using the GridWay meta-scheduler have to generate a job template. This
template includes the information needed to execute the jobs, such as the names
and locations of input and output files and executables, as well as other
management parameters related to the scheduling process, performance, fault
tolerance, and so forth.
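As a hedged illustration, a minimal GridWay job template might look like the following. The keyword names follow GridWay’s template syntax, but the exact set of supported keywords depends on the GridWay version, and the values here are purely illustrative:

```
EXECUTABLE   = my_app
ARGUMENTS    = -i data.in
INPUT_FILES  = data.in
OUTPUT_FILES = result.out
STDOUT_FILE  = stdout.${JOB_ID}
STDERR_FILE  = stderr.${JOB_ID}
```

GridWay substitutes variables such as ${JOB_ID} at submission time, so each job’s standard output and error streams are kept in separate files.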
Figure 1.4. GridWay usage model.
1.4.3 Extensions to GridWay
The QoS framework presented in this Thesis has been implemented on top of
GridWay, although some of the capabilities have been implemented within
GridWay itself.
The extension to GridWay consists of making it network-aware. To this end, two
new techniques to choose the resources are included, which are explained in
Chapter 3. The GridWay meta-scheduler has been chosen among others (such
as [18] [29] [30]) for several reasons: (1) the availability of its source code; (2) its
modular structure, which allows the easy addition of new criteria to perform
the filtering and sorting of the candidate resources. This is of great aid when
trying to tailor these criteria to other needs not initially considered by the
GridWay developers.
On the other hand, the implementation made on top of GridWay is in charge
of the meta-scheduling in advance decisions. This is a modular framework
between GridWay and the users, which has been implemented in an incremental
way, providing first the techniques needed for scheduling jobs in the future
without physical reservations. Then, the heuristics needed for predicting the
future status of resources and the duration of jobs on them were implemented
(Chapter 4). Ultimately, the techniques needed to deal with fragmentation and
unfavorable previous decisions in order to improve resource usage were
developed – presented in Chapter 5.
1.4.4 Integration with other systems
The next step is the integration of the above development with other existing
systems, with the aim of managing more advanced QoS. To this end, the
FSGrid system (detailed in Chapter 6) is used, which provides our system with
fair-share resource usage. When both systems work together, it is possible to
address QoS not only in terms of jobs finished within their deadlines, but also
depending on the previous and current resource usage made by users. In this
way, it is possible to provide different QoS levels to the different users (or VOs)
depending on the established policy, which may also be changed dynamically.
1.5 Structure of this Thesis
In order to cover the points mentioned in Section 1.3, this Thesis has been
structured as follows:
• Chapter 1: The introduction chapter briefly describes the topic of interest
of the Thesis. Motivation, objectives, and organization of the document are
also described in this chapter.
• Chapter 2: The provision of QoS in Grids by means of efficient meta-
scheduling in advance is the topic of interest of this Thesis. This chapter
reviews proposals developed for the provision of QoS in Grids. This has
been done paying special attention to those proposals which consider the
network and advance reservations.
On the other hand, other research works related to the techniques developed
for the QoS provision are presented. In this way, this chapter studies
several research works regarding efficient data structures, different prediction
techniques, fragmentation and co-allocation issues, autonomic computing,
service level agreements and the fair usage of resources.
• Chapter 3: In this chapter, a first proposal to improve the performance
provided by GridWay is described, paying special attention to the network
status. The implementation extends the widely used GridWay meta-scheduler
and relies on Exponential Smoothing (ExS) to predict the execution
and transfer times of jobs. An autonomic control loop (which takes into
account CPU usage and network capability) is used to alter job admission
and resource selection criteria so as to improve overall job completion times
and throughput. This Autonomic Network-aware Meta-scheduler (ANM-ExS)
combines concepts from Grid meta-scheduling with autonomic computing,
in order to provide users with a more adaptive job management system.
The architecture considers the status of the network when reacting to
changes in the system – taking into account the workload on computing
resources and the network links when making a meta-scheduling decision.
Thus, the architecture provides meta-scheduling of jobs to computing
resources and connection admission control, so that the network does not
become overloaded. The implementation has been tested using a real testbed
involving heterogeneous computing resources distributed across different
national organizations. The performance evaluation illustrates the ability of
ANM-ExS to schedule heterogeneous jobs onto computing resources more
efficiently than the conventional Grid meta-scheduling algorithms used by
GridWay.
• Chapter 4: In this chapter the meta-scheduling in advance framework,
named SA-Layer, is presented, together with all the prediction techniques
developed to make this kind of scheduling possible, with the aim of
addressing the QoS required by the users. This QoS is defined in terms
of time constraints; in our case, the job start time (the earliest time at which
a certain job can start its execution) and the job deadline (the time by which
the job must have finished its execution). Details on how the scheduling is
performed, and on how to select the resources and the time periods to
execute the jobs, are presented. Moreover, different prediction techniques
are described and evaluated.
• Chapter 5: Once the SA-Layer is capable of managing scheduling in advance
decisions in a correct and accurate way, this chapter presents the
techniques needed to deal with the poor resource utilization due to
fragmentation. Fragmentation is a well-known effect in every scheduling
process: jobs may be rejected even if the remaining capacity is enough to
execute them. On the other hand, other jobs may be rejected owing to the
fact that the meta-scheduling system is not capable of foreseeing the future;
thus, jobs may be rejected due to unfavorable previous decisions. This
chapter presents two techniques which try to avoid both rejection problems,
named Replanning Capacity (RC) and Bag of Tasks Rescheduling (BoT-R).
Finally, a performance evaluation section highlights the benefits of both
techniques from the resource usage viewpoint and, consequently, from the
users’ QoS perspective.
• Chapter 6: In this chapter, the integration between our meta-scheduling in
advance system and another one in charge of fair-share resource usage is
presented. This second system is named FSGrid [4]. The integration of both
systems makes it possible to manage QoS in terms of jobs finished within
their deadlines, and also in terms of the fair-share usage of the resources
from the users’ point of view, by taking into account the established policy.
Moreover, it is possible to define usage policies not only at the user level, but
also at the project or virtual organization levels. Finally, this chapter presents
a performance evaluation which highlights how our system manages different
QoS levels taking into account the policy set when using the FSGrid system.
The improvement in the FSGrid convergence rate obtained thanks to the
predictions made by the SA-Layer is also presented.
• Chapter 7: This chapter summarizes conclusions from the work carried
out in this Thesis, as well as scientific contributions, that is, national and
international publications. Finally, future work guidelines that can be ac-
complished from this Thesis are also presented.
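Among the techniques enumerated above, the Exponential Smoothing (ExS) used in Chapter 3 is simple enough to sketch here. Each new prediction blends the latest observation with the previous prediction; the smoothing factor alpha below is an illustrative value, not the one tuned in the Thesis:

```python
def exs_predict(observations, alpha=0.5):
    """Single exponential smoothing: return the next-value prediction
    after folding in each observation in turn."""
    prediction = observations[0]
    for obs in observations[1:]:
        prediction = alpha * obs + (1.0 - alpha) * prediction
    return prediction

# E.g., predicting the next file transfer time (seconds) from past ones:
transfer_times = [10.0, 12.0, 11.0, 20.0]
next_estimate = exs_predict(transfer_times)   # 15.5 with alpha = 0.5
```

The recent spike pulls the estimate towards 20 seconds without discarding the history; smaller values of alpha give smoother, slower-reacting predictions.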
CHAPTER 2
QoS Provision and Meta-Scheduling in Grids
This chapter reviews several related works in the different fields that had to
be addressed to improve the Quality of Service (QoS) provided to users in a
Grid environment through our network-aware meta-scheduling in advance
system.
2.1 Introduction
The rapid evolution of Grid Computing and the development of new middleware
services have made Grid platforms increasingly used not only for best effort
scientific jobs but also in industrial and business applications [31]. Due to this
fact, the desire for QoS support has been growing. However, providing this QoS
over current Grid infrastructures is rather difficult, as they were originally
designed without any support for QoS. The first aim of Grid computing was to
allow different organizations to share heterogeneous resources connected through
the network for collaborative purposes, building what were called VOs. These
VOs provide an abstraction level to manage large tasks, but without any
requirements on completion time or other constraints. The software infrastructures
required for resource management and other tasks, such as security, information
dissemination and remote access, are provided through Grid toolkits such as
Globus [9] and Legion [19].
As opposed to Cloud computing, where the resources are usually dedicated and
under the control of a single management entity, in a Grid infrastructure
resources are shared and non-dedicated, and the global management policy has to
cooperate with and respect local policies. The management of QoS on the Grid
is therefore a complex problem that spans all aspects of the Grid, involving
different layers of the Grid architecture [31]. Thus, we propose an approach,
based on a number of cooperating modules and on a specific QoS-management
layer in the Grid architecture, called SA-Layer, to address these issues.
Grids are dynamic, inter-domain environments, where assumptions about system
behavior also have to be dynamic and continually updated. Suffice it to say
that resources are heterogeneous and belong to different owners, whilst users
change continuously, as do their job requirements. All these facts make QoS
support much more complex than in other environments. As reliability is
strongly related to QoS provisioning, Grid systems must ensure a good level of
reliability in spite of the complexity, heterogeneity and high dynamism of Grid
resources.
2.2 Models for Addressing Quality of Service in Grids
From a QoS perspective, three main models may be distinguished in Grid
environments [31].
2.2.1 Best Effort Model
The Best Effort model is based simply on providing a universal and common plat-
form to share resources across several administrative domains. Thus, neither an
efficient mapping between jobs and resources nor the provision of any QoS is
considered. Resources offered as best effort do not impose any constraint on the
resource owners, as resources are shared when idle.
Condor-G [22] was one of the first Grid meta-schedulers. It was designed
as a natural evolution of the Condor scheduler [15], working over
VOs and in an inter-domain way. Through the Condor-G agent, Grid users
see the Grid as an entirely local resource. Condor-G acts as a scheduler, dis-
patching the submitted jobs to resources, managing their execution (sus-
pend/resume) and monitoring the state of executing jobs. Nonetheless, Condor-
G shows some design and technical limitations. First, being centralized, it does
not scale well on large Grids and constitutes a single point of failure. Moreover,
it makes scheduling decisions assuming total knowledge of jobs and resources,
which is an unrealistic assumption in dynamic environments like Grids. In general,
such drawbacks are the main cause of the poor efficiency of the first generation
of meta-schedulers.
2.2.2 QoS Model
Although no explicit support for QoS is defined at the Grid level, the introduction
of standards, together with the need for QoS, has led to the definition of the first
effective architectures that provide some QoS functionality, working over the
middleware. Examples of mechanisms dealing with QoS management support
are Grid Quality of Service Management (G-QoSM) [32] and the Globus Architecture for
Reservation and Allocation (GARA) [33].
GARA is one of the seminal works providing support for QoS through making
physical reservations of Grid resources. The G-QoSM framework is aimed at providing
QoS for applications in a computational Grid. As a framework, it works over the
middleware (Globus Toolkit [9]), which is used to obtain basic Grid func-
tionality. G-QoSM presents a scalable and distributed architecture, replicated
in every administrative domain, for managing QoS at different abstraction
levels. G-QoSM supports all kinds of QoS parameters and functionalities for
optimizing the management of SLAs with explicit QoS constraints, and it is able
to manage applications with strict QoS requirements, like computational steer-
ing. However, the support of G-QoSM for QoS can be further enhanced. A first
drawback is the assumption that users are able to indicate their QoS requirements
precisely, and only in terms of quantitative low–level resources. Another
limitation is that it relies only on the middleware for providing QoS at the resource
level. Finally, it does not provide support for managing deadlines on application
executions.
2.2.3 Economic Model
The market–based Grid paradigm arose owing to the growing demand for QoS
by applications. This fact led to the development of signed contracts, e.g.,
by using a Service Level Agreement (SLA) [34]. Under this scenario, owners may
offer their resources at distinct QoS classes (e.g., gold, silver and bronze, as
outlined in [35]), requesting different prices for them. Users who pay for
them acquire guarantees of access to those resources at the requested QoS
level.
On the other hand, there are approaches that take economic issues into
account. For instance, the GRACE infrastructure [36] is a scalable component
that works over a basic middleware like the Globus Toolkit, providing additional
functionality with support for economics. In particular, it provides support for
resource discovery based on cost, and APIs for managing a cost for resources
and a budget for users. GRACE is the first approach that explicitly introduces
a market into the Grid environment. However, its economy management is quite
simple: there is no implementation of advanced economic models, and the only
stakeholders are final users and resource owners. In this sense, GRACE intro-
duces economic parameters, such as cost and price, but it does not yet support
a comprehensive and complex market–based economic system. Even though
the approach could support quantitative QoS parameters as extensions, QoS is
not directly managed.
2.3 Proposals for Addressing Quality of Service in Grids
QoS provisioning is the topic this dissertation focuses on. Having a
good QoS model, which ensures the QoS provided to users, is the basis for
building a strong economic model. Hence, several extensions to the basic func-
tionality of a Grid middleware to enable QoS support are proposed. This QoS-
management layer is aimed at overcoming the limitations of current ap-
proaches to supporting QoS on Grids. It lies between the users and the middle-
ware, and is devoted to explicitly managing QoS support and functionalities. It
constitutes a universal interface both for users/applications with QoS requests
and for owners who offer QoS on their resources.
The next subsections detail a compilation of literature approaches whose main
aim is QoS provisioning in a Grid environment. They try to address QoS
using different methods, such as advance reservations or the co-allocation of
jobs, among others. They are presented together with the weak points that
need to be solved.
In addition, other research works related to the techniques that this Thesis
uses to manage QoS in Grid environments are also presented. For instance,
a suitable data structure is needed to manage the information about the future
usage of resources; to this end, several of them have been studied and presented,
such as Grid Advanced Reservation Queue (GarQ) [37] or red–black trees [38].
Along these lines, research works on efficient data structures, different prediction
techniques, ways of measuring fragmentation in the allocation process,
co-allocation and rescheduling techniques, autonomic computing, service level
agreements and fair usage of resources have been reviewed, as the next sections
detail.
2.3.1 Scheduling Techniques
In most Grid systems, pending jobs are stored in queues until the meta-
scheduler has resources available to execute them. However, each system may
implement different scheduling algorithms to process this queue [39], such as First
Come First Serve (FCFS), Shortest Job First (SJF), Earliest Deadline First (EDF)
or EASY Backfilling. They choose the jobs to be executed taking into account
different parameters, such as their start time, the number of resources requested
or the job execution duration. However, these classic scheduling algorithms are
not prepared to cope with the dynamism of Grids and, because of that, they cannot
provide any guarantee that jobs will be executed. Accordingly, they cannot ensure
that a certain job is executed before a certain deadline. Consequently, no
QoS is provided at all.
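As an illustration, the core of a policy such as EDF fits in a few lines. The sketch below is only a toy model (job names and deadlines are invented, and none of the Grid dynamism discussed above is modelled), which is precisely why such policies alone cannot give QoS guarantees:

```python
import heapq

# Toy sketch of Earliest Deadline First (EDF): the pending queue is
# kept ordered by deadline and jobs are dispatched in that order.
# All job data below is invented for the example.
jobs = [("j1", 50), ("j2", 20), ("j3", 35)]  # (name, deadline)

def edf_order(jobs):
    """Return job names in the order EDF would dispatch them."""
    heap = [(deadline, name) for name, deadline in jobs]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

print(edf_order(jobs))  # deadlines 20 < 35 < 50, so: ['j2', 'j3', 'j1']
```

Note that nothing in this ordering checks whether a deadline is actually met once queue waiting times and resource load vary, which is the limitation discussed above.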
A number of research projects have studied QoS provision in Grid envi-
ronments, such as GARA [33], G-QoSM [32], the Grid Network-aware Resource Bro-
ker (GNRB) [40], [41] or [42]. They use several schedulers to map users'
jobs to resources. For instance, the schedulers used by GARA and G-QoSM are
Dynamic Soft Real Time Scheduler (DSRT) [43] and PBS [44]. However, these
schedulers only pay attention to the load of the computing resources, making it
possible to choose a powerful, unloaded computing resource attached to an
overloaded network. This may lead to a deterioration in the performance received
by users, especially when the job is network-demanding. The network is thereby
a key component within Grid systems that needs attention when performing tasks
such as scheduling, migration or monitoring [45].
Surprisingly, many of the above efforts do not take network capability into
account when scheduling tasks. Therefore, other meta-schedulers available to-
day, such as GridLab Resource Management System (GRMS) [29], Community
Scheduler Framework (CSF) [18], Grid Service Broker [30], Grid Network Bro-
ker (GNB) [46] or GridWay [10] need to be analyzed. However, they also have
some drawbacks. GRMS is obsolete, as the last Globus version it deals with is
2.4. CSF provides reservations based on a number of features, including the
network, but it is a centralized engine and is not intended for bulk data trans-
fer [47]. Grid Service Broker already includes network information provided by
Network Weather Service (NWS) [48] to perform the meta-scheduling, but it re-
quires information on the effective bandwidth between all the data hosts and all
the compute hosts in the system to perform network–aware scheduling, which
makes the proposal difficult to scale. GNB [46] is an autonomic network–aware
meta-scheduling framework, but it has only been tested by means of simulations.
Finally, GridWay is straightforward to install and use, and its modular architecture
allows extensions to be implemented easily. However, the network is not among
the parameters it uses to perform meta-scheduling.
2.3.2 Advance Reservation
On the whole, both Grid resources and interconnection networks may be un-
stable. This means that their performance may vary over time, which may lead
to jobs failing or to very long execution times. Due to this fact, a Grid system
needs an efficient scheduling algorithm. For that purpose, an interesting approach
is to try to ensure that a specific resource is available when an application needs
it. In this way, some QoS may be provided. To this end, as several researchers
state [49] [50], it is necessary to reserve resource usage.
In the main, the resources that may be reserved or requested are computational
ones, storage systems, bandwidth, etc., or a combination of some of
them. Moreover, these reservations can be classified into two different kinds,
immediate and in advance, depending on the start time requested.
Thus, an advance reservation is defined as a “possibly limited or restricted del-
egation of a particular resource capability over a defined time interval, obtained by
the requester from the resource owner through a negotiation process” [51]. More-
over, this process may be split into two steps:
1. Scheduling decision: the phase in which the resource and the time
period to execute the job are selected. However, in this step there is no
physical reservation of the resources.
2. Negotiation of the reservation: the phase in which the physical reserva-
tion of the resource takes place. To this end, the LRMS must provide this
functionality, as Maui [52] does.
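The two steps above can be sketched as follows. This is a hedged toy model: the free-slot table, the resource names and the always-accepting LRMS stub are invented, and in a real deployment the negotiation (e.g. with an LRMS such as Maui) could still refuse the request:

```python
# Hedged sketch of the two-step advance reservation process described
# above. The free-slot table, resource names and the LRMS stub are all
# invented for illustration.
free_slots = {  # resource -> list of (start, end) free periods
    "res_a": [(0, 40), (60, 100)],
    "res_b": [(10, 80)],
}

def schedule(duration, earliest_start):
    """Step 1 (scheduling decision): pick a resource and time period.

    Nothing is physically reserved yet."""
    for res, slots in free_slots.items():
        for start, end in slots:
            begin = max(start, earliest_start)
            if begin + duration <= end:
                return res, begin, begin + duration
    return None

def negotiate(reservation):
    """Step 2 (negotiation): ask the LRMS to physically reserve the slot."""
    res, start, end = reservation
    return True  # stub: a real LRMS could accept or refuse here

r = schedule(duration=30, earliest_start=20)
print(r)  # ('res_a', 60, 90): first free period long enough after t=20
if r and negotiate(r):
    print("reservation confirmed")
```

Separating the two phases is what makes the later move to *scheduling* in advance possible: step 1 can be kept while step 2 is dropped when the underlying resources do not support reservations.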
Several projects have aimed at exploring advance reservation of resources
(among others, GARA [53], Grid Capacity Planning [54], or Vertically Integrated
Optical testbed for Large Application (VIOLA) [55]).
The Globus Architecture for Reservation and Allocation (GARA) [53] provides one
of the seminal works on Grid resource management systems, with improved features
like advance reservation of resources and a first support for QoS. Reservation
becomes important after the signature of an SLA contract, to prevent the resources
involved in the contract from being allocated to other users. More specifically,
GARA provides a uniform way to allow users and applications to manage both
reservations and allocations. Since then, advance reservations have been studied
in numerous contexts, such as clusters (Maui Scheduler [52]).
Among the systems that allow resource reservations in a Grid we can find Grid
Capacity Planning [54], which provides users with reservations of Grid resources
through negotiations, co-allocations and pricing. Another important system is
VIOLA [55], which includes a meta-scheduling framework that provides co-allocation
support for both computing and network resources. However, all these
techniques share the same main drawback: not all resources can be reserved
in a Grid environment, since not all of them belong to the same administrative
domain (they are under the ownership of different administrators), and not all
resources provide this functionality.
In addition to this main drawback, support for reservation in the under-
lying infrastructure is currently limited, in spite of being a feature required
to meet QoS guarantees in Grid environments, as several contributions con-
clude [53] [54]. Moreover, there is a performance penalty imposed by the us-
age of advance reservations (typically decreased resource utilization), which has
been studied in [56]. By contrast, Qu [57] describes a method to overcome this
shortcoming by adding a Grid advance reservation manager on top of the local
scheduler(s), which is similar to the way our proposals are implemented.
Some of the most recent major works on advance reservations in Grids are
[54], [58] and [59]. In [54] a cost–aware resource model is presented in which
reservation for each application task is performed separately by negotiating with
the resource provider. In [58], Elmroth et al. present a resource selection al-
gorithm in which the computational resource is selected based on the results
of computing several benchmarks in each computational resource and network
performance predictions. The authors of [59] propose a multiobjective genetic
algorithm formulation for selecting the set of resources to be provisioned that
optimizes application performance while minimizing costs.
2.3.3 Data Structures
Owing to the limitations that reservation in advance presents, this work aims
at performing scheduling in advance rather than reservation in advance of re-
sources [60]. Using this kind of scheduling requires an underlying infrastructure
capable of managing all the information in an efficient way. To this end, several
data structures detailed in the literature present the strengths we need. A
survey can be found in [37]. It is worth mentioning Grid Advanced Reservation
Queue (GarQ) [37], a combination of the Calendar Queue [61] and the Segment
Tree [62], for administering reservations efficiently. However, in this Thesis,
red–black trees are used, as they provide efficient access to the information about
resource usage, as demonstrated in [38]. One of their advantages is that this
data structure stores only the free time periods of each resource, in contrast to
GarQ, which stores all the information about the reservations made. In this way,
when a large number of jobs is submitted, the number of free time periods to
store is lower.
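The idea of keeping only the free periods can be illustrated with a small sketch. Python has no built-in red–black tree, so a plain ordered list of disjoint gaps stands in for it here; the interval values are invented:

```python
# Sketch of storing only the FREE time periods of a resource, in the
# spirit of the red-black tree approach: allocating a job consumes a
# gap and leaves the (possibly empty) leftover pieces. The tree itself
# is replaced by a sorted list for brevity.
free = [(0, 30), (50, 120)]  # sorted, disjoint (start, end) gaps

def allocate(duration, earliest):
    """Place a job in the first gap that fits; return its time slot."""
    for i, (start, end) in enumerate(free):
        begin = max(start, earliest)
        if begin + duration <= end:
            pieces = [(start, begin), (begin + duration, end)]
            free[i:i + 1] = [(s, e) for s, e in pieces if s < e]
            return (begin, begin + duration)
    return None  # no gap is large enough: the request is rejected

print(allocate(20, 10))  # (10, 30): the job fits inside the first gap
print(free)              # [(0, 10), (50, 120)]: only free periods remain
```

With many accepted jobs, the list of remaining gaps stays short, which is the advantage over storing every reservation; the red–black tree additionally keeps lookups logarithmic.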
However, in order to perform scheduling in advance, two main issues arise.
First, predictions of the level of use of resources are needed. Second, realloca-
tion techniques are needed to modify job allocations when new jobs cannot be
accepted. Related approaches to these two topics are reviewed next.
2.3.4 Prediction Techniques
Regarding job durations on resources and their waiting times in queues, there
are also several works whose main aim is to estimate those times. For instance,
Queue Bounds Estimation from Time Series (QBETS) [63] tries to estimate the
probability of a job waiting no longer than startDeadline minutes if it is submit-
ted at time T. Another interesting work, based on this, is the Virtual Advance
Reservations Queues (VARQ) [64], which implements a reservation by determin-
ing when (according to predictions made by QBETS) a job should be submitted
to a batch queue so as to ensure it will be running at a particular point in the
future.
In contrast to [3] [38], where the authors assume that users have prior knowl-
edge of job durations, such prior knowledge is not considered in the present
work. Thus, estimations of the completion times of jobs need to be calculated.
With the aim of making accurate meta-scheduling in advance decisions, the pro-
posed system needs to perform predictions about the future resource status and
about job durations on resources. A survey of prediction techniques can
be found in [65]. Examples include applying statistical models to previous execu-
tions [66] and heuristics based on job and resource characteristics [67]. In [66],
it is shown that although load exhibits complex properties, it is still consistently
predictable from past behavior. In [67], an evaluation of various linear time se-
ries models for the prediction of future CPU loads is presented. In this work,
a technique based on historical data is used, since it has been demonstrated to
provide better results than linear functions [60].
The prediction information can be derived in two ways [68]: application–
oriented and resource–oriented. In application–oriented approaches, the
running time of Grid tasks is directly extrapolated using information about
the application, such as the running time of previous similar tasks. In
resource–oriented approaches, the future performance of a resource, such as
CPU load and availability, is predicted using historical information. These
data are then used to forecast the running time of a task, given information
on its resource requirements. In this work, a combination of these two
approaches is used. First, application–oriented approaches are used to estimate
the execution times of the applications. After that, resource–oriented approaches
calculate the time needed to perform the network transfers, and modify the es-
timations of the application execution time considering the predicted status of
the resource where the application is going to be executed.
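A toy sketch of such a combination is shown below. The history, the load-scaling model and the transfer-time formula are invented stand-ins, not the actual predictors used in this Thesis:

```python
# Hedged sketch of combining application-oriented and resource-oriented
# predictions, as described above. The history, load model and transfer
# formula are invented illustrations, not the SA-Layer's actual predictors.
history = {"appX": [100.0, 110.0, 105.0]}  # past run times in seconds

def app_estimate(app):
    """Application-oriented: mean running time of previous similar runs."""
    runs = history[app]
    return sum(runs) / len(runs)

def resource_adjust(base, predicted_load, bandwidth_mb_s, input_mb):
    """Resource-oriented: scale by predicted CPU load, add transfer time."""
    transfer = input_mb / bandwidth_mb_s      # network transfer, seconds
    return base / (1.0 - predicted_load) + transfer

base = app_estimate("appX")                   # 105.0 s
total = resource_adjust(base, predicted_load=0.3,
                        bandwidth_mb_s=12.5, input_mb=125.0)
print(round(total, 1))  # 105/0.7 + 10 = 160.0
```

The split mirrors the text: the first function only looks at the application's past, while the second folds in the predicted state of the chosen resource and its network path.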
2.3.5 Fragmentation Problems
On the other hand, fragmentation is a well-known effect in every resource allo-
cation process, which decreases resource utilization, as studied in [56]. Thus,
whenever a resource allocation fails even though enough free capacity is avail-
able, fragmentation is easily identified as the cause. However, how fragmentation
can be quantified in a system requiring continuous allocations, such as time
schedulers or memory, is not a trivial issue. Owing to the fact that the problem
presents similarities with memory management, different approaches focused on
memory allocation have been reviewed, as such information could help to compare
the effects of a scheduling decision [69].
In [70] [71], the characteristics of dynamic memory allocators were studied.
They have to deal with problems such as finding a free block to satisfy a
malloc() request, choosing one block out of many possible ones, splitting a
block which is larger than the requested one, coalescing two or more adjacent
freed blocks, and demanding more memory from the operating system, e.g. with
sbrk(), to serve a malloc() request.
However, the domain of memory management does not map onto the do-
main of Grid resources very well, because there is no match for sbrk() (which
increments the data segment size) within the Grid domain. In addition, main
memory can be considered homogeneous, which is not true for Grid
resources. Moreover, a Grid environment combines two dimensions: time
and resources. For main memory, apart from locality effects, it
does not matter whether object 1 is in cell 1 and object 2 is in cell 2, or vice
versa. However, for Grid resources it does matter whether reservation 1 is in
time interval 1 and reservation 2 is in time interval 2, as time interval 2 could be
too late for reservation 1. Analogously, for the other dimension, it does matter
whether reservation 1 is assigned to resource 1 and reservation 2 is assigned to
resource 2, as resource 2 may not be capable of handling reservation 1.
To summarize, a drawback inherent to all the approaches studied so far is
their limitation to one dimension. Grid resources have two dimensions: time
and resource capacity (e.g., number of CPUs, their power, or bandwidth). To
this end, the correlation between the measured fragmentation of a schedule
and the future rejection rate was analysed in [69]. That paper presents a new
way to measure the fragmentation of a system and shows that the proposed
fragmentation measure is a good indicator of the state of the system. However,
as they measure the fragmentation of single resources and not of the system
as a whole, further research is needed to address fragmentation issues in the
Grid meta-scheduling domain. To this end, we have studied several metrics to
quantify the existing fragmentation in Grid systems. This information can be used
to decide when certain scheduling actions (e.g., rescheduling of already scheduled
tasks) have to be triggered in order to reduce fragmentation when and where
necessary. Thus, resource usage may be improved, and the QoS offered improves
as a consequence.
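To make the notion concrete, one very simple, hypothetical per-resource metric (not the measure from [69], nor the ones studied in this Thesis) relates the largest contiguous free gap to the total free time:

```python
# Hypothetical fragmentation metric for one resource's schedule: 0 when
# all free time forms one contiguous gap, approaching 1 as the free time
# shatters into many small gaps. Invented for illustration only.

def fragmentation(free_gaps):
    """free_gaps: list of (start, end) free periods of a resource."""
    sizes = [end - start for start, end in free_gaps]
    total = sum(sizes)
    if total == 0:
        return 0.0
    return 1.0 - max(sizes) / total

print(fragmentation([(0, 100)]))                      # 0.0: one big gap
print(fragmentation([(0, 10), (20, 30), (40, 50)]))   # 1 - 10/30, about 0.667
```

Such a per-resource value illustrates exactly the limitation noted above: summing or averaging it over resources still ignores the interplay between the time and resource dimensions.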
2.3.6 Co-allocation and Rescheduling Techniques
In the literature there are also techniques related to the co-allocation of jobs, as
well as to their rescheduling. The first kind sends jobs to more than one compu-
tational resource; in this way, the system tries to ensure that the requested QoS
is fulfilled, at the expense of a higher computational cost. The second kind of tech-
nique is based on changing the computational resource a job was sent to,
with the objective of moving it to a better one, or at least one more appropriate
at that moment. Through this last action, fragmentation may be avoided
or minimized, and the number of jobs that may be accepted increases.
One related work dealing with the reallocation of jobs is [72]. This work inves-
tigates how the precision of the available information affects resource provisioning
in multi-site environments, and uses backfilling to perform that provisioning of
resources. However, their scenario does not fully map onto ours, since we take
into consideration not only deadline but also start time constraints for jobs.
Thus, in our case, backfilling is implicit. Every time the system receives a job
execution request with a start time earlier than those of other already scheduled
jobs, if there are enough free time slots to allocate the job, it will be allocated and
consequently executed before the previously scheduled jobs.
The Phosphorus Project [73] is another interesting approach to the provi-
sion of QoS in Grids. Its routing and scheduling algorithms aim
at satisfying two or more QoS requirements by co-allocating resources either
concurrently or successively, taking into account the dependencies between commu-
nication and computational tasks.
Another interesting work is presented in [74], where an algorithm to perform
resource selection based on performance predictions is proposed, providing sup-
port for advance reservations as well as the co-allocation of multiple resources. That
work also presents an algorithm for displacing already made reservations based
on the co-allocation of jobs, but not among jobs belonging to different users,
which is our case. Apart from that, their performance prediction techniques are
based on benchmark comparisons, which have to be provided by the users,
while in our system these performance predictions are calculated transparently,
so users do not need to know this information about the jobs they submit.
Regarding the analysis of task reallocation in Grids, another piece of research is
presented in [75]. The authors present different reallocation algorithms and study
their behavior in the context of a multi-cluster Grid environment. However,
unlike our work, that work is centered on a dedicated Grid environment.
2.3.7 Autonomic Computing
Apart from moving jobs from one resource to another and trying to submit jobs
to more than one resource, it is very important that systems are capable of
adapting their behavior to the current status of the environment in an autonomic
way, so that jobs can be efficiently mapped to computing resources and the
overall behavior of the system can be improved. This is known as autonomic
computing.
An autonomic system requires sensor channels to detect changes in the
internal state of the system and in the external environment where the system is
situated. On the other hand, mechanisms to react to and counter the effects
of changes in the environment, by changing the system and maintaining
equilibrium, are also needed. Sensing, Analyzing, Planning, Knowledge and Exe-
cution are thus the keywords used to identify an autonomic computing system.
A common model based on these ideas was identified by IBM Research and de-
fined as Monitor Analyze Plan Execute (MAPE) [76]. A number of other models
for autonomic computing also exist [77], [78]. Consequently, significant work has
already been undertaken towards autonomic Grid computing, such
as [79] [80] [81] [82].
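The MAPE control loop can be sketched in a few lines. The sensor, the threshold and the corrective action below are all invented stand-ins for whatever a real autonomic middleware would monitor and actuate:

```python
# Hedged sketch of a MAPE-style control loop; the sensor, threshold and
# corrective action are stand-ins, not an actual autonomic middleware.

def monitor(state):
    """Monitor: collect a symptom from the (simulated) environment."""
    return state["load"]

def analyze(load, threshold=0.8):
    """Analyze: decide whether the symptom requires action."""
    return load > threshold

def plan(state):
    """Plan: choose a corrective action (here: shed half the load)."""
    return {"load": state["load"] / 2}

def execute(state, change):
    """Execute: apply the planned change to the environment."""
    state.update(change)

state = {"load": 0.9}
for _ in range(3):  # a few iterations of the loop
    if analyze(monitor(state)):
        execute(state, plan(state))
print(state["load"])  # 0.9 is shed once to 0.45, then stays below threshold
```

The knowledge component of MAPE would sit behind all four functions (shared models and policies); it is omitted here for brevity.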
In [79], an architecture to achieve automated control and management of net-
worked applications and their infrastructure based on Extensible Markup Lan-
guage (XML) format specification is presented. Liu and Parashar [80] present
an environment that supports the development of self-managed autonomic com-
ponents, with the dynamic and opportunistic composition of these components
using high–level policies to realize autonomic applications, and provide runtime
services for policy definition, deployment and execution. In [81], an autonomic
job scheduling policy for Grid systems is presented. This policy can deal with
the failure of computing resources and network links, but it does not take the
network into account in order to decide which computing resource will run each
user application. Only idle/busy periods of computing resources are used to
support scheduling. Finally, in [82], a simple but effective policy was formulated,
which prioritizes the completion and acceptance of jobs over their response
time and throughput. It was determined that due to the dynamic nature of the
problem, it could be best resolved by adding self-managing capabilities to the
middleware. Using the new policy, a prototype of an autonomous system was
built and succeeded in allowing more jobs to be accepted and finished correctly.
2.3.8 Service Level Agreements
In this QoS context, Service Level Agreements (SLAs) may act as formal containers
of the QoS information agreed between the user and the owner. More specifically,
SLAs are documents that define a contract between a service requester (the
Grid user) and a service provider (the resource, as a Grid Service), constituting
a basis for the definition of QoS.
Nowadays, SLAs are a hot topic. Many efforts have been made in several
fields, such as their management [83], their QoS implications [84], semantic and
virtualization exploitation [85], and especially their standardization. The most impor-
tant development regarding SLAs has been the WS-Agreement specification [86],
which is considered the de facto standard. It describes, from a global point of view,
the structure and mechanisms needed to deploy SLAs over a system. In the
recent revision of the WS-Agreement specification [87], a new negotiation
protocol has been defined, introducing the concept of renegotiation as a multiple-
message interaction between user and service provider to achieve better agree-
ments. However, WS-Agreement is not the only available specification: SLAng [88]
and the Web Service Level Agreement (WSLA) [89] are alternatives to it.
Owing to the importance of Service Level Agreements, many projects are inter-
ested in their implementation [90]. Most of them implement WS-Agreement,
like SLA@SOI [91], AssessGrid [92] or Brein [93]. The first one focuses on
the introduction of SLAs into Service Oriented Infrastructures (SOIs) [91] from a
generic point of view. AssessGrid and Brein share a common purpose, which
is to bring Grid computing environments into business environments
and society. However, AssessGrid focuses on risk assessment for trustworthy
Grids, whilst Brein focuses on the efficient handling and management of Grid
computing based on artificial intelligence, the semantic web and intelligent sys-
tems. Another important project in this area is WS-Agreement for Java
(WSAG4J) [94], a generic implementation of the WS-Agreement specifi-
cation developed by the Fraunhofer SCAI Institute as a development framework.
It is designed for the quick development and debugging of services and applications
based on WS-Agreement.
It should be noted that not all projects implement WS-Agreement for their
SLA management. An example is NextGrid [95], which is focused on business
Grid exploitation.
An example of a middleware acting as an SLA manager is GRUBER [96]. It can
be seen as an over–middleware application, implemented both in pre-WS and in
WS versions. Whilst Condor-G [22] works on a very simplified model of the Grid
structure (as a simple set of resources and jobs), GRUBER considers the Grid as
a three–level hierarchy of users, groups and VOs, so that each user belongs to
only one group, and a group is part of only one VO. The first GRUBER version
is centralized like Condor-G, whereas a distributed version (DI-GRUBER) grants
higher scalability. GRUBER works like previous meta-schedulers, discovering
and matchmaking resources and jobs independently of user and owner needs,
without any QoS support. However, GRUBER is important for QoS-based Grids
as an SLA manager; in fact, it can be implemented under a QoS framework,
providing more effective services, particularly for SLA negotiation, than basic
middlewares or previous architectures like GARA [33].
2.3.9 Fair Resource Usage
Finally, if a Grid system is to manage QoS through SLA contracts, it has to
ensure the fair usage of resources, with the aim of being able to deliver this
QoS to all users while being aware of the previous usage of the system's resources
made by each user.
To this end, a number of mechanisms for fairshare-based job prioritization
exist, for instance the Fair Share scheduler [97], which extends the concept of
resource allocation fairness to the user level in uni-processor environments. Exist-
ing resource management and scheduling systems such as the Simple Linux Utility
for Resource Management (SLURM) [98] and Maui [52] incorporate their own ver-
sions of fairshare mechanisms, but these are typically limited to enforcing usage
quotas and operate on usage data from within their own ownership domains.
Surveys of Grid fairshare scheduling and resource allocation mechanisms
are presented in [99] and [100]. The former provides a classification of allocation
mechanisms based on categories such as volunteer, agreement-based, and eco-
nomic mechanisms, whilst the latter provides a study based on mathematical
analysis of different strategies for share scheduling in uniprocessor, multipro-
cessor, and distributed systems.
Fair Execution Time Estimation (FETE) scheduling [101] constitutes a version
of Grid fairshare scheduling where jobs are scheduled based on completion time
predictions, as in our case (similar to scheduling in time-sharing systems). However,
that work focuses on minimizing the risk of missed job deadlines, while in
our case the predictions are used both to make an accurate scheduling of jobs
in advance and to try to reach fairshare resource usage as soon as possible,
even before any job has finished its execution. Moreover, the FETE
proposal is evaluated in a simulated environment, assuming that tasks
get a fair share of the resource's computational power. Additional algorithms for
fair scheduling focused on Grid environments are presented in [102].
Finally, there is another fairshare job prioritization system, named FSGrid [4],
which is used in this Thesis with the aim of providing different levels of QoS to
different users. FSGrid is a distributed system for decentralized fairshare job
prioritization that operates on global (Grid-wide) usage data and provides
fairshare support to resource site schedulers operating across ownership
domains. Moreover, it calculates job execution priorities not only for users
but also for projects and virtual organizations.
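As a rough illustration of the idea behind fairshare prioritization, the sketch below boosts jobs from users whose historical consumption lies under their target share. This is a toy model, not FSGrid's actual algorithm; the target shares, usage figures and the priority scale are invented for illustration.

```python
# Toy fairshare prioritization: users whose historical usage is below
# their target share get higher priority. This is NOT FSGrid's actual
# algorithm; shares and usage figures are purely illustrative.

def fairshare_priority(target_share, actual_usage, total_usage):
    """Return a priority in [0, 2]: >1 means under-served, <1 over-served."""
    if total_usage == 0:
        return 1.0                       # no history yet: neutral priority
    used_fraction = actual_usage / total_usage
    if used_fraction == 0:
        return 2.0                       # never used the system: max boost
    return min(2.0, target_share / used_fraction)

# Two users with equal 50% target shares but unequal past usage:
usage = {"alice": 300.0, "bob": 100.0}   # CPU-hours consumed so far
total = sum(usage.values())
prio = {u: fairshare_priority(0.5, h, total) for u, h in usage.items()}
# bob has used less than his share, so his jobs are prioritized
assert prio["bob"] > prio["alice"]
```

The same ratio could be computed per project or per virtual organization, which is the hierarchical generalization that FSGrid provides.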
2.4 Summary
The improvement of Quality of Service (QoS) in Grids by means of efficient meta-
scheduling is the topic of interest of this Thesis. This chapter has reviewed the
proposals developed over time for addressing QoS in Grids, paying special
attention to those which have something in common with the issues that we had
to face in the development of our system.
Among them, GARA [33] and G-QoSM [32] can provide QoS on a variety of
resources (namely, computing resources, storage, and network), though neither of
them uses the network as a parameter to perform the mapping of jobs to com-
puting resources. To this end, techniques for autonomic computing have been
developed to take the network information into consideration, as well as other
information regarding previous resource status, when making the mapping be-
tween jobs and resources.
On the other hand, the fact that advance reservations are not always fea-
sible in the resources of Grid environments was the reason for developing a
meta-scheduling in advance system which does not make physical reservations
of resources. Due to the needs of this scheduling technique, related work
regarding efficient data structures and prediction techniques has been detailed. The
efficient data structures are needed to provide scalable and efficient mapping
between resources and jobs, taking into account the previous scheduling de-
cisions. The prediction techniques are needed to know the future status of re-
sources and the duration of jobs when they are going to be executed on them.
The remaining proposals are centered on different techniques used for man-
aging the QoS at different levels. They are related to the techniques used in the
development of our meta-scheduling in advance system. There are proposals for
measuring the fragmentation generated in the scheduling process, for managing
SLAs, or for improving the fairness in the resource usage among users, projects
and virtual organizations.
CHAPTER 3

Including Metrics to Improve QoS
at the Meta-Scheduling Level
One of the key motivations of computational and data Grids is the ability to
make coordinated use of heterogeneous computing resources which are geo-
graphically dispersed. Consequently, the performance of the network linking all
the resources present in a Grid has a significant impact on the performance of
an application. It is therefore essential to consider network characteristics when
carrying out tasks such as scheduling, migration or monitoring of jobs. This
chapter focuses on an implementation of an autonomic network–aware meta-
scheduling architecture that is capable of adapting its behavior to the current
status of the environment, so that jobs can be efficiently mapped to computing
resources.
3.1 Introduction
Computational and data Grids allow the coordinated use of heterogeneous com-
puting resources within large–scale parallel applications in science, engineering
and commerce [1]. Since organizations sharing their resources in such a context
still keep their independence and autonomy [2], Grids are highly variable sys-
tems in which resources may join/leave the system at any time. This variability
makes QoS highly desirable, though often very difficult to achieve in practice.
One reason for this limitation is the lack of a central entity that orchestrates the
entire system. This is especially true in the case of the network that connects
the various components of a Grid system.
Achieving an end-to-end QoS is often difficult, as without resource reserva-
tion any guarantees on QoS are often hard to achieve. Furthermore, in a real
Grid system, reservations may not be always feasible, since not all the LRMS
permit them. There are also other types of resource properties, such as band-
width, which lack a global management entity thereby making their reservation
impossible.
However, for applications that need a timely response (e.g., distributed en-
gine diagnostics [103] or collaborative visualization [104]), the Grid must provide
users with some assurance about the use of resources – a non-trivial subject
when viewed in the context of network QoS. In a Grid, entities communicate
with each other using an interconnection network – resulting in the network
playing an essential role in Grid systems [33].
In [46], the authors proposed an autonomic network–aware Grid meta-schedu-
ling architecture as a possible solution. This architecture takes into account the
status of the system in order to make meta-scheduling decisions, paying spe-
cial attention to the network capability. It is a modular architecture in which
each module works independently of the others, so that the architecture can
be adapted to new requirements easily. It should also be noted that this archi-
tecture was originally proposed from a formal point of view and was only
validated by means of simulation. In this Thesis, the aforementioned architecture has
been implemented into a real Grid environment, as an extension to the Grid-
Way meta-scheduler [10], with case studies and performance results provided
to demonstrate how it can be used. A scheduling technique that makes use of
ExS [105] to calculate predictions on the completion times of jobs is also devel-
oped. Thus, the main contributions of this chapter are: (1) an implementation of
an architecture to perform autonomic network–aware meta-scheduling based on
the widely used GridWay system; (2) a scheduling technique that relies on ExS
to predict the completion times of jobs; (3) a performance evaluation carried out
using a testbed involving workloads and heterogeneous resources from several
organizations.
The chapter is structured as follows: Section 3.2 discusses a scenario in
which an autonomic scheduler can be used and harnessed. Section 3.3 contains
details about the implementation based on an extension to the GridWay meta-
scheduler. Section 3.4 presents a performance evaluation of our approach, and
a summary of the chapter is presented in Section 3.5.
3.2 Autonomic Network–aware Meta-scheduling (ANM)
The availability of resources within a Grid environment may vary over time –
some resources may fail whereas others may join or leave the system at any time.
Additionally, each Grid resource must execute a workload that combines locally
generated tasks with those that have been submitted from external (remote) user
applications. Hence, each new task influences the execution of existing applica-
tions, requiring a resource selection strategy that can account for this dynamism
within the system. This is the reason why the provision of QoS in a Grid sys-
tem has been explored by a number of research projects, such as GARA [33] or
G-QoSM [32], which use as schedulers DSRT [43] and PBS [44], respectively.
However, these schedulers only pay attention to the load of the computing re-
source, thus a powerful unloaded computing resource with an overloaded net-
work could be chosen to run jobs, which decreases the performance received by
users, especially when the job requires a high network Input/Output (I/O). As
the network is a key component within a Grid system due to the coordinated use
of distributed resources, attention should be paid when carrying out tasks such
as scheduling, migration, or monitoring [45].
Under those conditions, developing an autonomic system that reacts (adapting
its behavior) to the system status is a must. Conceptually, an auto-
nomic system requires: (a) sensor channels to sense the changes in the internal
state of a system and the external environment in which the system is situated,
and (b) motor channels to react to and counter the effects of the changes in the
environment by changing the system and maintaining equilibrium.
Figure 3.1. Example scenario.

Sensing, Analyzing, Planning, Knowledge and Execution are thus the
keywords used to identify an autonomic computing system, as stated by IBM
Research in its definition of MAPE [76]. There has been significant work already
undertaken towards autonomic Grid computing. For instance, an architecture to
achieve automated control and management of networked applications and their
infrastructure based on XML format specification is presented in [79]. Another
example is [81], where an autonomic job scheduling policy for Grid systems is
presented. This policy can deal with the failure of computing resources and
network links, but it does not take the network into account in order to decide
which computing resource will run each user application.
In contrast, our autonomic approach uses a variety of parameters to make
a resource selection, such as network bandwidth, CPU usage or resource good-
ness, amongst others. A motivating scenario for an autonomic network–aware
meta-scheduler architecture is depicted in Figure 3.1 and includes the following
entities [46]:
• Users, each with a number of jobs/tasks to run.
• Computing resources, which may include clusters running a LRMS, such
as PBS [44].
• GNB (Grid Network Broker), an autonomic network–aware meta-scheduler.
• GIS (Grid Information System), such as [106], which keeps a list of available
resources.
• Resource monitor(s), such as Ganglia [107] or Iperf [108], which provide
detailed information on the status of the resources.
• BB (Bandwidth Broker), such as [109], which is in charge of the adminis-
trative domain and has direct access to routers, mainly for configuration
and topology discovering purposes.
• Interconnection network, such as a Local Area Network (LAN) or the In-
ternet.
The interaction between components within the architecture is as follows:
1. Users ask the GNB for a resource to run their jobs. Users provide the fea-
tures of their jobs (the “job template”), which include the input/output files,
the executable file and a deadline, amongst other parameters.
2. The GNB performs two operations for each job:
(a) It performs Connection Admission Control (CAC) by both (i) filtering
out the resources that do not have enough capacity to accept the job
and (ii) verifying whether the required QoS can be fulfilled by execut-
ing the job on the selected resource – i.e. whether the execution can
be finished before the deadline set by the user. An estimation of job
execution time is undertaken to support this [110] [111] (explained in
the next section).
(b) It chooses the most appropriate resource for its execution, i.e. one that
exhibits the best tolerance (explained in the next section).
Choosing a tolerance parameter and supporting CAC constitute the main
autonomic capabilities of the system. The tolerance parameter is dynam-
ically adjusted based on estimated and real execution time of jobs on a
particular resource, and admission control is subsequently used to limit
allocation of jobs to particular resources. The autonomic control loop in-
volves a dynamic adjustment of the tolerance parameter to improve job
completion times and resource utilization. Hence, if the selected resource
is either not available or has excessive workload, the resource with the next
best tolerance is checked. This process is repeated until a suitable resource
is found or until a certain number of resources are checked. Finally, if it is
not possible to allocate the job, it will be dropped, since its QoS cannot be
fulfilled given current resource availability.
To achieve this, the GNB first obtains a list of resources from the GIS and
subsequently gets the current load on each of these from the resource mon-
itor.
3. When found, the GNB submits the job to the selected computing resource.
4. Finally, after job completion, the GNB will get the output sent back from
the resource, and will forward it to the user.
Once the GNB has selected a resource to execute the job (step 2), the value
of tolerance is updated. This value indicates how accurately completion times of
jobs can be predicted. To achieve this, the GNB first estimates the job completion
time (prior to actual execution), taking into account data transfer time and CPU
time. Subsequently on job completion, the estimated value is compared with the
real value of job completion time. The difference between these represents the
accuracy with which such estimation can be achieved – in practice often limited
due to some sites and administrative domains not sharing information on the
load of their resources. Resource contention provides another obstacle causing
host load and availability to vary over time, making the completion time estima-
tion difficult [68]. Consequently, it is necessary to estimate how trustworthy a
specific resource is likely to be, or even if it will be available to execute the job.
The resource that is selected to execute a job is the one that has provided the
most predictable behavior up to the point the schedule is generated. Information
on the status of already scheduled jobs is used to obtain network and CPU
tolerances. The CPU execution time is calculated by using Exponential
Smoothing functions (ExS) [111] to tune resource status estimations, which are
in turn calculated by using information about past executions of similar tasks.
The information on network latency is obtained by means of Iperf [108], and
information on jobs already scheduled is obtained by means of the GIS – in our
case, from Globus Grid Resource Allocation Management (GRAM) [106].
The monitoring of the network and computing resources is carried out with
a given frequency referred to as the monitoring interval. As the GNB performs
scheduling of jobs to computing resources in between two consecutive moni-
toring intervals, it must take into account the jobs already scheduled on those
resources – i.e., calculate the effective bandwidth taking account of the existing
workload.
3.3 Implementation of ANM
The GNB has been implemented as an extension to the GridWay meta-scheduler.
To achieve this, it was first necessary to make GridWay network–aware, as well
as performing the needed adaptations to develop a scalable and suitable solu-
tion for real Grid environments. Details about how this has been undertaken
along with a description about predictions of network and CPU performance are
provided in this section.
3.3.1 Extending GridWay to be network–aware
GridWay [12] has been modified to take into account the status of the network
when ordering resources in the meta-scheduling process [112]. This value is
calculated by using the Iperf tool [108]. Similarly, monitoring data from com-
puting resources is obtained by Ganglia (already present in GridWay) and the
GIS provided by the Globus Toolkit [9].
GridWay performs meta-scheduling by requiring the user to provide a job
template which specifies the features of the job, including the executable file
and the input and output files, amongst others. The job template has two tags
that specify the criteria used by GridWay for selecting resources to run the job,
namely, REQUIREMENTS and RANK. With the REQUIREMENTS tag, the user can set
the minimal requirements needed to run the job, thus applying a filter on all the
resources known to GridWay. Once the REQUIREMENTS tag is processed, the set
of resources that fulfill the REQUIREMENTS are sorted according to the criteria
posed by the RANK tag. The process is depicted in Algorithm 1. For both tags,
several characteristics such as CPU type & speed, operating system, memory
available, etc. can be specified. Many of these values are gathered through
the Globus GIS module, while others (dynamic ones, such as amount of free
memory) are monitored through Ganglia [107].
Algorithm 1 Resource selection algorithm used in GridWay
1: R: set of resources known to GridWay
2: R_Req = {r ∈ R / r fulfills the REQUIREMENTS condition}
3: return r′ ∈ R_Req / ∀r ∈ R_Req, RANK(r′) ≥ RANK(r) & r′ ≠ r
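The two-step selection of Algorithm 1 can be sketched as follows. This is a simplified illustration: the resource attributes, the example REQUIREMENTS predicate and the RANK expression are invented, and GridWay actually evaluates the template expressions internally rather than taking Python callables.

```python
# Sketch of GridWay's two-step selection: filter resources with a
# REQUIREMENTS predicate, then pick the one maximizing RANK.
# Attribute names and values are illustrative, not GridWay's schema.

resources = [
    {"name": "r1", "cpu_mhz": 2400, "free_mem_mb": 1024, "bandwidth": 10.0},
    {"name": "r2", "cpu_mhz": 3000, "free_mem_mb": 512,  "bandwidth": 90.0},
    {"name": "r3", "cpu_mhz": 2800, "free_mem_mb": 2048, "bandwidth": 55.0},
]

requirements = lambda r: r["free_mem_mb"] >= 1024      # REQUIREMENTS tag
rank = lambda r: r["bandwidth"]                        # RANK tag

candidates = [r for r in resources if requirements(r)] # filtering step (line 2)
best = max(candidates, key=rank)                       # selection step (line 3)
assert best["name"] == "r3"   # r2 fails REQUIREMENTS despite best bandwidth
```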
Figure 3.2. Conceptual view of the extensions introduced to GridWay
For this implementation, the BANDWIDTH attribute has been introduced into
GridWay. It refers to the effective network bandwidth in the path between the
GridWay node and each computing resource, i.e. the path traversed by the data
(I/O files) needed by the job to run. If an application has a large amount of input
data, it must correspondingly choose an appropriate network path based on the
value of this attribute. Both the REQUIREMENTS and the RANK expressions can
utilize the BANDWIDTH attribute. Thus, the user can filter and/or sort resources
by also taking into account the effective bandwidth from the GridWay node to
each resource. Figure 3.2 illustrates the extensions introduced in GridWay to
be network–aware. The elements in gray shade have been added in this Thesis.
Details about these extensions can be found in [112], where a performance eval-
uation that highlights the improvement obtained by making GridWay network
aware is included. Besides, an evaluation of the tuning of the network tools is
presented in [113].
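For illustration, a job template using the new attribute might look like the following sketch. The file names, the bandwidth threshold of 50 and the exact attribute syntax are only indicative, not taken from the actual GridWay template grammar.

```
# Illustrative GridWay job template using the new BANDWIDTH attribute
# (file names and the threshold value are examples only)
EXECUTABLE   = my_app
INPUT_FILES  = input.dat
OUTPUT_FILES = output.dat
REQUIREMENTS = BANDWIDTH > 50        # filter out poorly connected resources
RANK         = BANDWIDTH             # prefer the best-connected resource
```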
3.3.2 Autonomic scheduler
Autonomic behavior in GridWay has been implemented by means of (1) per-
forming Connection Admission Control (CAC); (2) adding a new attribute named
TOLERANCE that GridWay uses to perform the filtering and sorting of re-
sources, reacting to changes in the state of the system; and (3) using Exponential
Smoothing (ExS) [105] to tune the predictions on the duration of jobs. This sec-
tion presents details on how these have been implemented.
Algorithm 2 CAC algorithm.
1: R: set of resources known to GridWay {r_i / i in [1..n]}
2: R_CAC: set of resources that fulfill the CAC algorithm
3: CPUfree(r_i): the percentage of free CPU of resource r_i
4: j: a job
5: deadline(j): deadline of job j
6: t^ri_completion(j): estimated completion time of job j on resource r_i
7: MaxRes: maximum number of resources to check
8: R_CAC = ∅
9: i = 1
10: while (CPUfree(r_i) ≥ threshold_CPU) AND (i ≤ MaxRes) do
11:   if t^ri_completion(j) < deadline(j) then
12:     R_CAC = R_CAC + r_i
13:   end if
14:   increment(i)
15: end while
Connection admission control (CAC)
The Connection Admission Control Algorithm (see Algorithm 2) checks resources
(with R being the set of computing resources of the same VO) to identify those
with enough CPU capacity (thresholdCPU ) on which the job can be executed
within its deadline (line 11). If the predicted completion time for the job is lower
than the deadline, the resource is chosen (line 12). Otherwise, the next resource
is checked. This process is repeated until all the resources are checked (line 10).
Not all the known resources have to be checked; for efficiency and scalability, an
upper limit (MaxRes) may be defined. If R_CAC is empty, then the job is rejected,
or alternatively a negotiation process is started.
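The admission-control step can be sketched in Python as follows. This is a simplified illustration: the free-CPU and completion-time predictors are placeholders for the mechanisms described in Section 3.3.3, and resources failing the CPU check are simply skipped rather than stopping the scan.

```python
# Sketch of the CAC step (Algorithm 2): keep resources with enough free
# CPU on which the job is predicted to finish before its deadline.
# `free_cpu` and `est_completion` stand in for the real predictors.

def cac(resources, job, free_cpu, est_completion,
        cpu_threshold=0.2, max_res=10):
    admitted = []
    for r in resources[:max_res]:          # check at most MaxRes resources
        if free_cpu(r) < cpu_threshold:
            continue                       # not enough spare CPU capacity
        if est_completion(job, r) < job["deadline"]:
            admitted.append(r)             # job fits within its deadline
    return admitted                        # empty => reject (or negotiate)

job = {"deadline": 100.0}
free = {"r1": 0.9, "r2": 0.1, "r3": 0.5}   # predicted free-CPU fractions
eta  = {"r1": 120.0, "r2": 40.0, "r3": 80.0}  # predicted completion times
ok = cac(["r1", "r2", "r3"], job, lambda r: free[r], lambda j, r: eta[r])
assert ok == ["r3"]   # r1 misses the deadline, r2 lacks free CPU
```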
Scheduling Algorithm
Once the target set of resources (RCAC ) has been calculated by the CAC algo-
rithm, the scheduling algorithm sorts them by taking into consideration their
TOLERANCE (the way of estimating this value is explained in next subsection),
from the lowest to the highest one, as outlined in Algorithm 3.
As discussed next, resources with a high TOLERANCE value are less pre-
dictable, hence even though a job execution time on that resource may be within
the deadline, this does not mean that the deadline will actually be met. It must
be noted that predictions about durations of jobs have to be used, such as
in [110] [111].
Algorithm 3 Scheduling algorithm.
1: j: a job
2: R_CAC: set of resources that fulfill the CAC criteria
3: r_exe: resource where j will be submitted
4: for all r_i in R_CAC do
5:   if Tolerance_ri < Tolerance_rexe then
6:     r_exe = r_i
7:   end if
8:   increment(i)
9: end for
On the other hand, and due to performance and scalability reasons, the CAC
and scheduling algorithms are tightly coupled. This means that they are exe-
cuted together in such a way that when a resource which could execute the job
within its deadline is found, the job is submitted to that resource and the pro-
cess stops. Hence, when the CAC is filtering resources, the list of resources to
check has already been sorted by the scheduling algorithm taking into account
their TOLERANCE values.
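The coupled behavior described above can be sketched as a first-fit scan over tolerance-sorted resources. This is an illustrative simplification; the tolerance and completion-time functions are placeholders for the real predictors.

```python
# Sketch of the coupled CAC + scheduling step: scan resources in
# increasing TOLERANCE order and submit to the first one predicted to
# meet the job's deadline. Predictors are illustrative placeholders.

def schedule(job, resources, tolerance, est_completion, max_res=10):
    for r in sorted(resources, key=tolerance)[:max_res]:
        if est_completion(job, r) < job["deadline"]:
            return r          # submit here and stop the scan
    return None               # no suitable resource: drop or negotiate

job = {"deadline": 60.0}
tol = {"a": 0.3, "b": 0.1, "c": 0.2}          # lower = more predictable
eta = {"a": 50.0, "b": 90.0, "c": 40.0}       # predicted completion times
chosen = schedule(job, ["a", "b", "c"], tol.get, lambda j, r: eta[r])
assert chosen == "c"   # b is most predictable but misses the deadline
```

Stopping at the first suitable resource is what makes the coupled scheme cheaper than filtering the whole resource set and sorting afterwards.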
Including the TOLERANCE attribute in GridWay
In order to filter and order the list of resources known to GridWay, considering
the accuracy of previous scheduling decisions, a new attribute has been added
to the job template. TOLERANCE has been implemented as a new attribute
that can be used both in the RANK and REQUIREMENTS tags, in the same way
as BANDWIDTH. The TOLERANCE attribute reflects the accuracy of predicting
job completion times for each resource. On performing scheduling of jobs to
computing resources and to address QoS of users (e.g. finish jobs before a
deadline), predictions on the completion time of jobs must be calculated. This
includes predictions on the transfer times (transfer of input and output files,
along with the executable file) and execution time.
The calculation of the TOLERANCE attribute is motivated by [46], where
each time a job has to be scheduled both transfer and execution latency of that
job are calculated for each resource known to the meta-scheduler. However,
this is a time-consuming process and therefore not scalable, so it should only be
carried out after choosing a resource (and not for all resources). Additionally,
the calculation of the TOLERANCE attribute in [46] relies on the number of millions of
instructions of a job, which is a measure of job size provided
by the simulator but very hard to obtain in practice. This term has been sub-
stituted by the average execution time of jobs of the same type, which is a more
realistic metric to measure in practice (Section 3.3.3 explains this).
In our approach, the GNB performs scheduling for each job request and uses
the value of the TOLERANCE attribute to filter and sort the set of resources
known to GridWay. TOLERANCE values are calculated (as outlined in Equa-
tion 3.1) after every job completion and associated with the resource on which
the job has been executed. Subsequently, the GNB orders resources based on
their TOLERANCE, from the lowest to the highest, i.e. from the most pre-
dictable to the least predictable one.
TOLERANCE(r_i) = TOLERANCE^ri_cpu + TOLERANCE^ri_net    (3.1)

TOLERANCE^ri_net = (t^ri_net_real − t^ri_net_estimated) / MB    (3.2)

TOLERANCE^ri_cpu = (t^ri_cpu_real − t^ri_cpu_estimated) / t^ri_cpu_real(j)    (3.3)
The terms TOLERANCE^ri_x, x = {net, cpu}, represent the accuracy of the pre-
vious predictions carried out by the GNB for the resource r_i, with i ∈ [1, n].
For TOLERANCE^ri_net, the last measurement of network bandwidth between the
GridWay node and r_i is considered, collected from the last update of this mea-
sure before the execution of the job. With this information, along with the total
amount of data to be transferred (MB), an estimation of the transfer time of the job
(t^ri_net_estimated) is calculated. After job execution, the actual time needed to com-
plete the transfers (t^ri_net_real) can be obtained. Finally, with these two times, the
updated network tolerance for the resource where the execution took place (r_i)
is calculated. The value of TOLERANCE^ri_net reflects how accurate the prediction
on the transfer time has been for the given job. Similarly, TOLERANCE^ri_cpu is
calculated for each job after its completion.
Equations 3.2 and 3.3 show the actual formulas used for the last completed
job, where MB represents the size of the job data in megabytes, and t^ri_cpu_real(j) rep-
resents the average execution time of a certain job (j) on a specific resource
(r_i). To estimate future values of TOLERANCE, an approach similar to the one
used by the Transmission Control Protocol (TCP) for computing retransmission time-
outs [114] may be used. Hence, we can consider:

D = TOLERANCE(r_i) − Tolerance_ri(t)    (3.4)

Tolerance_ri(t+1) = Tolerance_ri(t) + D × δ    (3.5)
where δ reflects the importance of the last sample in the calculation of the next
TOLERANCE value (Tolerance_ri(t+1)). TOLERANCE is only considered for those
resources known to GridWay to have enough available capacity to accept more
jobs. The GNB keeps a TOLERANCE value for the network and CPU capacity
of each computing resource, and modifies these in response to changes in the system.
Figure 3.3 illustrates the autonomic control loop for modifying the TOLERANCE
parameter, as outlined above. The ∆ in the figure denotes the difference between
the predicted times and the real times; hence, it is related to Equations 3.2 and 3.3.
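Equations 3.1–3.5 can be combined into a small update routine, sketched below. The smoothing factor δ and all sample values are purely illustrative.

```python
# Sketch of the TOLERANCE update (Equations 3.1-3.5): compute the
# prediction error of the last completed job and fold it into the
# running value with a TCP-like smoothing step. delta and the sample
# figures are illustrative, not values used in the actual system.

def updated_tolerance(tol_prev, t_net_real, t_net_est, mb,
                      t_cpu_real, t_cpu_est, delta=0.2):
    tol_net = (t_net_real - t_net_est) / mb          # Eq. 3.2
    tol_cpu = (t_cpu_real - t_cpu_est) / t_cpu_real  # Eq. 3.3
    sample = tol_net + tol_cpu                       # Eq. 3.1
    d = sample - tol_prev                            # Eq. 3.4
    return tol_prev + d * delta                      # Eq. 3.5

tol = updated_tolerance(tol_prev=0.1,
                        t_net_real=12.0, t_net_est=10.0, mb=100.0,
                        t_cpu_real=50.0, t_cpu_est=45.0)
# sample = 2/100 + 5/50 = 0.12; new value = 0.1 + 0.02 * 0.2 = 0.104
assert abs(tol - 0.104) < 1e-9
```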
3.3.3 Predicting and tuning resource performance
Two types of predictions are necessary, namely (1) predictions on the transfer
times, and (2) predictions on the execution times. These are explained next.
Calculating the network performance
Once the scheduler has sorted the available resources by using their toler-
ance values, it is necessary to estimate the effective bandwidth between two
Figure 3.3. Autonomic control loop for adapting the TOLERANCE parameter. The “X” in t_X_real and t_X_estimated refers to the set {net, cpu}.
end points in the network – these being between the GNB and the computing
resource where the job will be executed. This prediction is used for the CAC
algorithm. Estimation of link bandwidth was implemented in [112], in which the
Iperf tool monitors the available bandwidth from the GNB to all the computing
resources it knows with a given frequency. However, this only provides the ef-
fective bandwidth at the moment when monitoring is performed, which may not
be the bandwidth available when a schedule needs to be defined (as other jobs may have
been scheduled and are being transferred at that moment). It is therefore neces-
sary to infer the effective bandwidth between two monitoring intervals – similar
to [46].
We achieve this by considering the number of jobs that are being submitted
to the selected resource at the point at which a schedule needs to be defined.
Hence the effective bandwidth of the path between the GNB and a computing
resource r_i at time t + x can be calculated as follows:

eff_bw(r_i)_{t+x} = Bw(r_i)_t / (#PrologJobs + 1)    (3.6)
where Bw(ri)t is the last measured value (at time t) of the available bandwidth be-
tween the GNB and the resource ri selected to execute the job; and #PrologJobs
is the number of jobs that are submitting data from the GNB to that resource. In
order to take into account all the data being transferred, this number is updated
with the new incoming connection (the “+1” in Equation 3.6). Note that x must
lie in (0, 1), which means that the estimation is made between two real
measurements.
Once we have the effective bandwidth of a network path, the latency of the
data transfers for the job over a network path can be calculated by dividing the
I/O file size of the job in MB (megabytes) by the effective bandwidth. These
values (I/O file sizes) are known since I/O files are specified in the job template.
In this way, the estimated time to complete the transfers is obtained by using
Equation 3.7.
t^ri_net_estimated = sizeFilesIn / eff_bw(r_i)_{t+x} + sizeFilesOut / Bw(r_i)_t    (3.7)
It must be noted that Bw(ri)t is used for calculating the time needed to com-
plete the output transfers (epilog step) since we know the number of jobs that are
sending input files but we cannot ensure how many jobs will be sending back the
output files when the job being submitted is completed. Additionally, the des-
tination resource of these output file transfers does not have to be the same for
all of them. Thus, a Grid meta-scheduler cannot have complete knowl-
edge of the network structure, making it necessary to make assumptions about
effective bandwidth available in the future. These assumptions are obtained by
using an Exponential Smoothing function, which is explained in Section 4.4.5.
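Equations 3.6 and 3.7 can be sketched together as follows. All figures are illustrative; in the real system the measured bandwidth Bw(r_i)_t comes from the periodic Iperf monitoring.

```python
# Sketch of the transfer-time estimate (Equations 3.6 and 3.7): the
# last measured bandwidth is shared among the jobs currently staging
# input files to the same resource. All figures are illustrative.

def estimated_transfer_time(size_in_mb, size_out_mb, bw_last, prolog_jobs):
    eff_bw = bw_last / (prolog_jobs + 1)   # Eq. 3.6: shared effective bw
    t_in  = size_in_mb / eff_bw            # prolog: input file staging
    t_out = size_out_mb / bw_last          # epilog uses the last measure
    return t_in + t_out                    # Eq. 3.7

# 100 MB in, 20 MB out, 10 MB/s measured, 3 jobs already staging input:
t = estimated_transfer_time(100.0, 20.0, bw_last=10.0, prolog_jobs=3)
assert abs(t - 42.0) < 1e-9   # 100 / (10/4) + 20 / 10 = 40 + 2 seconds
```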
Calculating CPU latency
Predictions of job execution time are quite difficult to obtain since there are per-
formance differences between Grid resources and their performance character-
istics may vary for different applications (e.g. resource A may execute an appli-
cation P faster than resource B, but resource B may execute application Q faster
than A). With the aim of estimating as accurately as possible the time needed
to execute the jobs on the selected resources, we apply the techniques developed
in [110] [111]. These techniques use application–oriented prediction techniques
to estimate the execution time of the application, together with resource–oriented
approaches to recalculate the execution time of the job depending on the
predicted CPU status of the resource.

Algorithm 4 Estimation of Execution Time (t^ri_cpu_estimated(j))
1: R: set of resources known to GridWay {r_i / i in [1..n]}
2: j: the job to be executed
3: t^ri_cpu_real(j)_k: the k-th execution time of application j on resource r_i
4: DB_Resources_ri: the filtered database with the information about the status of resource r_i
5: CPUfree(r_i): the mean percentage of free CPU in resource r_i between now and the deadline of j, calculated by using the Exponential Smoothing (ExS) function
6: Overload: the extra time needed due to the CPU usage at the chosen resource r_i
7: t^ri_cpu_estimated(j) = (Σ_{k=1..n} t^ri_cpu_real(j)_k) / n
8: Overload = t^ri_cpu_estimated(j) × (1 − CPUfree(r_i))
9: t^ri_cpu_estimated(j) = t^ri_cpu_estimated(j) + Overload
10: return t^ri_cpu_estimated(j)
Predictions on execution times are performed as explained in Algorithm 4
and are based on the average of previous executions of an application on a
particular resource (line 7) – this estimation takes into account the different
input parameters. This average is calculated for each job type, and information
related to previous executions of a specific job is used to determine an average
execution time. After that, the prediction on the future status of the CPU of each
resource is calculated by means of an Exponential Smoothing function. Finally,
the mean execution time is adjusted using the predicted future CPU status of each
resource (line 9). More detailed information about this process is presented in
Section 4.3.4.
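Lines 7–9 of Algorithm 4 can be sketched in Python as follows (a simplified illustration; the actual implementation works over the filtered GridWay database):

```python
def estimate_execution_time(past_times, predicted_cpu_free):
    """Algorithm 4, lines 7-9: average the previous execution times of the
    application on this resource, then add the overload expected from the
    predicted CPU usage (predicted_cpu_free in [0, 1] comes from ExS)."""
    base = sum(past_times) / len(past_times)        # line 7: mean of past runs
    overload = base * (1.0 - predicted_cpu_free)    # line 8: extra time due to load
    return base + overload                          # line 9
```

For instance, with past runs of 90, 100 and 110 seconds and a predicted 80 % of free CPU, the estimate is the 100-second mean plus a 20 % overload, i.e. 120 seconds.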
3.4 Experiments and results
This section describes the experiments conducted to test the usefulness of this
work, along with the results obtained.
3.4.1 Experiment Testbed
The evaluation of the autonomic implementation has been carried out in a real
Grid environment. The testbed consists of resources located at two different
Universities, as illustrated in Figure 3.4.

Figure 3.4. Grid testbed topology.

At the University of Castilla–La Mancha
(UCLM, Albacete, Spain) there are resources located in two different buildings.
In one building, named Instituto de Investigación en Informática de Albacete (I3A),
there is one machine which performs the scheduling of tasks and several com-
putational resources (10 desktop computers belonging to other users). In a sec-
ond building, named Escuela Superior de Ingeniería Informática (ESII), there is
a cluster machine with 88 cores and with the PBS [14] scheduler, which is also
shared with other users. All these machines belong to the same administrative
domain (University of Castilla–La Mancha (UCLM)) but they are located within
different subnets.
On the other hand, there is another computational resource at the National
University of Distance Education (Universidad Nacional de Educación a Distancia,
UNED, Madrid, Spain), which is also a desktop computer. Thus, the network
which links this computational resource with the UCLM resources is the Inter-
net. Table 3.1 outlines the main characteristics of these computing resources.
Note that these machines belong to other users, so they have their own local
background workload (including the network load). Each non-cluster machine
Domain       Machine             Hardware (CPU)                              RAM   Globus (version)
UCLM (I3A)   GridWayI3A.uclm.es  2 Intel Pentium 4 CPU 3.00 GHz              2 GB  v. 4.0.5
UCLM (I3A)   R1                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.3
UCLM (I3A)   R2                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.3
UCLM (I3A)   R3                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.3
UCLM (I3A)   R4                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.4
UCLM (I3A)   R5                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.8
UCLM (I3A)   R6                  2 Intel Pentium 4 CPU 3.20 GHz              3 GB  v. 4.0.7
UCLM (I3A)   R7                  Intel Core 2 Duo CPU 2.66 GHz               2 GB  v. 4.0.8
UCLM (I3A)   R8                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.7
UCLM (I3A)   R9                  2 Intel Pentium 4 CPU 3.00 GHz              2 GB  v. 4.0.4
UCLM (I3A)   R10                 2 Intel Pentium 4 CPU 3.00 GHz              1 GB  v. 4.0.8
UCLM (ESII)  Cluster1            22 AMD bipro dual core Opteron CPU 2.4 GHz  4 GB  v. 4.0.8
UNED         Uned R1             Intel Core 2 Duo CPU 2.80 GHz               4 GB  v. 4.0.8

Table 3.1. Characteristics of the resources.
is the desktop computer of a member of the staff (and not a dedicated machine)
at UCLM or National University of Distance Education (UNED), so they may have
different CPU and network background workloads, which are not defined in the
testbed.
3.4.2 Workload
To evaluate our implementation we use one of the GRASP [115] benchmarks,
named 3node. The 3node test consists of sending a file from a source node to
a computation node, which performs a search pattern, generating an output file
with the number of successes. The output file is sent to the result node. This test
is meant to mimic a pipelined application that obtains data at one site, computes
a result on that data at another, and analyses the result on a third site.
Furthermore, this test has parameterizable options to make it more com-
pute intensive (compute_scale parameter), which means that the run time is
increased, and/or it can become more network demanding (output_scale param-
eter), which means that the files to be transferred are bigger. This versatility
is the reason why we have chosen this test to measure the performance of our
approach. With these two parameters, it is possible to generate different types of
jobs. Therefore, in order to emulate the workload used in [46], the compute_scale
parameter takes the value 10 and the output_scale parameter takes the value 1.
Besides, the input file size is 48 MB, and this value of output_scale creates output
files whose size is the same as the input file size.
56 Chapter 3. Including Metrics to Improve QoS at the Meta-Scheduling Level
Figure 3.5. Visualization Pipeline (VP) test.
To better validate and evaluate this implementation, one of the NAS Grid
Benchmarks (NGB) [116], named Visualization Pipeline (VP), has also been used.
This test has different workflow dependencies. Some jobs are more computation-
ally intensive whilst others are network demanding. Therefore, the VP allows us
to explore a big spectrum of running conditions. Figure 3.5 shows the workflow
of this test. VP represents chains of compound processes, like those encoun-
tered when visualizing flow solutions as the simulation progresses. It comprises
three NAS Parallel Benchmarks (NPB) [117] problems, namely BT, MG, and FT,
which fulfill the role of flow solver, post processor, and visualization module, re-
spectively. This triplet is linked together into a logically pipelined process, where
subsequent flow solutions can be computed while postprocessing and visual-
ization of previous solutions are still in progress [116]. The red circular nodes
depict BT jobs, blue square nodes are MG jobs and black trapezium nodes are
FT jobs. All of them (BT, MG and FT ) are defined in the NGB benchmarks.
3.4.3 Performance evaluation
This section compares the following meta-scheduling schemes: (1) the original
GridWay meta-scheduler [10], which chooses the first discovered resource with
enough free CPU to execute a job (labeled as GW in the figures), (2) the GridWay
meta-scheduler using the CPU power to select resources (labeled as GW-MHZ), (3)
the network–aware GridWay extension presented in [112] (labeled as GW-Net),
(4) the autonomic network–aware meta-scheduler with ExS disabled (labeled as
ANM), and (5) the autonomic network–aware meta-scheduler with ExS enabled
(labeled ANM-ExS). For the last two schemes, the CAC functionality has been
disabled in order to make a fair comparison with the other scheduling techniques
which do not have this feature; this means that jobs are accepted regardless of
whether their QoS requirements can be met.
GridWay has been used in several research articles to compare meta-sche-
duling techniques. Among others, Vazquez-Poletti et al. [118] present a com-
parison between EGEE and GridWay over EGEE resources. This comparison is
both theoretical and practical through the execution of a fusion physics plasma
application on the EGEE infrastructure, and shows the better performance of
GridWay over LCG-2 Resource Broker. Several theoretical comparisons, among
others [34] [119] [120] compare GridWay with other meta-scheduling techniques,
highlighting the fact that this is a valuable and versatile tool to manage dynamic
and heterogeneous Grids.
To evaluate the performance of the aforementioned scheduling techniques in
our environment, we emulate a workload similar to [46] by using the 3node test.
To do this, we simulate 5 different users. Each of them submits its jobs with
one type of scheduling technique. User requests consist of 1000 jobs of 3node
type with the parameters set as explained in Section 3.4.2. Results of these
submissions are presented in Figure 3.6. Figure 3.6 (a) represents the average
completion time of the 3node test with each scheduling technique. Figure 3.6 (b)
represents a boxplot of the execution times of the 1000 3node executions.
A boxplot is a convenient way of graphically depicting groups of
numerical data through their five-number summaries: the smallest observation
(minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest ob-
servation (maximum).
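The five-number summary behind such a boxplot can be computed, for instance, as follows (quartiles are taken as medians of the lower and upper halves, which is one common convention among several):

```python
from statistics import median

def five_number_summary(data):
    """Minimum, lower quartile (Q1), median (Q2), upper quartile (Q3) and
    maximum -- the five values a boxplot depicts."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])            # lower half, excluding the median if n is odd
    q3 = median(xs[half + n % 2:])    # upper half
    return xs[0], q1, median(xs), q3, xs[-1]
```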
As we can see on both figures, the average completion time is lower when
using ANM and ANM-ExS, in spite of having the CAC functionality disabled. The
best results are obtained for ANM-ExS. This scheduling technique achieves a
time reduction of 26.11 % over GW, of 17.47 % over GW-MHZ and of 15.75 % over
GW-Net. Moreover, using the Exponential Smoothing predictions also results in
a gain of 9.35 % over the completion times obtained by ANM. Furthermore, as
Figure 3.6 (b) depicts, there are other important results that highlight the better
behavior of ANM-ExS technique. These are:
Figure 3.6. 3node test average completion time (in seconds) for each scheduling
technique (GW, GW-MHZ, GW-Net, ANM, ANM-ExS): (a) Average Completion Time;
(b) Average Completion Time Boxplot.
• Median time reduction: 33.92 % compared with GW, 24.16 % compared with
GW-MHZ and 26.62 % compared with GW-Net.
• Maximum time reduction: there exists a clear reduction obtained on this
metric, since ANM-ExS selects a resource whose behavior is more predictable.
Because of that, the probability of choosing a bad resource which delays
the execution is quite low. Thus, the maximum time reduction obtained is
of 53.94 % over GW, of 66.75 % over GW-MHZ and of 55.53 % over GW-Net.
Moreover, comparing ANM and ANM-ExS, the latter performs better since the
completion time predictions are more accurate when using Exponential Smooth-
ing and this makes the TOLERANCE value more reliable. Thus, ANM-ExS ob-
tains a reduction of 9.96 % over ANM for the median time, and of 53.61 % for
the maximum time. The worst case is therefore clearly improved by using the
ANM-ExS compared to other techniques.
It must be noted that for the GW scheduling technique, the box is narrower
since the selected resource is the first discovered resource (whenever possible).
Consequently, most of the jobs are executed on the same resource, so the time
needed to complete the executions is more uniform.
On the other hand, the resource usage is also improved by using autonomic
behavior. Information about resource usage is presented in Table 3.2. The first
line represents the percentage of resources used for executing the 3node test
with each type of scheduling technique. The second line shows the usage of the
                               GW     GW-MHZ   GW-Net   ANM    ANM-ExS
% of used resources            25 %   38.4 %   50 %     75 %   62.5 %
Maximum % of resource usage    79 %   42.5 %   32 %     42 %   46 %
Minimum % of resource usage    21 %   1 %      12 %     1 %    8 %

Table 3.2. Percentage of resource usage by 3node tests.
most saturated resource – the one which each scheduling technique sends more
jobs to. Finally, the last line represents the load submitted to the least used
resource. There is a higher number of hosts used when ANM and ANM-ExS are
running (as the first row depicts). Also, the load is spread over the resources
in such a way that there are not overloaded resources (as second row depicts).
Finally, those resources whose behavior is not predictable are less used. From
these results it can be seen that the use of Exponential Smoothing improves the
predictability of the resources.
Despite the fact that ANM uses more resources than ANM-ExS (which may
suggest that ANM balances load more efficiently), this does not necessarily
mean that ANM uses resources more efficiently. On the one hand, it could be
better to submit more jobs to a resource having a better behavior, even if the
load is not totally balanced. It may also not be advantageous to keep balancing
the load since it may mean that worse resources (i.e. those that do not have the
exactly desired capability) are used – rather than focusing our load on the best
resources. Furthermore, in some cases ANM may not be accurate enough about
its predictions and an unsuitable resource may be selected. This fact can be
deduced from the minimum percentage of resource usage for ANM, as Table 3.2
shows. In that case, the resource was selected because of its TOLERANCE.
However, the resource did not present such a predictable behavior since the
Exponential Smoothing function was not used.
Regarding the QoS perceived by the users, measured as the number of jobs
that are executed fulfilling the deadline set by the users, another experiment
has been conducted taking into account the previous results. In this case, we
enable the CAC system for the ANM-ExS (labeled ANM-ExS CAC) and compare its
results against the previous ones by setting different deadlines for the submitted
jobs. Figure 3.7 depicts the number of jobs that would not have fulfilled the QoS
if the deadline had been set to 300, 180 and 120 seconds, respectively. As the
other techniques do not have CAC capability, we use the information obtained
Figure 3.7. 3node test. QoS not fulfilled: percentage of jobs which do not fulfill
the deadline (300, 180 and 120 seconds) for each scheduling technique (GW,
GW-MHZ, GW-Net, ANM, ANM-ExS, ANM-ExS CAC).
in the previous test to know how many jobs would have finished on time. Hence,
we count the number of jobs for which the execution time was lower than the
deadline. It must be noted that in ANM-ExS CAC, all the jobs rejected by
the CAC algorithm are counted as jobs which do not fulfill the QoS requirements.
The same information for both ANM techniques is also presented to highlight the
improvement obtained by using the CAC algorithm. As this figure depicts, for
a 300 seconds deadline, the differences are negligible, since almost all the jobs
can finish their execution before the deadline. In this case, the worst behavior is
presented by GW-MHZ due to the way in which resources are selected. Sometimes
resources with low network connectivity are selected, hence the time needed to
complete the transfers is high, which leads to missing the established deadline.
For a deadline of 180 seconds, it is again not useful to focus purely on CPU
speed. Moreover, the results also highlight that the autonomic behavior performs
better, considering the three techniques that use it (ANM, ANM-ExS and ANM-ExS CAC). However,
ANM-ExS CAC seems to work worse than when the CAC is disabled. This is due
to the fact that there may be jobs that are not accepted since it is estimated that
their deadline cannot be met, which is not happening when CAC is disabled.
Additionally, there may be the case of jobs whose estimations for completion
time are a bit larger than the deadline (e.g., the estimation says 181 seconds
and the deadline is 180 seconds). Such jobs are rejected when CAC is enabled.
However, when CAC is disabled, they are executed, and their execution times may
fall within the deadline.
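The QoS accounting described above can be sketched with a hypothetical helper (with CAC enabled, rejected jobs count as unfulfilled):

```python
def pct_qos_not_fulfilled(completion_times, deadline, rejected=0):
    """Percentage of jobs that miss their QoS: jobs finishing after the
    deadline plus (when CAC is enabled) jobs rejected by the CAC algorithm.
    Illustrative helper, not part of the actual implementation."""
    late = sum(1 for t in completion_times if t > deadline)
    total = len(completion_times) + rejected
    return 100.0 * (late + rejected) / total
```

Note that rejecting a job whose estimate barely exceeds the deadline increases the count even if that job might have finished on time; this is exactly the effect observed for the 180-second deadline.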
Nonetheless, these scenarios do not involve deadlines that are hard to fulfill.
However, as shown in Figure 3.7, for a 120 seconds deadline the behavior is
Figure 3.8. VP test average completion time (in seconds) for each submission
frequency (35 min., 20 min., T_VP) and each scheduling technique (GW, GW-MHZ,
GW-Net, ANM, ANM-ExS).
different. In this case, it is really important to reject jobs when it is clear that
their QoS cannot be fulfilled. This way, there will be fewer executions, and it is
more likely that the remaining jobs finish on time. For these reasons, ANM-ExS
CAC shows an improvement of 34.7 % over ANM-ExS.
Next, an evaluation of the performance received by users emulating a more
realistic situation is presented. In this test, several jobs were submitted to the
Grid testbed during a long time interval. Moreover, jobs are submitted at the
same time for all the meta-scheduler configurations. Thus, there is a compet-
itive behavior among all the jobs submitted in all the tests. Furthermore, the
duration of this interval is not fixed and depends on the way tests are submitted.
Different cases, more or less demanding on the Grid environment, have been
analyzed. This test illustrates how different Grid workloads affect the
user performance. Three different user behaviors which imply different stresses
on the Grid system have been used, namely, case 1, in which VP tests (from the
NGB suite [116]) are submitted every 35 minutes; case 2, which consists of
submitting VP tests every 20 minutes; and case 3, where one VP test is submitted
as soon as the previous one has finished (labeled as T_VP). Hence, case
1 is the least stressing, and case 2 is the most stressing. For all the cases, 5 VP
tests were submitted.
In this experiment, the metric used for evaluating the performance obtained
by the user is the average completion time of all VPs. Figure 3.8 depicts the
results for each submission frequency and each scheduling technique, and
Table 3.3 presents a summary of such information.

                      35 Min.   20 Min.   T_VP      Average
ANM-ExS vs GW         31.53 %   25.19 %   18.37 %   25.03 %
ANM-ExS vs GW-MHZ     17.09 %   21.72 %   5.21 %    14.67 %
ANM-ExS vs GW-Net     10.92 %   9.03 %    6.97 %    8.98 %
ANM-ExS vs ANM        8.6 %     4.8 %     3.27 %    5.56 %

Table 3.3. Percentage of improvement by using the autonomic implementation
with ExS (ANM-ExS).

As can be seen in Figure 3.8,
the best performance for all the submission frequencies is again obtained by
ANM and ANM-ExS, even taking into account that the CAC is disabled for a fairer
comparison. Hence, the stability of a resource's behavior is the best criterion for
choosing the resource to run a job.
On the other hand, these results also highlight the usefulness of using the
Exponential Smoothing function to estimate the time needed to complete the
execution of a job. This leads to predictions which are more accurate and it is
possible to obtain an improved value of the resource TOLERANCE. Hence, the
resource selection process is better and the time needed to complete a job is
decreased. For these reasons, the ANM-ExS technique obtains the best results.
The main differences arise when ANM-ExS is used, although the largest dif-
ference is between using network–aware (GW-Net, ANM, and ANM-ExS) and non
network–aware techniques (GW and GW-MHZ) due to the fact that VP is very net-
work demanding. Moreover, the largest differences are obtained for 35 minutes
submission frequency, since at this rate there is less load in the Grid. There are
more free resources and it is possible to select a better resource owing to the fact
that the system has more idle resources to choose from.
To sum up, these results highlight the usefulness of using the TOLERANCE
parameter to perform a better selection of resources, and consequently, to improve
the QoS delivered to users. Moreover, the benefits of using Exponential Smoothing
to predict the future status of resources are also illustrated. This way, we ob-
tain, on average, around 25 % completion time reduction compared with GW (as
presented in Table 3.3), around 15 % compared with GW-MHZ and almost 10 %
compared with the network–aware GridWay implementation. Furthermore, the
improvement provided by the use of Exponential Smoothing function means a
completion time reduction of more than 5 % over ANM.
Figure 3.9. Resource Usage.
Finally, from a system point of view, the autonomic techniques (ANM-ExS and
ANM) also present better behavior as the workload is better balanced over the
resources. This can be seen in Figure 3.9, which depicts the percentage of jobs
submitted to each resource when using each technique. It must be noted that
when using GW or GW-MHZ, the resource usage is not balanced as resources are
selected based on the order in which they are discovered or their CPU speed
(which are static parameters), rather than just taking into account available ca-
pacity (such as percentage of free CPU) to be able to execute the incoming job.
Moreover, if GW-Net is used, the resource usage is also not balanced since only
the resources with better bandwidth are selected. However, when using auto-
nomic techniques (ANM-ExS and ANM), the scheduling of jobs is more balanced
since almost all the resources are used.
Additionally, some resources are used more than others due to the fact
that their performance differs. This means that from the TOLERANCE
point of view, they have a better behavior because they present a more
predictable performance. This is especially true when ExS is used, since the time
needed to complete the jobs is better estimated, as it takes into account
the predicted status of the resource. This makes the resource usage slightly
more balanced when using ExS. Hence, it is better to choose the more predictable
resources with the aim of reducing the time needed to complete the jobs.
3.5 Summary
This chapter presents a working implementation of an architecture which com-
bines concepts from Grid scheduling with autonomic computing [121] in order
to provide users with a more adaptive job management system. The architecture
involves consideration of the status of the network when reacting to changes in
the system – taking into account the load on computing resources and the net-
work links when making a scheduling decision. This architecture was originally
presented and tested by means of simulations in [46]. This work presents an
implementation based on GridWay [10] – an open source Grid meta-scheduler.
The architecture provides scheduling of jobs to computing resources so that
the network does not become overloaded. In order to implement the autonomic
network–aware meta-scheduler, a first step was the extension of the GridWay
meta-scheduler to make it network–aware. Subsequently,
the autonomic behavior is implemented by means of (1) adding a new attribute
TOLERANCE that GridWay uses to perform the filtering and sorting of re-
sources; and (2) performing Connection Admission Control (CAC). The term
TOLERANCE and the CAC were originally introduced and tested by means of
simulations in [46], but both had to undergo adaptation when used in the Grid-
Way implementation. Moreover, the use of Exponential Smoothing alongside the
CAC algorithm is a novelty of this work.
The main contributions of this chapter are: (1) an implementation of the
architecture to perform autonomic network–aware meta-scheduling based on
GridWay; (2) a scheduling technique that relies on ExS to predict the completion
time of jobs; (3) a performance evaluation carried out using a real testbed involv-
ing several workloads and heterogeneous resources from several organizations.
Several ways of performing the scheduling of jobs to computing resources
are evaluated, namely GW, GW-MHZ, GW-Net (presented in [112]), ANM (presented
in [46]) and ANM-ExS (novelty of this work). This evaluation uses different work-
loads and heterogeneous resources belonging to different organizations, showing
that the autonomic behavior based on Exponential Smoothing improves the per-
formance received by users and yields a better load balance among resources.
CHAPTER 4

Adding Support for Meta-Scheduling in Advance:
The SA-Layer
As has been stated throughout this dissertation, the provision of Quality of
Service (QoS) in Grid environments is still an open issue that needs attention
from the research community. One way of contributing to the management
of QoS in Grids is by performing meta-scheduling of jobs in advance, that is,
jobs are scheduled some time before they are actually executed. In this way, it
becomes more likely that the appropriate resources are available to run the job
when needed, so that QoS requirements of jobs are met (i.e. jobs are finished
within a deadline).
This chapter presents a framework built on top of Globus and the GridWay
meta-scheduler to improve QoS by means of performing meta-scheduling in
advance. This framework manages idle/busy periods of resources in order to
choose the most suitable resource for each job, and uses red–black trees for this
task. Furthermore, no prior knowledge on the duration of jobs is required, as
opposed to other works using similar techniques.
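The idle/busy bookkeeping mentioned above can be illustrated with a minimal sketch, in which a sorted list with binary search stands in for the red–black tree (the class and method names are illustrative, not the SA-Layer's actual API):

```python
import bisect

class ResourceCalendar:
    """Sketch: track a resource's non-overlapping busy periods, as the
    SA-Layer does with red-black trees; a sorted list plus bisect gives
    the same O(log n) search (though O(n) insertion)."""
    def __init__(self):
        self._busy = []  # sorted list of (start, end) tuples, end exclusive

    def is_free(self, start, end):
        # Index of the first stored period starting strictly after `start`.
        i = bisect.bisect_right(self._busy, (start, float('inf')))
        before_ok = i == 0 or self._busy[i - 1][1] <= start
        after_ok = i == len(self._busy) or self._busy[i][0] >= end
        return before_ok and after_ok

    def book(self, start, end):
        if not self.is_free(start, end):
            return False
        bisect.insort(self._busy, (start, end))
        return True
```

A self-balancing tree keyed on the period start time makes both the free-slot search and the insertion logarithmic, which matters when many scheduled-in-advance periods accumulate per resource.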
This framework uses heuristics that consider the network as a first level re-
source, and presents an autonomous behavior so that it adapts to the dynamic
changes of the Grid resources. The autonomous behavior is obtained by means
of computing a trust value for each resource and performing job rescheduling in
case of resource failure. All this set of features make this framework suitable for
real Grids.
4.1 Introduction
The heterogeneous and distributed nature of the Grid along with the different
characteristics of applications complicate the brokering problem. To further
complicate matters, the meta-scheduler typically lacks total control and even
complete knowledge of the status of the resources. This poses a heavy challenge
for the provision of QoS.
Current scheduling systems adopt three different approaches to tackle these
problems [122]: scheduling based on just–in–time information [10] from Grid In-
formation System (GIS) [13], performance prediction [123], and dynamic resche-
duling at run time [124]. These approaches are not exclusive. For instance,
it is possible to use a mixture of several approaches like doing performance
prediction and dynamic rescheduling at run time. Getting resource static infor-
mation, such as CPU frequency, memory size, network bandwidth or file sys-
tem is feasible. But runtime information, such as CPU load, available memory,
and available network bandwidth, is more difficult to obtain. This is because
of performance fluctuation which in turn is due to contention among shared
resources.
One key idea to solve the scheduling problem is to ensure that a specific re-
source is available when a job requires it. This is the reason why reserving or
scheduling the use of resources in advance becomes essential. Reservation in
advance can be defined as a restrictive or limited delegation of a particular re-
source capacity for a defined time interval [51]. The objective of such reservation
in advance is to provide Quality of Service (QoS) by ensuring that a certain job
uses the resources it needs when they are requested. However, incorporating
such mechanisms into current Grid environments has proven to be a challenging
task due to the resulting resource fragmentation [38], in spite of enabling QoS
agreements with users and increasing the predictability of a Grid system [58].
Our work is based on meta-scheduling in advance in Grids rather than reser-
vations in advance, as reservations may not always be possible. The meta-sche-
duling in advance algorithm can be defined as the first step of the reservations
in advance algorithm, in which the resources and the time periods to execute
the jobs are selected (and the system keeps track of the decisions already made
and the usage of resources) but making no physical reservation.
This chapter presents the following main contributions. First, a framework built
on top of Globus [9] and the GridWay meta-scheduler [10] to manage QoS by
means of performing meta-scheduling in advance is presented. The use of this
framework allows jobs to be executed within their deadlines. Second, a new
autonomic network–aware algorithm is proposed to tackle the scheduling in advance
problem. Thereby, the heuristics presented are concerned with the dynamic be-
havior of the Grid resources, their usage, the variable availability of resources,
and the characteristics of the jobs. Hence, no prior knowledge of the job duration
is assumed, as opposed to [3]. Thus, estimations on the completion times
of jobs need to be calculated. The autonomous behavior is obtained by means of
computing a trust value for each resource and performing rescheduling of failed
jobs. The resource trust value reflects the accuracy of the previous estimations
made for jobs executed on each resource. Third, heuristics to calculate predictions on
the completion time of jobs are presented, which consider the network as a first
level resource.
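The trust idea can be illustrated as follows; the exact formula is defined later in the dissertation, so this sketch simply assumes trust as one minus the mean relative prediction error:

```python
def resource_trust(estimated_times, actual_times):
    """Illustrative trust value for a resource: 1.0 means past execution-time
    estimations were perfect; the value drops as estimations miss by more.
    (Assumed formula, for illustration only.)"""
    errors = [abs(e - a) / a for e, a in zip(estimated_times, actual_times)]
    return max(0.0, 1.0 - sum(errors) / len(errors))
```

Under such a definition, resources with predictable behavior keep a trust close to 1.0 and are preferred by the scheduler, while resources whose runtimes fluctuate see their trust decay.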
The chapter is organized as follows. In Section 4.2 a brief overview of the
meta-scheduling in advance problem is presented. Section 4.3 explains the
framework to perform meta-scheduling in advance, which is the main contri-
bution of this chapter, paying special attention to the blocks that implement the
key functionalities mentioned in the paragraph above. The prediction techniques
developed are detailed in Section 4.4 whilst Section 4.5 presents the experiments
carried out for evaluating these proposals. Finally, the summary of the chapter
is outlined in Section 4.6.
4.2 Network–aware meta-scheduling in advance
Grid resources may vary dynamically as they may fail, join or leave the Grid
at any time. Moreover, this dynamism is also affected by the fact that every Grid
resource needs to execute local tasks as well as tasks from Grid applications.
It must be noted that from the Grid applications point of view, all the tasks
from both local users and Grid users are loads on the resource. Therefore,
everything in the system has to be evaluated by its influence on the execution of
the applications.
Owing to this fact, several projects have aimed at exploring advance reser-
vation of resources (among others, GARA [53], Grid Capacity Planning [54],
or VIOLA [55]), as they have been shown to increase the predictability of the
system [125]. A Grid reservation in advance process can be divided into two
steps [51]:
1. Meta-scheduling in advance: Selection of a resource to execute the job,
and the time period when the execution will be performed, but without any
physical reservation.
2. Negotiation for resource reservation: Consists of the physical reservation
of the resources needed for the job, which may not always be possible.
There are two concepts: requesting a reservation and committing a reserva-
tion. A reservation request contains the start time and the requested length
of the reservation. For committing a reservation, the meta-scheduler up-
loads a commit message containing the job id. At this moment, the job
starts its execution in the previously selected computing resource.
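The request/commit split in step 2 could look like the following sketch (a hypothetical interface, for illustration only):

```python
import itertools

class ReservationService:
    """Sketch of the two-phase protocol described above: a reservation is
    first requested (start time + length) and later committed, at which
    point the job may start on the selected resource. Hypothetical API."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._reservations = {}  # job_id -> [start, length, committed]

    def request(self, start, length):
        job_id = next(self._ids)
        self._reservations[job_id] = [start, length, False]
        return job_id

    def commit(self, job_id):
        # The commit message carries only the job id; the start time and
        # length were fixed by the earlier request.
        self._reservations[job_id][2] = True
        return True
```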
Nevertheless, support for reservation in the underlying infrastructure is cur-
rently limited [53] [54] as they impose some performance penalty [56], typically
decreased resource utilization. Owing to these limitations that reservations in
advance present, this Thesis is based only on the first step, meta-scheduling
in advance, since reservations in advance may not always be possible
in a real Grid environment. Many resources cannot be reserved, since
not all Local Resource Management Systems (LRMS) permit them. Apart
from that, there are other types of resources, such as bandwidth (e.g. the Internet),
which belong to several administrative domains, making their reservation
more difficult or even impossible. This is the reason to perform meta-scheduling
in advance rather than advance reservations in order to address QoS in Grids.
This means that the system keeps track of the meta-scheduling decisions al-
ready made in order to make future decisions without overlapping executions
but also without making any physical reservation. So, assuming a stable
situation in which all the resources are available, if only Grid load existed, this would
be enough to manage QoS since the meta-scheduler would not overlap jobs on
resources. As many resources have a local load besides Grid load, monitoring
and prediction techniques are needed.
The algorithms for meta-scheduling in advance need to be efficient so that
they can adapt themselves to dynamic changes in resource availability and user
demand without affecting system and user performance. Moreover, they must
take into account resource heterogeneity as Grid environments are typically
highly heterogeneous. For this reason, it could be useful to employ techniques
from computational geometry to develop an efficient heterogeneity–aware
scheduling algorithm [3]. In our research, the techniques proposed in [38],
which represent the information about resource usage geometrically, are used to
select efficiently the resource and the time period suitable for meeting the
QoS requirements of the job. On the other hand, the network is also taken into
account in the meta-scheduling process, since it has a major impact on job
performance, as studied in [46] [112] [126] [127], among others. Hence, our
research focuses on low–cost computational heuristics to perform
meta-scheduling in advance that consider the network as a first–level resource.
In this work we focus on applications where jobs do not have workflow
dependencies [11]. In this type of application the user provides both the input
files and the application itself. Nevertheless, by setting the start time and
the deadline of each job, a specific workflow among jobs can still be expressed.
Taking into account these assumptions, a scheduling in advance process
follows these steps (see Figure 4.1):
1. First, a user sends a request to the meta-scheduler at his local
administrative domain. Every request must provide a tuple with information
on the application and the input QoS parameters: (in_file, app, t_s, d).
in_file stands for the input files required to execute the application, app.
In this approach the input QoS parameters are just specified by the start
time, t_s (the earliest time the job can start to be executed), and the
deadline, d (the time by which the job must have been executed) [38].

Figure 4.1. Meta-Scheduling in Advance Process
2. The meta-scheduler communicates with the Gap Management entity and
executes a gap search algorithm. This algorithm obtains both the resource
and the time interval to be assigned for the execution of the job. The heuris-
tic algorithms presented here take into account the predicted state of the
resource (both for computational resources and interconnection networks),
the jobs that have already been scheduled and the QoS requirements of the
job.
3. If it is not possible to fulfill the user’s QoS requirements using the resources
of its own domain, communication with meta-schedulers from other do-
mains starts.
4. If it is still not possible to fulfill the QoS requirements, a renegotiation
process is started between the user and the meta-scheduler in order to
define new QoS requirements.
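The four steps above can be walked through with a short sketch. Every function
here is a hypothetical stub standing in for the gap search and the inter-domain
communication; the names are ours, not those of the SA-Layer.

```python
# Illustrative walk through the four steps: a request tuple (in_file, app,
# t_s, d) is first tried against the local domain, then against remote
# domains, and finally renegotiated. All callables are hypothetical stubs.

def schedule_in_advance(request, local_search, remote_domains):
    # Steps 1-2: the local Gap Management entity runs the gap search.
    slot = local_search(request)
    if slot is not None:
        return ("local", slot)
    # Step 3: ask meta-schedulers of other domains.
    for domain in remote_domains:
        slot = domain(request)
        if slot is not None:
            return ("remote", slot)
    # Step 4: no domain can meet the QoS, so renegotiation with the user starts.
    return ("renegotiate", None)

request = {"in_file": "data.in", "app": "app", "t_s": 0, "d": 20}
outcome = schedule_in_advance(request,
                              local_search=lambda r: None,
                              remote_domains=[lambda r: ("res_B", 4)])
```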
As can be seen in Figure 4.1, there may be one or more meta-schedulers in each
domain. However, all of them have to communicate with the same Gap Management
entity, which is in charge of managing the resource usage of that domain. This
entity should be replicated in order to avoid the single-point-of-failure
problem. It is also possible to split the resources of the local domain into
several subdomains when the number of resources in a domain grows too high.
Thus, this model is scalable, as it is possible to assign some resources to one
Gap Management entity and other resources to another one. Each meta-scheduler
asks its local Gap Management entity to allocate the jobs. In this way, the Gap
Management entity of each domain only has to maintain the status of its own
resources, whilst the meta-scheduler is the entity in charge of asking other
meta-schedulers when it is not possible to allocate the job in the local domain.
In the case that the local domain lacks the resources to allocate the job, a
communication with meta-schedulers of other domains starts, as the third step
depicts. In order to perform the inter-domain communications efficiently, tech-
niques based on Peer-to-Peer (P2P) systems (as proposed by [128] [129] [130] [131],
among others) can be used. In this way, the meta-scheduler at each domain
knows some of the meta-schedulers at other domains, and can forward jobs to
them when needed.
In the case that the QoS required by the user cannot be addressed (not even
in other domains), a renegotiation process may start (fourth step). As a result of
the renegotiation process, the user could resubmit the job with less strict QoS
requirements, or try again later with the same requirements, or just quit the
execution of the job.
This renegotiation, as well as the overall interaction with users, can be
conducted by means of Service Level Agreements (SLAs). A scheme for advancing
and managing the QoS attributes contained in Grid SLA contracts, following the
Open Grid Services Architecture (OGSA), can be implemented. For instance, in
[132] an Execution Management Service is introduced which collaborates with
both the application services and the network services in order to provide an
adjustable quality of the requested services. Hence, in the proposed framework,
an implementation where the components that manage and control the job
submissions interact with an SLA-related service can be used for negotiating
(and renegotiating when needed) the desired QoS with the user who submitted the
job. A proposal based on these ideas, with the aim of providing this capability
to our system, has been presented in [35] [133] [134].

Figure 4.2. Scheduling Order.
It must be noted that, under this scenario, jobs are not executed in the order
they are scheduled, as Figure 4.2 depicts. This order depends on several
factors, such as the time constraints of the jobs, the previously allocated
jobs, the status of the resources, and so forth. As may be seen in Figure 4.2,
the first submitted job is not going to be executed first due to its start time
restrictions. Those restrictions lead to Job 1 being allocated to "resource A"
at slots 7 to 12. After that, Job 2 arrives, but its start time restriction is
"slot 1". So, as it just needs 5 slots to be executed on "resource A", it may
be executed before "Job 1", using slots 1 to 5. The same reasoning applies to
"Job 5".
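The Figure 4.2 behaviour can be reproduced with a toy first-fit placement that
respects start-time restrictions. The helper name and the slot layout are
illustrative; the slot numbers follow the text.

```python
# A toy run reproducing the behaviour described for Figure 4.2: jobs are
# placed first-fit subject to their start-time restrictions, so the first
# submitted job need not execute first.

def first_free_run(busy, earliest, length):
    """First index >= earliest that starts a run of `length` free slots."""
    start = earliest
    while start + length <= len(busy):
        if all(not busy[s] for s in range(start, start + length)):
            return start
        start += 1
    return None

busy = [False] * 20               # timeline of "resource A"
alloc = {}
# Job 1 is submitted first but cannot start before slot 7 and needs 6 slots;
# Job 2 arrives later, can start at slot 1 and needs 5 slots.
for job, (earliest, length) in [("Job 1", (7, 6)), ("Job 2", (1, 5))]:
    start = first_free_run(busy, earliest, length)
    alloc[job] = (start, start + length - 1)
    for s in range(start, start + length):
        busy[s] = True
```

Although Job 1 was submitted first, it occupies slots 7 to 12, while Job 2 runs
earlier, at slots 1 to 5.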
On the other hand, in this meta-scheduling in advance process, the accu-
racy of available information regarding the resources is very important for the
scheduling tasks to be performed efficiently. However, the independence and
autonomy of domains is an obstacle due to the fact that some domains may
not want to share private information, such as information on the load of their
resources. Another obstacle is resource contention in Grid environments, which
causes host load and availability to vary over time. Hence, predicting this
information is a key issue, even though it is quite difficult due to the
dynamic behavior of the Grid [68].
The prediction information can be derived in two ways [68]: application–
oriented and resource–oriented. For the application–oriented approaches, the
running time of Grid tasks is directly predicted by using information about
the application, such as the running time of previous similar tasks. For the
resource–oriented approaches, the future performance of a resource such as the
CPU load and availability is predicted by using the available information about
the resource, and then such predictions are used for predicting the running time
of a task, given the information on the resource requirement of the task.
In our case we use a mixture of these two approaches: application–oriented
techniques to estimate the execution time of the application, and
resource–oriented techniques to calculate the time needed to perform the
network transfers and to tune the estimations made about job execution times.
4.3 Meta-Scheduling in advance implementation
One of the main contributions of this Thesis is the implementation of a frame-
work for network–aware meta-scheduling in advance, which is detailed in this
section. It is implemented as a layer on top of the GridWay meta-scheduler, in
the same way as Qu [57] described a method to overcome the performance penalty
of reservations by adding a Grid advance reservation manager on top of the
local scheduler(s). First, the structure of the framework is presented. Next,
the data structures used for managing this information are described, followed
by the policies for allocating jobs to resources. Subsequently, the fault
tolerance support is explained. Finally, the prediction techniques are
discussed, together with details on how the autonomic behavior of the framework
has been implemented by means of the aforementioned resource trust.
Our proposal is implemented as an extension to the GridWay meta-scheduler [10],
called the Scheduling in Advance Layer (SA-Layer). This is an intermediate
layer between the users and the on-demand Grid meta-scheduler, as Figure 4.3
depicts. The SA-Layer is a modular component that uses functions provided by
GridWay in terms of resource discovery and monitoring, job submission and
execution monitoring, etc., and allows GridWay to perform network–aware
meta-scheduling in advance decisions. As the original parameters supported by
GridWay do not consider the network condition, GridWay has been extended to
integrate network status information into the meta-scheduling process [112], as
explained in Chapter 3.

Figure 4.3. The Scheduler in Advance Layer (SA-Layer).
The SA-Layer stores information in databases concerning previous application
executions (called DB Executions) and the status of resources and network over
time (called DB Resources). The memory overhead of the SA-Layer is negligible,
since the information is stored in a compact way. For instance, if there are
two or more executions of an application with the same input parameters, the
information about execution and transfer times is summarized and only the
average time is saved. In this way, those files only hold summarized
information about the resources and the job execution times.
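This compact storage policy amounts to keeping a running average per
application and parameter set. A minimal sketch follows; the record layout and
names are illustrative, not the actual DB Executions schema.

```python
# Minimal sketch of the compact storage policy: repeated executions of the
# same application with the same input parameters are collapsed into a
# running average instead of one stored record per run.

db_executions = {}   # (app, params) -> {"mean_time": float, "count": int}

def record_execution(app, params, completion_time):
    key = (app, params)
    entry = db_executions.setdefault(key, {"mean_time": 0.0, "count": 0})
    n = entry["count"]
    # incremental mean: only the average survives, not every sample
    entry["mean_time"] = (entry["mean_time"] * n + completion_time) / (n + 1)
    entry["count"] = n + 1

record_execution("app", ("-n", "100"), 120.0)
record_execution("app", ("-n", "100"), 100.0)
```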
A new parameter has been added to GridWay's job template, named
JOB_INFORMATION. In this parameter the user may indicate some information about
the job. First, the user may specify the size of the files to be transferred in
order to both start and finish the execution of the job; it must be highlighted
that this information is not compulsory. In addition, the user may set other
characteristics related to the job, such as job arguments, which enable a more
accurate prediction of the job execution time.
For that purpose, the execution time of jobs in a given resource is estimated
by the Predictor module, taking into account the characteristics of the jobs,
the CPU power of the resources and the future network status. In addition, the
Resource Trust component calculates the trust in resources in order to tune the
predictions, depending on the information about the accuracy of the latest job
execution estimations for those resources. By processing this information about
applications and resources, a more accurate estimation of the completion time
of the job in the different computational resources can be performed.
In this implementation, resource usage is divided into time slots, whose
duration is a customizable parameter. Then, we have to schedule the future
usage of resources by allocating jobs to resources at a specific time (taking
one or more time slots). For this reason, data structures to keep track of the
usage of slots are needed, along with allocation policies (carried out by the
Gap Management module in Figure 4.3) to find the best slots for each job.
Furthermore, the way the framework has been implemented avoids deadlocks: once
a job is scheduled to be executed at a specific resource, if its deadline
expires, the job will be dropped. Thus, if a job has time slots assigned but
cannot use them (e.g. the local user is using the machine), this will not
affect other jobs being submitted to the same resource.
This framework also presents an autonomic behavior which allows it to adapt
itself to changes in the Grid system. This autonomic behavior comprises two
different functionalities, namely Resource Trust and Job Rescheduling. The main
characteristics of the SA-Layer components are explained next.
4.3.1 Gap Management
The Gap Management module represents the information of the tree data
structure in a geometrical way. This module is in charge of using and keeping
up–to–date the information stored in the data structure. Each job is
represented by a single point in the plane, as Figure 4.4 depicts. Labeled
points represent the idle periods (gaps), with start and finish times. The
coordinates of a job are its starting and ending times. P represents the
earliest start and end times, whilst P' represents the latest ones, for the
current job. Thus, the line between P and P' represents the periods when this
new job can be scheduled. All the points above and to the right of this line
represent possible gaps to allocate the job. Notice that each job allocation
influences how many further jobs can be scheduled, due to the generated
fragmentation. In this work, fragmentation refers to the free time slots in
between two consecutive allocations. Different ways of searching for gaps and
allocating jobs to resources can be devised, considering both the already
scheduled jobs and the generated fragmentation.

Figure 4.4. Idle periods regions [3].
A First Fit policy has been considered, which selects the first free gap found
that fits the new job. It can create considerable fragmentation, as a result of
which many jobs may be rejected. Other techniques also exist, such as Best Fit.
This policy selects the free gap which leaves the fewest free time slots after
allocating the job. The fragments are smaller, but it is harder to use those
free slots to allocate new jobs.
Although Best Fit usually outperforms First Fit, it is more computationally
complex, since all the resources must be searched to find the most suitable
gap. In contrast, First Fit does not have to search all the resources, which
makes it more scalable. Furthermore, this Thesis has also worked on
re-scheduling techniques that can fix the fragmentation created by First Fit
and enhance the overall performance of the system.
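The two policies can be contrasted in a few lines. This is a hedged sketch with
our own names; a gap is represented as (start, length) and the job needs a
given number of slots.

```python
# Sketch contrasting the two gap-selection policies: First Fit returns the
# first gap large enough, Best Fit the gap leaving the fewest free slots.

def first_fit(gaps, need):
    for i, (start, length) in enumerate(gaps):
        if length >= need:
            return i
    return None

def best_fit(gaps, need):
    best, best_left = None, None
    for i, (start, length) in enumerate(gaps):
        left = length - need          # free slots remaining after the job
        if left >= 0 and (best_left is None or left < best_left):
            best, best_left = i, left
    return best

gaps = [(0, 8), (10, 3), (20, 5)]
# For a 3-slot job: First Fit grabs the 8-slot gap (leaving a 5-slot
# fragment), while Best Fit picks the exact 3-slot gap (leaving none),
# at the cost of scanning every gap.
```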
On the other hand, the idle periods are split into subsets (the strips in
Figure 4.4), enabling the natural implementation of a variety of strategies for
selecting one among multiple feasible idle periods, and reducing the complexity
of the gap selection. There is a red–black tree for each strip, so that only
the gaps within the limits of each strip are stored in the tree of that strip.
The size of these strips is a customizable parameter. If this parameter has a
high value, there will be a great number of gaps per tree and fewer trees; by
contrast, if it has a low value, there will be a greater number of trees, each
with fewer gaps.
Apart from that, Castillo explains in [38] that the trees can be divided into
two regions, named R1 and R2, as Figure 4.4 depicts. The R1 region represents
the gaps which start at or before the job's ready time. Therefore, any idle
period in this region can accommodate the new job without delaying its
execution. The R2 region represents the gaps which start later than the job's
ready time.
It is important to recall that a job scheduled in an idle period will create at
most two new idle periods: one between the beginning of the gap and the start
of the job (the leading idle period), and one between the end of the job and
the end of the original idle period (the trailing idle period). Consequently,
the leading idle period has zero length at any point in region R2, since in
that region the job starts executing as soon as the gap begins. Thus, the R2
region is searched first in order to reduce the generated fragmentation. Note
also that the later the starting time of a gap is, the longer the execution of
the new job will be delayed. Therefore, this region is searched from top to
bottom in order to minimize the job turnaround time. On the other hand, if
there is no available gap in the R2 region, a feasible gap is searched for in
the R1 region from bottom to top. Again, the reason is to generate less
fragmentation.
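This search order can be sketched as follows. It is our reading of the
description above, with illustrative names: R2 gaps (starting after the job's
ready time) are tried earliest-start first, and R1 gaps latest-start first.

```python
# Sketch of the region-based gap search: prefer R2 gaps (start > ready),
# earliest first, so the leading idle period is empty and delay is minimal;
# fall back to R1 gaps (start <= ready), latest first, to minimise the
# leading fragment. A gap is (start, length); the job needs `need` slots.

def pick_gap(gaps, ready, need):
    fits = [(s, l) for (s, l) in gaps if l >= need]
    r2 = sorted(g for g in fits if g[0] > ready)                 # after ready
    r1 = sorted((g for g in fits if g[0] <= ready), reverse=True)
    if r2:
        return r2[0]      # earliest R2 gap: no leading idle period
    if r1:
        return r1[0]      # latest R1 gap: smallest leading idle period
    return None

gaps = [(0, 10), (3, 6), (8, 4), (12, 5)]
```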
4.3.2 Data Structure
One of the most important aspects in the registration of the resource free time
slots is the data structure used. The scheduling technique presented here thus
needs a suitable data structure in order to manage all this information
efficiently. A suitable data structure yields better execution times and
reduces the complexity of the algorithms. Furthermore, the data structure also
influences the scalability of those algorithms.
There are several structures for managing this information needed by the
scheduler, and a survey can be found in [37]. For instance, Grid Advanced
Reservation Queue (GarQ) [37] is a combination of the Calendar Queue [61] and
the Segment Tree [62] for administering reservations efficiently. In this work,
red–black trees are used, as they provide efficient access to the information
about resource usage, as demonstrated in [38]. This data structure stores the
free time periods of each resource, in contrast to GarQ, which stores
information about the reservations made. As a result, when the number of jobs
submitted is high, the number of free time periods to be managed is lower.

Figure 4.5. Example of a red–black tree.
That is why red–black trees [135] are the data structure used in this work. The
objective of using this type of tree is to develop techniques that efficiently
identify feasible idle periods for each arriving job request, without having to
examine all idle periods [3]. A red–black tree is a special type of binary tree
where each node has a color attribute, which can be either red or black (see
Figure 4.5). This kind of tree has additional requirements over the ordinary
requirements imposed on binary search trees.
These constraints enforce a critical property of red–black trees: the longest
path from the root to any leaf is no more than twice as long as the shortest
path from the root to any other leaf in that tree. So, these trees are roughly
balanced and, as a result, inserting, deleting and finding values require
worst–case time proportional to the height of the tree, O(log n). This
theoretical upper bound on the height allows red–black trees to be efficient in
the worst case, unlike ordinary binary search trees. Besides, this data
structure is more scalable, since it is possible to have several red–black
trees, each one keeping information about the resource usage of a certain time
period (the strips in Figure 4.4).
The red–black trees used in this framework differ from [3] in the informa-
tion stored in the leaves, namely the mean instead of the median. So, a low
computational cost implementation is obtained for a real Grid system.
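The strip organisation can be illustrated with a stand-in structure. Python's
standard library has no red–black tree, so the sketch below keeps one ordered
list per strip, searched with `bisect` (any balanced tree gives the same
logarithmic behaviour); StripIndex and its methods are our own illustrative
names.

```python
import bisect

# Stand-in illustration of the strip organisation: one ordered container
# per strip instead of one red-black tree per strip. A gap is keyed by its
# start time; strip_size is the customisable parameter from the text.

class StripIndex:
    def __init__(self, strip_size):
        self.strip_size = strip_size
        self.strips = {}                      # strip number -> sorted starts

    def add_gap(self, start):
        strip = start // self.strip_size
        bisect.insort(self.strips.setdefault(strip, []), start)

    def gaps_in_strip(self, strip):
        return self.strips.get(strip, [])

idx = StripIndex(strip_size=10)
for start in (3, 17, 12, 25):
    idx.add_gap(start)
# only the strip covering times 10-19 needs examining for a gap near t = 15
```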
4.3.3 Job Migration
In order to make the system fault tolerant, the SA-Layer also needs a mechanism
to deal with resource failures (whatever the problem may be), so as to build a
reliable system. This feature is very important in Grids, as resources may join
and leave the Grid at any time, and failures of resources are the rule rather
than the exception. Hence, they should be taken into account in order to
provide a reliable service [136]. This feature improves the autonomic behavior
of the framework in the face of the variable availability of resources. Thus,
when a resource quits the system (e.g. the resource fails or is shut down), the
jobs scheduled on it (including currently running jobs) have to be reallocated
to other hosts. Jobs are rescheduled in the same way as when they were first
submitted.
This task is performed by the Job Migrator module (see Figure 4.3). This module
is in charge of monitoring the currently active resources, and this check is
performed every time slot. When this module detects that a resource is no
longer available, it performs the following steps:
1. Select the jobs scheduled for the unavailable resource.
2. Delete those scheduling decisions and release the reserved time slots.
3. Re-schedule the jobs to other resources whenever possible (by using the
Job Rescheduler module).
As can be seen, this module is closely related to the Job Rescheduler module,
which is explained in the next chapter (Section 5.3).
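The three migration steps can be sketched as a small function. The schedule
structure and the rescheduling callback are illustrative stubs, not the actual
Job Migrator interface.

```python
# Sketch of the three migration steps: select the jobs scheduled on the
# failed resource, release their slots, and hand them to a rescheduling
# callback. All structures and names are illustrative.

def migrate(schedule, failed_resource, reschedule):
    # 1. select the jobs scheduled for the unavailable resource
    victims = [j for j, (res, _) in schedule.items() if res == failed_resource]
    moved = []
    for job in victims:
        # 2. delete the scheduling decision, releasing its time slots
        del schedule[job]
        # 3. re-schedule elsewhere whenever possible
        new = reschedule(job)
        if new is not None:
            schedule[job] = new
            moved.append(job)
    return moved

schedule = {"j1": ("resA", (2, 4)), "j2": ("resB", (0, 3)),
            "j3": ("resA", (6, 8))}
moved = migrate(schedule, "resA",
                reschedule=lambda j: ("resB", (5, 7)) if j == "j1" else None)
```

Jobs that cannot be reallocated (here "j3") simply disappear from the schedule,
mirroring the deadline-based dropping described earlier.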
4.3.4 Predictor
This type of scheduling needs to know the duration of jobs on resources and
their waiting times in queues. To this end, there are works which aim at
estimating these queue waiting times so that the scheduling process can be
carried out properly. For instance, Queue Bounds Estimation from Time Series
(QBETS) [63] can estimate the probability that a job will wait no longer than
startDeadline minutes if it is submitted at time T. Based on this, Virtual
Advance Reservations for Queues (VARQ) [64] implements a reservation by
determining when (according to the predictions made by QBETS) a job should be
submitted to a batch queue so as to ensure that it will be running at a
particular point in future time.
On the other hand, scheduling in advance needs predictions about the future
network status and about the duration of jobs on resources. A survey of some
prediction techniques can be found in [65]. Techniques for such predictions
include applying statistical models to previous executions [66] and heuristics
based on job and resource characteristics [46] [67]. In [66] it is shown that,
although load exhibits complex properties, it is still consistently predictable
from past behavior. In [67], an evaluation of various linear time series models
for predicting future CPU load is presented.
However, predictions of job execution time are quite difficult to obtain, since
there are performance differences between Grid resources [68]. Furthermore,
their performance characteristics may vary for different applications (e.g.
resource A may execute an application P faster than resource B, but resource B
may execute application Q faster than A). In addition, as many resources have a
local load besides the Grid load, monitoring and prediction techniques are
needed.
In this Thesis, (a) application–oriented approaches are used to estimate the
execution time of the application. This means that the running time of Grid
tasks is directly predicted by using information about the applications, such
as the running time of previous similar tasks. Moreover, (b) resource–oriented
approaches are also used to calculate the time needed to perform the network
transfers. This means that the future performance of a resource is predicted
(by using the available information about the resource) and used for predicting
the running time of a task. Then, (c) a resource–oriented technique is used
again to tune the predictions. To do that, some information about the status of
resources and previous job executions needs to be stored. For this reason, the
SA-Layer includes two databases which store information about previous resource
status (DB Resources) and about previous job executions (DB Executions).
In this way, the algorithm proposed by Castillo [3] is extended to take into
account the heterogeneity of Grid resources by means of prediction techniques.
A straightforward implementation of this algorithm is used as a comparison
model. Then, different extensions to Castillo's algorithm have been implemented
within the proposed framework in an incremental way, with the aim of making the
job duration estimations as accurate as possible. First, we have developed an
implementation which calculates the total completion time of jobs in each
resource based on log data from previous executions (named Total Completion
Time (TCT)). In a second phase, that implementation has been modified to
estimate the total completion time of jobs by taking into account the time
needed for executing the jobs and the time needed for transferring the files
separately (named Execution and Transfer Time Separately (ETTS)). Third, based
on the second implementation, we have implemented some heuristics that take
into account the resource trust to tune the prediction of the execution time of
jobs (named Resource Trust (RT)). Finally, based on the previous one, we have
developed a technique to make predictions about the future status of the Grid
resources (network included) based on exponential smoothing functions (named
Exponential Smoothing (ExS)). This information is used for adjusting the job
duration estimations by considering the possible future status of resources and
interconnection networks. The main points of these prediction heuristics are
explained in the next section.
4.4 Prediction Techniques
Before explaining each technique in detail, it is important to highlight that,
for all the implementations, predictions are only calculated when a suitable
gap has been found in the host. In this way, there is no need to calculate the
prediction times for all the hosts in the system, which would be quite
inefficient. Also note that two applications are considered to belong to the
same application type when they have the same input and output parameters, in
terms of number, type and size.
One of the main advantages of all the techniques presented here is that they
take into account the heterogeneity of Grid resources and do not assume that
users have prior knowledge of the duration of jobs. The information needed to
estimate the times (completion, execution and transfer times) is stored in the
databases (DBs). The information related to previous executions of the
applications is stored in DB_Executions, and includes the completion time, the
execution time, and the input and output transfer times. If there is no
historical data about the execution of a certain application, a default time
value is assigned to its first execution, in the case that the user does not
provide any information about it. As a result, at least one execution of each
type of application needs to have been completed in order to obtain a somewhat
reliable prediction. On the other hand, DB_Resources stores the information
related to the previous status of the resources (network bandwidth included).
This database keeps a trace of the status of the CPU, the RAM memory, and the
bandwidth.
With the aim of clarifying the notation of the following algorithms, a summary
of the common notation used is outlined next:

• R = the set of known resources {ri | i in [1..n]}
• j = the job to be executed
• n = the number of samples of the completion times for the application j in
the resource ri
• t^ri_completion(j)_k = the k-th completion time for the application j in the
resource ri
• t^ri_execution(j)_k = the k-th execution time for the application j in the
resource ri
• t^ri_estimated(j) = the execution time estimation for the application j in
the resource ri
• t^ri_real(j) = the real execution time for the application j in the resource ri
• initT = the start time of the job
• d = the deadline for the job
• size = the number of bytes to be transferred
• sizeIN = the number of input bytes to be transferred
• sizeOUT = the number of output bytes to be transferred
• Bw(ri, t) = the bandwidth of the resource ri at minute t of the previous day
• RT(ri) = the trust value for the resource ri
• RT^ri_j = the total time to execute j in the resource ri
• CPU_free(ri, initT, d) = the mean percentage of free CPU in the resource ri
between initT and the deadline d, calculated by using the Exponential
Smoothing (ExS) function
• Overload_ri = the extra time needed due to the CPU usage at the chosen
resource ri

Algorithm 5 Total Completion Time (TCT)
1: for each ri having a gap do
2:   TCT_ri = ( Σ_{k=1}^{n} t^ri_completion(j)_k ) / n
3: end for
4.4.1 TCT Technique
The first implementation mentioned above takes into account the heterogeneity
of Grid resources by calculating the total completion time of jobs in each re-
source based on log data about past executions. Because of this, it is called
Total Completion Time (TCT). These estimations consider transfer and execution
times (including execution and queueing times) altogether.
This implementation is explained in Algorithm 5. It shows that, for each
resource having a gap in which the application j can be executed (line 1), an
estimation of the completion time of that application on that resource is
calculated (line 2). This estimation works as follows: the next execution of
the application j in the resource ri is predicted from the stored completion
times of the previous executions of j in ri (which are kept in the
DB_Executions module), by working out the average of all those stored
executions.
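Algorithm 5 can be expressed as a short sketch. The history mapping and the
function name are illustrative stand-ins for the DB_Executions database, not
the actual SA-Layer code.

```python
# Algorithm 5 (TCT) as code: the predicted completion time of application
# `job` on each candidate resource is the mean of its stored completion
# times on that resource.

def tct(history, job, resources_with_gap):
    """Mean past completion time of `job` on each resource having a gap."""
    return {ri: sum(history[(job, ri)]) / len(history[(job, ri)])
            for ri in resources_with_gap}

history = {("app", "rA"): [100.0, 120.0], ("app", "rB"): [90.0]}
estimates = tct(history, "app", ["rA", "rB"])
```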
4.4.2 ETTS Technique
The second implementation considers the times needed to execute the jobs and
the time needed to transfer the files separately [110]. Thus, it is called Execution
and Transfer Time Separately (ETTS), and is explained in Algorithm 6. As before,
Algorithm 6 Execution and Transfer Times Separately (ETTS)
1: for each ri having a gap do
2:   Prolog = TransT_Estimation(ri, initT, d, sizeIN)
3:   Epilog = TransT_Estimation(ri, initT, d, sizeOUT)
4:   ExecT = ExecT_Estimation(j, ri)
5:   ETTS_ri = Prolog + ExecT + Epilog
6: end for
Algorithm 7 Estimation of execution time (ExecT_Estimation)
1: ExecT_Estimation = ( Σ_{k=1}^{n} t^ri_execution(j)_k ) / n
2: return ExecT_Estimation
Algorithm 8 Estimation of transfer time (TransT_Estimation)
1: MeanBw = ( Σ_{t=initT}^{d} Bw(ri, t) ) / (d − initT)
2: TransT_Estimation = size / MeanBw
3: return TransT_Estimation
resources are explored in search of a suitable gap for the execution of this
job, and for each gap found (line 1), estimations of the transfer times of the
input files (line 2) and output files (line 3), and of the execution time (line
4) in the resource ri holding the gap, are calculated. After that, the
prediction of the total completion time of the job in the resource ri is
calculated by adding the aforementioned estimations (line 5).
For the execution time, an estimation is calculated using information of pre-
vious executions, as it is depicted in Algorithm 7. This algorithm uses all
the execution times records in the database (which are stored in the module
DB_Executions) for the application j in a resource ri to calculate the mean exe-
cution time for j in ri – this includes execution and queueing times.
The way of calculating the transfer times is outlined in Algorithm 8. In this case, the mean bandwidth of the previous day over the time interval where the job will be allocated is calculated (line 1). The measurements for this time interval are stored in DB_Resources. In this work, the previous day's log is used for estimating the future bandwidth because, as presented in [66], the future status is predictable from past behavior, and it is a simple technique that works well enough. Moreover, these estimations leave a margin (they are overestimated by 20 %) in order to ensure that the transfers can finish on time even if the predictions are not accurate enough. The time needed to complete the transfers is estimated using this information, along with the total number of bytes to transfer (line 2).
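The 20 % overestimation margin amounts to a single multiplicative factor on the raw estimate; a minimal sketch (the constant name and the bandwidth-log format are assumptions, not SA-Layer identifiers):

```python
MARGIN = 1.20  # 20 % overestimation, as described in the text

def trans_t_with_margin(bw_log_day_before, size):
    """Transfer-time estimate from the previous day's bandwidth log,
    padded by 20 % to absorb prediction inaccuracies."""
    mean_bw = sum(bw_log_day_before) / len(bw_log_day_before)
    return (size / mean_bw) * MARGIN
```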
4.4. Prediction Techniques 85
On the other hand, estimating those times separately gives us the possibility of overlapping network transfers and job execution times in the resources. This means that while a resource is executing a job, the files needed by the next job to be executed in that resource can be transferred at the same time.
4.4.3 RT Technique
Predictions of job execution times may be inaccurate, as resource performance can change quickly depending on use (owner load and/or other Grid users' loads). So, the system needs to react to those changes in performance to be able to predict job times as accurately as possible. That is the reason why our system takes into account the previous performance of the resource where the executions will take place, in order to retune the previous predictions of job execution times. Accordingly, the system has to be able to autonomously tune the predictions calculated for the execution time of jobs by using the information about the latest N errors made in the predictions for each resource.
This implementation is based on ETTS, but it tunes the predictions of the execution time of jobs by obtaining a measure of how reliable the resource performance is, named Resource Trust. This resource trust is calculated using Equation 4.1,
RT(ri) = ( ∑_{j=n−N}^{n} ( t^{ri}_{estimated}(j) − t^{ri}_{real}(j) ) ) / N    (4.1)

where RT(ri) is the trust in the resource ri; t^{ri}_{estimated}(j) is the execution time estimation made for the j-th job execution in resource ri; and t^{ri}_{real}(j) is the real execution time of the j-th job in resource ri.
As a result, the confidence in the estimations depends on how trustworthy the resource in which the job will run is at the moment when the scheduling process takes place. This resource trust may be different from the trust at the moment when the job is executed. This function takes into account the last N execution times and their predictions in a specific resource ri to calculate the trust in that specific resource. The output of this function is the mean of the errors made in those N predictions, and it is used to tune the prediction made for the job execution times in that resource. N is a customizable parameter which depends on how far into the past we want to go when estimating the recent resource behavior. It must not be a high value, as we want to measure only the latest performance. Nevertheless, it must not be too low either, since this may lead to a wrong measurement of the resource performance. Based on that, and after a tuning process, we have set this value to 20, meaning a sample of how each resource has been behaving in the last 20 minutes to 1 hour, depending on the total Grid load.

86 Chapter 4. Adding Support for Meta-Scheduling in Advance: The SA-Layer

Algorithm 9 ETTS extended with Resource Trust (RT)
1: for each ri having a gap do
2:   Prolog = TransT_Estimation(ri, initT, d, sizeIN)
3:   Epilog = TransT_Estimation(ri, initT, d, sizeOUT)
4:   ExecT = ExecT_Estimation(ri, j)
5:   if RT(ri) < 0 then
6:     ExecT = ExecT + |RT(ri)|
7:   end if
8:   RT_j^{ri} = Prolog + ExecT + Epilog
9: end for
10: return RT_j^{ri}
This implementation is called RT and is detailed in Algorithm 9. Estimations of execution and transfer times are calculated in the same way as explained for ETTS (using Algorithms 7 and 8). With this information and the information about the trust in resource ri, labeled as RT(ri), the execution time is tuned (line 6) and an estimation for the total completion time of the job, RT_j^{ri}, is calculated (line 8).
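Equation 4.1 and the tuning step of Algorithm 9 (lines 5–7) can be sketched as follows. This is a simplified illustration: the list-based error history is an assumption, and the default N = 20 follows the value chosen in the text:

```python
def resource_trust(estimated, real, n_last=20):
    """Eq. 4.1: mean signed error over the last N predictions for ri.
    A negative value means executions tend to run longer than predicted."""
    est, obs = estimated[-n_last:], real[-n_last:]
    return sum(e - r for e, r in zip(est, obs)) / len(est)

def tune_exec_time(exec_t, rt):
    """Algorithm 9, lines 5-7: enlarge the execution-time estimate when
    the resource has recently been underestimated (RT < 0)."""
    return exec_t + abs(rt) if rt < 0 else exec_t
```

For example, if the last two predictions of 10 s each ran for 12 s and 14 s, the trust is −3, and a 100 s estimate is enlarged to 103 s.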
4.4.4 ExS Technique
Finally, we have developed another improvement over the previous prediction techniques. Again, the predictions for the duration of jobs are calculated by estimating the execution time of the job and the time needed to complete the transfers separately, as it has been demonstrated that this yields better results than performing predictions for execution and transfer times altogether [110].
The new improvement is based on an exponential smoothing function which is used to tune the predictions of both the execution time of the job and the network transfer times. This exponential smoothing function is detailed in the next subsection.

Algorithm 10 Estimation of Execution Time (ExS Estimation)
1: ExecutionTime = (∑_{k=1}^{n} t^{ri}_{execution}(j)_k) / n
2: Overload_ri = ExecutionTime * (1 − CPU_free(ri, initT, d))
3: ExecutionTime = ExecutionTime + Overload_ri
4: return ExecutionTime
Regarding predictions of the execution times, they are performed as explained in Algorithm 10. An estimation of the execution time is calculated as the average of previous executions (line 1). After that, the prediction of the future status of the CPU of each resource is calculated by means of an exponential smoothing function, along with the overload generated due to the resource status (line 2). Finally, the mean execution time is tuned using the prediction of the future CPU status of each resource (line 3).
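Algorithm 10 amounts to scaling the historical mean by the predicted busy fraction of the CPU; a sketch (here the CPU_free prediction is passed in as a plain fraction rather than computed by the smoothing function):

```python
def exs_exec_estimation(exec_history, cpu_free):
    """Algorithm 10: mean of previous executions (line 1) plus the
    overload implied by the predicted non-free CPU fraction (lines 2-3)."""
    mean_t = sum(exec_history) / len(exec_history)
    overload = mean_t * (1.0 - cpu_free)
    return mean_t + overload
```

With a 100 s historical mean and a predicted 80 % free CPU, the tuned estimate becomes 120 s.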
With regard to transfer times, the mean bandwidth within the time period
between the start time of the job and its deadline is also calculated by using an
exponential smoothing function. Again, using this information along with the
total number of bytes to transfer, the time needed to complete the transfers is
estimated. The next section details the exponential smoothing functions used in
this work.
Finally, when both times have been estimated, the recent behavior of the chosen resource (resource trust) is calculated following Equation 4.1 and is used in the same way as in Algorithm 9 to tune the final execution time estimated for the job.
4.4.5 Exponential smoothing predictions
Even though the development of statistical techniques is outside the scope of this dissertation, we have used and adapted the ExS prediction method to calculate predictions of the status of resources. So, to better understand the ExS technique, we detail it next.
Exponential Smoothing (ExS) [105] is a statistical technique for detecting significant changes in data by ignoring the fluctuations irrelevant to the purpose at hand. It provides a simple prediction method based on both historical and current data [137]. In ExS (as opposed to moving-average smoothing), older data is given progressively less relative weight (importance), whereas newer data is given progressively greater weight. In this way, ExS assigns exponentially decreasing weights as the observations get older. Hence, recent observations are given relatively more weight in forecasting than older ones.
ExS is a procedure for continually revising a forecast in the light of more recent experience, and it is employed in making short-term forecasts. There are several types of ExS. In this work a triple exponential smoothing is used, which is also named Holt–Winters [105]. With this kind of ExS, the trend and seasonality of the data are taken into account for the predictions. Trend refers to the long-term patterns of the data, whilst seasonality is defined as the tendency of time-series data to exhibit behavior that repeats itself every L periods. Such trends are apparent when a user wants to execute a large, complex simulation on a Grid at periodic intervals (such as for analyzing sales data or running a scientific experiment with new observations). Conversely, during vacation periods, not all the staff take their vacation at the same time, so the load on the resources decreases progressively. We have chosen ExS because our data is likely to present both behaviors. ExS also provides a simple and efficient method which can be implemented without slowing down the performance of the system.
Regarding seasonality, it is likely that CPU availability increases at night, even more so in our scenario, in which resources are shared with other users. Similarly, resource usage may also depend on the day of the week being considered, with greater workload during the week and greater availability at weekends. Thus, data collection and analysis need to be run at different times of the day to make an accurate prediction about resource status. To this end, a weekly log is used as input to the ExS function for predicting the future status of the network and resources for the next whole day, as we are then able to account for both seasonal behaviors. In our approach, the predicted information is updated every 30 minutes to refine the results over time depending on the new knowledge observed from recent resource and network behavior. The forecasting method used is presented in Equation 4.2. At the end of time period t, with x_t being the observed value of the time series at time t (in our case, the CPU usage), f_{t+m} is the forecasted value for m periods ahead, T_t is the trend of the time series, L_t is the deseasonalized level, and S_t is the seasonal component.
ExS = ∑_{m=initT}^{deadline} f_{t+m} = ∑_{m=initT}^{deadline} (L_t + T_t * (m + 1)) * S_{t+m−L}    (4.2)

L_t = α * (x_t / S_{t−L}) + (1 − α) * (L_{t−1} + T_{t−1})    (4.3)

T_t = β * (L_t − L_{t−1}) + (1 − β) * T_{t−1}    (4.4)

S_t = γ * (x_t / L_t) + (1 − γ) * S_{t−L}    (4.5)
The deseasonalized level (Lt) is calculated as shown in Equation 4.3, taking into account the previous values obtained for trend and seasonality and the actual value observed. The new trend of the time series (Tt) is the smoothed difference between two successive estimations of the deseasonalized level, as described in Equation 4.4. Finally, the seasonal component (St) is calculated using Equation 4.5. This expression combines the most recently observed seasonal factor, given by the demand xt divided by the deseasonalized series level estimate (Lt), with the previous best seasonal factor estimate for this time period. Thus, seasonality indicates how much this period typically deviates from the period average (in our case, weekly). At least one full season of data is required for the computation of seasonality.
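Equations 4.3–4.5 translate directly into an update step, and Equation 4.2 into a forecast. The sketch below assumes pre-initialized level, trend and seasonal values, and indexes the seasonal factors cyclically (an approximation of S_{t+m−L}):

```python
def hw_update(x_t, level, trend, season_t_minus_L, alpha, beta, gamma):
    """One Holt-Winters update (Eqs. 4.3-4.5), multiplicative seasonality."""
    new_level = alpha * (x_t / season_t_minus_L) + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_season = gamma * (x_t / new_level) + (1 - gamma) * season_t_minus_L
    return new_level, new_trend, new_season

def hw_forecast(level, trend, seasonals, m):
    """One term of Eq. 4.2: forecast m periods ahead."""
    return (level + trend * (m + 1)) * seasonals[m % len(seasonals)]
```

Summing hw_forecast over m from initT to the deadline reproduces the full sum of Equation 4.2.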
In the equations, α, β and γ are constants that must be estimated in such a way that the mean square error is minimized. These weights are called the smoothing constants. For each component (level, trend, seasonal) there is a smoothing constant that falls between zero and one. It is important to set correct values for them to predict the behavior of resources and network as accurately as possible. In our work, the R program [138] is used for calculating these parameters. We need at least a two-week log, divided into weekly data sets. Using the first of these data sets (a one-week log), the R program estimates these values for that week. These results are then compared with the real status registered for the following week, and the α, β and γ values are adjusted to minimize the mean square error.
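The same fitting idea can be sketched as a coarse grid search. This is a simplification — R's HoltWinters() fit uses numerical optimization rather than a grid — and the one_week_mse callback is a hypothetical hook that runs the smoother over the training week and scores it against the validation week:

```python
import itertools

def fit_smoothing_constants(one_week_mse, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the (alpha, beta, gamma) triple minimizing the MSE reported
    by one_week_mse(alpha, beta, gamma) on the validation week."""
    best, best_err = None, float("inf")
    for a, b, g in itertools.product(grid, repeat=3):
        err = one_week_mse(a, b, g)
        if err < best_err:
            best, best_err = (a, b, g), err
    return best, best_err
```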
4.5 Evaluation
This section describes both the experiments conducted and the results ob-
tained in order to test the usefulness of the SA-Layer in a real Grid environment.
In this section, a comparison is presented between the SA-Layer using the four techniques for calculating the job completion time detailed before, and a straightforward implementation of the algorithm proposed by Castillo et al. in [3]. The techniques to calculate the job completion times are (1) estimations of the Total Completion Time of jobs (labeled as TCT); (2) estimations of Execution and Transfer Times Separately (labeled as ETTS); (3) ETTS extended with Resource Trust (labeled as RT); and (4) RT extended with exponential smoothing functions that predict the future status of resources and network (labeled as ExS).
As stated above, they are compared with the algorithm proposed by Castillo et al. [3] (labeled as Castillo). This straightforward implementation is based on a linear function which estimates the time needed to complete a job in a resource, taking into account its input parameters. It does not take into account the varying resource performance over time, only the input and output parameters of the job and the knowledge about its behavior. Hence, using this kind of estimation to predict the job execution time, all the predictions for the execution times of an application with the same parameters on a specific resource will yield the same value, without taking into consideration the current or future resource status. To evaluate the performance of these techniques for estimating the time needed to complete a job, several statistics are used.
• Scheduled job rate: fraction of accepted jobs, i.e., those whose deadline can
be met [38].
• QoS not fulfilled: the number of rejected jobs, plus the number of jobs that were initially accepted but whose executions were eventually delayed, so that their QoS agreements were not fulfilled (e.g., the deadline was not met).
• Overlap: records the number of minutes that a job execution is extended
over the calculated estimation.
• Waste: records the number of minutes that are not used to execute any job
because the predicted execution time of jobs was longer than their actual
execution time.
The first two statistics are measures of the QoS perceived by the user. Recall that in this work, QoS not fulfilled includes rejected jobs, since they are jobs which have not been executed with the QoS requested, and the QoS specified for each job is always reasonable in this experiment. The job QoS requirements could be more or less strict, but (if the system were empty) the job could always be scheduled and executed meeting the QoS requested. On the other hand, the last two statistics are measures of system performance (from the meta-scheduler's point of view), and they are related to the accuracy of the predictions.
4.5.1 Testbed
The evaluation of the implemented framework has been carried out in the same
real Grid environment described in Chapter 3 at Section 3.4.1.
4.5.2 Workload
As in the previous chapter, one of the GRASP [115] benchmarks, named 3node, has been run to evaluate the implementation. Again, the reason for selecting this test was its versatility in generating different kinds of jobs (computationally intensive and network demanding) in an easy way.
However, there are other important parameters that must be considered in
the workload when measuring the performance of the framework, as can be seen
in Figure 4.6. T_max reservation represents how far in advance we can schedule a job. T_Execi is the time needed to execute job i.

Figure 4.6. Workload characteristic.

Scheduling Window shows the time interval in which the job has to be scheduled. Taking the last two parameters together, we can obtain the Laxity, which represents how strict the user is when scheduling a job. It is the difference between the Scheduling Window and the T_Exec of a job. Finally, Arrival Ratio depicts the average time between two job submissions.
For this evaluation, both the compute_scale and the output_scale take values between 0 and 20, with an average of 10. The input file size is 100 MB, and these values of output_scale create output files whose sizes are between 100 MB and 2 GB. Both values are related to the T_Exec of Figure 4.6 (the greater they are, the longer the job executions will be). Thanks to this, both compute- and/or data-intensive applications are created, which are mixed in the submission. Therefore, the different jobs of the workload have different QoS requirements and different behaviors. Recall that in this work we focus on applications where users provide both the input files and the application itself.
On the other hand, the T_max reservation is up to 1 hour, with an average of 30 minutes. The Laxity is set between 0 and 10 minutes, with an average of 5 minutes. In this way, users always make a request to run a job with a reasonable QoS, as the laxity is never negative. The submission rate is from 1 to 6 jobs per minute (following a uniform random distribution) and the total submission time is 1 hour. Finally, time slots last 1 minute and each strip lasts 10 minutes (there is a tree for every 10 minutes, with the aim of not having big trees which may delay the search techniques).
As jobs executed in a Grid environment suffer delays (setting up the environment, communication between resources, transferring input and output files, the execution itself, . . . ), an execution lasting less than a minute is quite uncommon. Therefore, slots of 1 minute provide sufficient granularity for Grid environments. Moreover, this also leaves some margin for the predictions in case of inaccuracies. The results shown are the average of 5 executions for each case.

Figure 4.7. Comparison of the different estimation techniques from the users' viewpoint: (a) percentage of scheduled jobs and (b) percentage of QoS not fulfilled, versus the number of submitted jobs per minute (Castillo, TCT, ETTS, RT, ExS).
4.5.3 Experiments and results
Results from the users' point of view are depicted in Figures 4.7 and 4.8, whilst the results from the system's viewpoint are depicted in Figure 4.9. First, Figure 4.7 (a) represents the percentage of scheduled jobs – those for which the meta-scheduler has enough free slots to allocate them while meeting the QoS requirements. The average, maximum and minimum results obtained are also plotted. It shows that the more jobs there are in the system, the more jobs are lost. All the algorithms behave similarly at low loads. The differences appear when the system load is higher and more jobs are rejected by all the techniques.
When using different estimations for the transfers and for the execution itself (ETTS, RT and ExS), the number of accepted jobs is noticeably higher than when making just one estimation for the total job duration (Castillo and TCT). Among the techniques that present the better behavior, there are no large differences. As RT and ExS take into consideration the latest resource performance, and the latter also the predicted status of the network and of the computational resources, they make more accurate estimations of job durations, in spite of accepting a slightly lower number of jobs. This also leads to fewer jobs not meeting their QoS requirements after having been accepted for execution (as Figure 4.8 depicts).
Figure 4.7 (b) shows the percentage of jobs that were not executed with the QoS requested, including lost jobs and jobs completed beyond the agreed deadline. Again, the more jobs there are in the system, the more jobs are not executed with the requested QoS. For low submission rates (1 and 2 jobs per minute), it is not essential to make separate estimations for execution and transfer times, since there are enough free slots for allocating most of the jobs, although better results are again obtained for the techniques which take into account that the resources present different performance over time.
However, for higher submission rates (3 jobs per minute onwards) the prediction algorithm becomes very important. As Figure 4.7 (b) shows, a noticeable reduction in the number of lost jobs is achieved when making separate predictions for transfers and job execution (ETTS, RT and ExS), and it is even more pronounced when the varying resource performance over time is taken into account (RT and ExS).
From those plots it may be deduced that ExS presents the best results from the users' point of view. In spite of not accepting as many jobs as ETTS or RT, ExS performs more accurate estimations, so it is less likely that accepted jobs fail to fulfill the agreed QoS.

Figure 4.8. QoS not fulfilled per accepted job (ETTS, RT and ExS), versus the number of submitted jobs per minute.

For low loads, there are no large differences among them, since few jobs fail after being accepted owing to the fact that the system is not very overloaded. However, when the system load is higher, giving some extra time for the execution of the jobs (number of slots reserved) depending on the predicted status of resources, as ExS actually does, leads to fewer jobs failing their QoS requirements. Then, in spite of accepting a slightly lower number of jobs, the predictions are more accurate and it is less likely that jobs need more time to finish their executions. Hence, it is a more conservative technique, but it also presents a more reliable behavior regarding the QoS provided per accepted job, as Figure 4.8 depicts. That figure highlights the accuracy of the predictions made by ExS, which presents a more uniform behavior regarding the percentage of accepted jobs that finally do not fulfill the agreed QoS, regardless of the number of jobs submitted per minute. In fact, this percentage is quite small compared with the other two techniques (a 60 % reduction on average with respect to RT, and 80 % with respect to ETTS).
It must also be noted that ExS usually presents greater variability than the rest of the techniques regarding accepted jobs and QoS not fulfilled. This is because it takes heed of changing resource behavior and consequently, depending on the environment status, it adapts its predictions and accepts a different number of jobs.
From the point of view of the system, Figures 4.9 (a) and (b) depict the mean
overlap and waste times when calculating the job completion time estimations,
respectively.
Figure 4.9. Comparison of the different estimation techniques from the system's viewpoint: (a) mean overlap time and (b) mean waste time per job (in seconds), versus the number of submitted jobs per minute (Castillo, TCT, ETTS, RT, ExS).
Figure 4.9 (a) shows a greater overlap when using ETTS, as there is a greater number of running jobs. However, when using RT or ExS, the number of accepted jobs is close to that of ETTS, but they present much less overlap. What is more, ExS presents the most reliable and uniform behavior, since the generated overlap grows as the load does. As a result of having lower overlap time, there will be fewer jobs that do not meet the QoS requirements. This actually explains the results depicted in Figure 4.8.
On the other hand, the reduction in the ETTS overlap from 4 jobs per minute onwards is due to the fact that the system cannot accept more jobs since it is saturated. Thus, the generated fragmentation may help the system not to increase the overlap. If there were a job in those free gaps, and another job's execution were longer than expected, the total overlap would increase, since the overlap for the next jobs would consequently be bigger. It must also be noted that, for very high loads in the system (5 and 6 jobs per minute), the Castillo and TCT techniques produce less overlap, but they have accepted far fewer jobs than the other three techniques.
Figure 4.9 (b) highlights that, even with a higher number of running jobs, the waste is lower when using ETTS, RT or ExS than when using TCT or Castillo. That is, estimating execution and network times separately clearly has a good influence on the performance of the system. The basis for this is that the estimations are more accurate. This also explains the results shown in Figure 4.7 (a): having lower waste time, more jobs can be accepted, since each accepted job requires fewer reserved slots. Resource utilization is also better, with less wasted time in between job executions, so resources are idle for less time. Note that ETTS gets the best results regarding waste time. However, its overlap is the highest of the five techniques. This fact can make jobs miss their QoS requirements, as Figure 4.8 shows.
ExS also presents a logical tendency regarding waste time, namely having more waste time as there are more jobs in the system. It presents a slightly bigger waste than the ETTS and RT techniques, because ExS assigns some extra slots whenever it predicts that the resource could be busier than usual at the moment when the job is executed, or that the network will be overloaded when the files are transferred. Hence, more time may be wasted, but this results in a noticeable reduction in the overlap time, which leads to a remarkable improvement in the QoS provided, as shown in Figures 4.7 and 4.8.
4.6 Summary
Several research works aim at providing QoS in Grids by means of advance reservations. However, making reservations of resources is not always possible, for several reasons. Hence, we propose scheduling in advance (the first step of the reservation-in-advance process) as a possible solution to provide QoS to Grid users.
This type of scheduling requires estimating whether or not a given application can be executed before the deadline specified by the user. This entails tackling many challenges, such as developing efficient scheduling algorithms that scale well, or studying how to predict job completion times on the different resources at different times. For this reason, making predictions about the status of Grid resources is essential. It must be noted that the network must be considered as another Grid resource, as highlighted by previous studies [46] [112].
This chapter proposes an autonomic framework to perform meta-scheduling
in advance, which self-tunes the predictions made on the job execution time,
to improve the QoS offered to the users. This new system is concerned with
the dynamic behavior of the Grid resources, their usage, and the characteristics
of the jobs. Furthermore, this system takes into account the accuracy in the
recent predictions for each resource in order to calculate a resource trust. By
using this information the system retunes its predictions to better fit the usage of
resources in the future. Along with this, the variable availability of resources is
also tackled by means of rescheduling failed jobs, which improves the autonomic
behavior of the framework.
Apart from presenting the general framework, a comparison between several strategies that perform estimations of the completion time of jobs is included. These strategies are based on estimations of the Total Completion Time (TCT), Execution and Transfer Times Separately (ETTS), Resource Trust (RT), and the future status of resources through Exponential Smoothing functions (ExS). These techniques are compared with an implementation of the original scheduling-in-advance algorithm proposed by Castillo et al. [3]. This comparison highlights the importance of making network estimations independently, and of taking into account the varying resource performance, as this improves the resource usage, thus allowing more jobs to be scheduled.
CHAPTER 5
Optimizing Resource Utilization through Rescheduling Techniques
In highly heterogeneous and distributed systems like Grids, it is rather difficult to provide Quality of Service (QoS) to the users. As reservations of resources may not always be possible, another possible way of enhancing the perceived QoS is performing meta-scheduling of jobs in advance, where jobs are scheduled some time before they are actually executed. Thanks to this, it is more likely that the appropriate resources are available to execute the job when needed.
However, when using this type of scheduling, fragmentation appears and may become a cause of poor resource utilization. Because of that, techniques are needed to perform rescheduling of tasks so as to reduce the existing fragmentation. To this end, two techniques have been developed to tackle fragmentation problems, which consist of rescheduling already-scheduled tasks. In such a scenario, knowing the status of the system is a must. However, measuring and quantifying the existing fragmentation in a Grid system is a challenging task. Thus, different metrics aiming at measuring that fragmentation, not only at the resource level but also taking into account all the resources of the Grid environment as a whole, are presented.
5.1 Introduction
The way of scheduling proposed in the previous chapter (Chapter 4) may produce resource fragmentation as a result of the allocation process [69]. Fragmentation is a well-known effect in resource allocation, which decreases resource utilization [56]. This means that a job execution request may be rejected even if the overall remaining capacity of the resources is sufficient to handle it. Thus, it is easy to identify fragmentation as an individual reason for the rejection of a single allocation.
In our case, as jobs may have deadline as well as start time constraints, it is possible to have great fragmentation in the system even if the load is not high. This means that a job execution request could be rejected due to fragmentation. Therefore, the main aim of this chapter is to present different techniques that measure the generated fragmentation and react depending on the values obtained. However, quantifying the fragmentation in a system requiring continuous allocations, such as time schedulers or memory, is really complicated.
To tackle these problems, a job rescheduling module has been developed on top of the SA-Layer framework for meta-scheduling in advance presented in [60], which alleviates the fragmentation problem. This module performs two types of rescheduling: a reactive and a preventive technique. When a job fails its allocation, the reactive technique reallocates an already-scheduled job (keeping its QoS agreements) and uses its released slots to allocate the incoming job. With this aim, heuristics have been implemented to decide which job has a higher probability of being reallocated and which job has been assigned time slots which could be useful for allocating the new incoming job. The preventive technique [139] [140] performs rescheduling of tasks from time to time, by sorting the jobs already scheduled in a certain time interval by their start times (instead of by their arrival times), in the same way as a Bag of Tasks (BoT). Therefore, the allocation process has more information about the jobs to be scheduled, and free slots are put together. By rescheduling those tasks in this new order, the resource fragmentation is reduced, improving both the scheduled job rate and the resource usage. Hence, by using these two rescheduling techniques, the fragmentation problem is alleviated, resource utilization is improved, and the QoS perceived by users is also increased, as more of their jobs can be executed.
Whilst for the reactive technique it is quite easy to know when the replanning must be done (every time a job fails its allocation), for the preventive rescheduling technique this is not so easy. Owing to this fact, metrics to measure the existing fragmentation are implemented with the objective of knowing (1) whether the replanning process needs to be performed, (2) which resources must be involved in it, and (3) which time intervals and jobs must be replanned.
The structure of the chapter is as follows. Section 5.2 describes problems often found in resource allocation processes, one of them being fragmentation. Section 5.3 details the main contribution of this chapter, which is the implementation of a job rescheduling module with the preventive and reactive rescheduling techniques described above, in order to tackle fragmentation problems in Grids. In Section 5.4, several ways of measuring the existing fragmentation in a Grid system are detailed. A performance evaluation of the approaches is presented in Section 5.5. Finally, Section 5.6 draws the summary of the chapter.
5.2 Scheduling Problems
A well-known effect in every job allocation process is fragmentation, which
decreases resource utilization, as studied in [56]. Thus, when a job allocation
fails even though enough free capacity is available, fragmentation is easily
spotted as a cause. Among the reasons for rejecting job allocation requests in
Grid environments, the following can be found [69]:
• High utilization: If there is no vacancy on the resources, the meta-scheduler
will reject the request. Resource owners are interested in high utilization
since it maximizes their revenue.
• Fragmentation: If the free parts of the resources are scattered in space and
time, rejections due to fragmentation will appear. However, if optimization
could compact the reservations to form a large free block, then new incoming
jobs might be admitted. Hence, fragmentation can be analyzed as a way to
describe the status of the Grid, as detailed in Section 5.4.
• Unfavorable previous decisions: Even if utilization and fragmentation are
both very low, a job may still be rejected. Such a rejection comes neither
from fragmentation nor from utilization. It is possible that a previously
scheduled job uses the only blocks which fit the new incoming job, so this
new job has to be rejected. The problem stems from the inability to foresee
the future: usually, the meta-scheduler considers only the requested job
when making decisions.
The last two issues become even worse when jobs have both start time and
deadline restrictions. As far as fragmentation is concerned, if there are jobs
with both time restrictions, the system may be forced to allocate a job into
time slots which are not near any other allocated job. Thus, fragmentation
appears very early, and it may become a problem after just a few allocations.
Apart from that, it must be noted that even if there is no fragmentation, jobs
may be rejected due to unfavorable previous decisions. The main reason is that
the meta-scheduler does not know what is going to happen in the future
regarding job allocation requests. Because of that, it has to make its
scheduling decisions based only on its current knowledge about the system
usage. For instance, the system allocates a job which does not have any strong
requirements into some particular free slots. Later, another job request may
arrive which could only be allocated into the slots that were assigned to the
previous, non-urgent job. Thus, the system has to reject this last job even
though there was a chance of allocating both of them, simply by allocating the
two jobs in the opposite order. In the next section, two techniques devised for
addressing fragmentation will be presented.
5.3 Tackling fragmentation
Examining how fragmentation can be measured in the particular domain of Grid
resources is essential. The most obvious approach would be to reuse ideas on
how to measure fragmentation in other domains, e.g., in file systems and main
memory. For instance, in [71] the characteristics of dynamic memory allocators
were studied. However, the domain of memory management does not map onto the
domain of Grid resources, since main memory can be considered homogeneous
whilst this is not the case for Grid resources.
On the other hand, in [69] a new way to measure the fragmentation of a system,
as well as its correlation with job rejections, is presented. It shows that the
proposed fragmentation measure is a good indicator of the state of the system.
However, the authors measure fragmentation resource by resource, not over all
resources as a whole. Thus, further research is needed to address fragmentation
issues in the Grid meta-scheduling domain, as such information could help to
compare the effects of scheduling decisions.
The problem of tackling fragmentation can be formulated as follows:
• A set of jobs j_1, j_2, ..., j_n have to be executed and are allocated for their
future execution on a set of m unrelated resources denoted by r_1, r_2, ..., r_m.
• Every job will start its execution after its start time, t_s, and should finish
before its deadline, d.
• Each job consists of a single non-preemptable task that has to be executed
on one of the resources.
• The execution time of a job depends on the chosen resource and on the
time interval in which it is executed.
The objective is to allocate the n jobs to the m resources minimizing the
fragmentation, so that the next job, n + 1, can be successfully allocated into
the m resources. That is, the main objective function is to minimize the sum
∑_{i∈[1..m]} |G_i|, where G_i is the set of available gaps in resource r_i.
This problem is NP-hard [24], so this strategy has to take into account some
key aspects in order to be efficient for the Grid system and computationally
cheap. First, it has to be decided when the rescheduling technique must be
applied. For instance, it can be run periodically or be triggered depending on
the status of the system. Next, the algorithm to avoid fragmentation is
performed. At this point, it must be highlighted that only the jobs that have
already been scheduled in advance but have not yet started their execution are
considered. Hence, there is no cost associated with rescheduling those jobs in
terms of runtimes or data transfers. Moreover, they must still fulfill their
QoS agreements after the reallocation process. An aggressive technique can be
used to reallocate all n + 1 jobs; that is, the scheduling algorithm would
consider all the jobs as if they had been requested at the same time. On the
other hand, a more conservative scheme can be used in which the minimum number
of changes is made in order to allocate the n + 1 jobs.
Figure 5.1. The Scheduler in Advance Layer (SA-Layer).
Based on those two options, two rescheduling techniques, named Replanning
Capacity (RC) and Bag of Task Rescheduling (BoT-R) [140], have been developed
in this dissertation. These techniques have been implemented as the modules
shown in Figure 5.1, and are explained in the next sections.
In the literature there are examples of moving jobs from one resource to
another to try to avoid fragmentation and to improve resource usage. One
related work dealing with the reallocation of jobs is [72], which highlights
the importance of having accurate information available when provisioning
resources in multiple domains; it uses backfilling to perform that
provisioning. On the other hand, [74] presents an algorithm to perform resource
selection based on performance predictions, and also provides an algorithm for
moving already made reservations through co-allocation of jobs. However, no
mechanism is provided to do that among different users, as we actually do.
Finally, [75] analyzes task reallocation in Grids, presenting different
reallocation algorithms and studying their behavior in the context of a
multi-cluster Grid environment. However, unlike this Thesis, that work is
centered on a dedicated Grid environment and is evaluated through simulation.
5.3.1 Reactive techniques: Replanning Capacity (RC)
The Replanning Capacity (RC) technique is categorized as a reactive technique,
since it is only applied when a job allocation fails. Its functionality is
implemented in the module called Replanning Capacity (shown in Figure 5.1),
and it is triggered by the Gap Management module.
Essentially, this technique works as follows. Every time a job request is
rejected, this technique is triggered. When this happens, it first selects the
already scheduled jobs which could be suitable for rescheduling without
affecting their QoS requirements – that is, without affecting their expected
completion times – and whose reserved slots may be suitable for the new
incoming job. Once this set of jobs is decided, the system tries to make room
for the new incoming job (the one which would otherwise be rejected) by
rescheduling one of the suitable jobs chosen before. If this can be done
without affecting the expected completion time of the already scheduled job,
then the new incoming job is allocated into the resource and time slots that
were just released.
The way the RC technique works is detailed in Algorithm 11 and explained next.
When a job request is rejected, the RC technique is triggered (line 8). First,
the set of target scheduled jobs must be selected (line 9). To do this, the
system filters the jobs in two steps. First, jobs which do not have any
reserved slot in the time interval between the start time and the deadline of
the new incoming job (the one whose allocation failed) are filtered out.
Second, jobs less likely to be successfully rescheduled are also filtered out.
To this end, the system calculates the laxity of each target job [141], using
Equation 5.1.

Laxity = (SchedulingWindow − ExecutionTime) / ExecutionTime        (5.1)
In Equation 5.1, SchedulingWindow is the time interval in which the job has to
be executed (deadline − startTime), and ExecutionTime is the time the system
estimated that the job would need to complete its execution on the resource and
in the time interval where it was allocated. The laxity represents how likely a
successful replanning of a job is, and this value is used for filtering out the
target jobs. Therefore, jobs whose associated laxity is lower than a specific
value will not be taken into account as candidates for reallocation, since
their scheduling window is too tight. In the case of best–effort requests
without time requirements, the deadline to finish the job is assumed to be
infinite, so their laxity will also be infinite. Hence, these best–effort jobs
are prone to be rescheduled.

Algorithm 11 Replanning Capacity Algorithm
 1: Let j = the new incoming job
 2: Let R = set of known resources {r_i / i in [1..n]}
 3: Let laxity_threshold = the threshold used to filter jobs
 4: Let SelectJ = set of already scheduled jobs {J_1, J_2, ..., J_m} which have
    reserved slots between the start time and the deadline of j, and whose
    laxity is above the threshold
 5: Let SortJobsByLaxity(J) = the function which sorts the jobs of list J by
    their probability of being reallocated
 6: Let GapR = set of available gaps in R
 7: Let TS_{Jl,ri} = time slots reserved for job J_l in resource r_i
 8: if j allocation fails then
 9:   SelectJ = filterJobs(laxity_threshold)
10:   SelectJ = SortJobsByLaxity(SelectJ)
11:   for each SelectJ_l ∈ SelectJ do
12:     if TS_{SelectJl,ri} is feasible for j then
13:       for each GapR_k ∈ GapR do
14:         if GapR_k is feasible for SelectJ_l then
15:           Allocate SelectJ_l at GapR_k
16:           Allocate j at TS_{SelectJl,ri}
17:           Exit
18:         end if
19:       end for
20:     end if
21:   end for
22: end if
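Equation 5.1 and the laxity-based filter can be sketched as follows (a minimal sketch; the function names and job dictionary keys are ours, and the real filter also checks the reserved slots):

```python
# Sketch of the laxity filter of Equation 5.1. Names (laxity,
# filter_candidates, the job dict keys) are ours, not the SA-Layer's.
import math

def laxity(start_time, deadline, execution_time):
    """Laxity = (SchedulingWindow - ExecutionTime) / ExecutionTime."""
    window = deadline - start_time  # best-effort jobs: deadline = inf
    return (window - execution_time) / execution_time

def filter_candidates(jobs, laxity_threshold):
    """Keep only the jobs loose enough to be worth reallocating."""
    return [j for j in jobs
            if laxity(j["start"], j["deadline"], j["exec"]) >= laxity_threshold]

jobs = [
    {"id": 1, "start": 0, "deadline": 100, "exec": 90},       # tight window
    {"id": 2, "start": 0, "deadline": 100, "exec": 20},       # loose window
    {"id": 3, "start": 0, "deadline": math.inf, "exec": 50},  # best-effort
]
print([j["id"] for j in filter_candidates(jobs, 0.5)])  # -> [2, 3]
```

Note how the best-effort job (infinite deadline, hence infinite laxity) always survives the filter, as stated above.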
Finally, the resulting list of jobs which have slots that could be suitable for
allocating the new incoming job is sorted by their probability of reallocation,
i.e., by decreasing laxity (line 10). With the aim of keeping the time needed
to allocate the incoming job as short as possible, the resulting list is
truncated to the first x jobs. If the system fails when trying to reallocate
the first x target jobs, the process is canceled and the incoming job is not
accepted. The value of x is set so as not to increase the time needed to
perform the rescheduling action excessively; to this end, a maximum of 10
checked jobs is established in this Thesis.
Once the list of target jobs is obtained and sorted (SelectJ in Algorithm 11),
the system acts as follows for each job SelectJ_l in SelectJ:
• Estimate the number of slots that the new incoming job (j in Algorithm 11)
would need for its execution on the resource. With this information, it is
decided whether the slots reserved for SelectJ_l (TS_{SelectJl,ri}) are enough
for allocating j (line 12). If not, the slots of the next job in the list
(SelectJ_{l+1}) are checked. Otherwise, the next step is performed.
• If the slots of SelectJ_l (TS_{SelectJl,ri}) are enough for allocating j, the
system tries to allocate the already scheduled job (SelectJ_l) in another
resource and/or time interval (line 14). If this fails, the algorithm continues
to the next loop iteration and the next job in the list (SelectJ_{l+1}) is
checked.
Ultimately, when the previous conditions are fulfilled for a job in the list,
the system reallocates the already scheduled job (SelectJ_l) to its new
resource and/or time interval (line 15). Its previously reserved slots are used
for allocating the new incoming job (line 16). In this way, both jobs are
allocated, leading to a better utilization of the resources. Consequently, the
load on the computing resources increases, since more jobs can be executed. On
the other hand, as the rescheduled job is reallocated taking into account the
jobs scheduled after its first allocation, the system has more information when
making the new scheduling decision, and fragmentation is therefore decreased.
It must be noted that the job to be reallocated is rescheduled in the same way
as it was scheduled the first time – following the same algorithm.
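The overall RC flow of Algorithm 11 can be sketched over a simplified data model (a hedged sketch: `slots`, `placement`, and `find_gap` are our own names; `find_gap` stands in for the Gap Management lookup of lines 13–14):

```python
# Hedged sketch of the RC flow in Algorithm 11. `slots` is a job's number
# of reserved slots, `placement` an opaque label for its reservation, and
# `find_gap` stands in for the Gap Management lookup.

def replanning_capacity(new_job, candidates, find_gap, max_tries=10):
    """Try to free the reserved slots of one scheduled job so that
    `new_job` fits. `candidates` must already be filtered by laxity and
    sorted by decreasing laxity (lines 9-10 of Algorithm 11)."""
    for old in candidates[:max_tries]:           # truncated list, as in the text
        if old["slots"] >= new_job["slots"]:     # line 12: enough slots?
            gap = find_gap(old)                  # lines 13-14: alternative spot?
            if gap is not None:
                released = old["placement"]      # slots the old job gives up
                old["placement"] = gap           # line 15: move the old job
                new_job["placement"] = released  # line 16: reuse its slots
                return True
    return False                                 # the allocation finally fails

cands = [{"id": 1, "slots": 1, "placement": "r1[0-1]"},
         {"id": 2, "slots": 4, "placement": "r2[5-9]"}]
new = {"id": 9, "slots": 3}
ok = replanning_capacity(new, cands,
                         lambda j: "r3[0-4]" if j["id"] == 2 else None)
print(ok, new["placement"])  # -> True r2[5-9]
```

The first candidate is skipped (too few slots); the second is moved to the gap found for it, and the incoming job takes over its released reservation.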
5.3.2 Preventive techniques: Bag of Task Rescheduling (BoT-R)
Apart from the reactive technique, we have also developed a preventive
technique named Bag of Task Rescheduling (BoT-R). The BoT-R technique is
applied regularly in order to avoid job allocation failures, or at least to
reduce them to a minimum. However, this process is carried out only at the
intra-domain level, with the aim of not generating delays in network transfers
and of making it more scalable. In this way, there is no substantial increase
in network usage.
BoT-R is carried out at intervals, meaning that only the jobs which will be
executed in the studied interval take part in the replanning process. Jobs
whose executions start before the beginning of the interval are also excluded
from the replanning process. No matter what interval is studied, jobs currently
being executed will not take part in the replanning process; thus, jobs are
never preempted.
Furthermore, it must be noted that the BoT-R technique could also be used
together with the previously explained RC technique, with the aim of avoiding
job allocation failures between two contiguous executions of the BoT-R
algorithm. With all these assumptions, the BoT-R algorithm can be separated
into two steps:
1. Trigger phase: Estimate whether there is any need to perform rescheduling
of tasks. In that case, the time intervals which will be involved in the
rescheduling process are selected for the next step (this is presented in
Algorithm 12). It is explained in the next subsection.
2. Filtering phase: Perform the BoT rescheduling process for the selected time
intervals, but only over the resources and tasks that need to be involved
in the process (this is presented in Algorithm 13). It is explained in detail
later.
Although the rescheduling is considered every period of time, there is no need
to perform it in every period, nor to involve all the resources. Therefore, in
order to perform the BoT rescheduling just when needed and only over the
resources that really need it, resource fragmentation must be measured, since
it can be used as a forecast of how likely future allocations are to fail [69].

To this end, different metrics have been implemented to make good estimations
of the existing fragmentation and to take this information into account when
performing the two phases mentioned above. Those metrics are explained in
Section 5.4.
Trigger Phase
Regarding the triggering phase, Algorithm 12 has been implemented. It checks
several state variables to verify whether rescheduling is needed. Moreover, the
algorithm has to calculate in which periods of time the system needs to apply
Algorithm 13. Thanks to this, the rescheduling is performed only over a subset
of the resources, time periods and jobs; performing the rescheduling over all
the resources, time periods and jobs would harm scalability.

Algorithm 12 BoT-R Trigger executed every L period
 1: Let StfJ = the earliest start time of the scheduled jobs
 2: Let EtlJ = the latest end time of the scheduled jobs
 3: Let P = the period [StfJ, EtlJ]
 4: Let P_i = the ith subinterval of [StfJ, EtlJ], in strips of strip slots
 5: Let Ocu_{Pi} = the percentage of resource occupation in period P_i
 6: Let ru_high = the maximum resource usage threshold
 7: Let ru_low = the minimum resource usage threshold
 8: Let Frag(P_i) = the percentage of fragmentation in interval P_i
 9: Let FragThreshold = the minimum percentage of fragmentation needed to
    trigger rescheduling of tasks
10: Let #Gaps_{Pi} = the number of gaps in period P_i
11: Let BoT-R_Algorithm(P_i) = the BoT-R function over period P_i
12: if (#Jobs > #Resources) then
13:   for each P_i ∈ P do
14:     if (#Jobs > 2 * #Resources) and (Ocu_{Pi} in (ru_low, ru_high)) and
        (#Gaps_{Pi} > #Resources) and (Frag(P_i) > FragThreshold) then
15:       BoT-R_Algorithm(P_i)
16:     end if
17:   end for
18: end if
When explaining Algorithm 12, it must be noted that, in order to reduce the
computational cost, some conditions are checked first. It is checked whether
there are more jobs than resources (line 12). If not, there is no need for
replanning, since the system load is insignificant. Otherwise, extra conditions
need to be evaluated. In that case, the period of time between the first and
the last scheduled job is split into intervals of strip slots. Subsequently,
these subintervals are checked separately, so that replanning is only applied
to the time periods which present fragmentation. For each subinterval, the
information related to the load of the system and to the status of the Gap
Management subsystem has to be checked.

Regarding the system load, the heuristic applied consists of checking whether
the number of jobs is more than twice the number of resources for the selected
time interval (line 14). If not, there is no need for replanning, as there are
not enough jobs to cause fragmentation problems. It is also checked whether
resource usage is between two thresholds (ru_low and ru_high), because unless
this happens,
there is no need for replanning. The rationale is that if there are many free
time slots (no need to reschedule jobs to allow more allocations), or if there
are too few free time slots (rescheduling would not create suitable gaps to
allow new allocations), rescheduling the jobs would not show any improvement.
With this objective, those thresholds are set to 40 % and 95 %, respectively,
in this Thesis.
With regard to the Gap Management status, and provided that all the aforesaid
conditions are fulfilled, information related to the fragmentation is
calculated and evaluated, such as the number of gaps (#Gaps_{Pi} > #Resources
in line 14). To this end, different metrics measuring the fragmentation
generated by the scheduling decisions are used (represented by Frag(P_i) >
FragThreshold in line 14), which are explained in Section 5.4.
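The trigger test of line 14 in Algorithm 12 can be sketched as follows (the 40 % and 95 % usage thresholds are the ones reported in this chapter; the flat-argument interface is our own simplification):

```python
# Hedged sketch of the trigger test (line 14 of Algorithm 12). The 40 %
# and 95 % usage thresholds are the ones reported in this chapter; the
# flat-argument interface is our own simplification.

RU_LOW, RU_HIGH = 0.40, 0.95
FRAG_THRESHOLD = 10  # average-fragmentation threshold from Section 5.4

def needs_replanning(n_jobs, n_resources, occupancy, n_gaps, frag):
    """True when the inspected period should be handed to BoT-R."""
    return (n_jobs > 2 * n_resources            # enough jobs to matter
            and RU_LOW < occupancy < RU_HIGH    # neither empty nor full
            and n_gaps > n_resources            # gaps actually scattered
            and frag > FRAG_THRESHOLD)          # measurable fragmentation

print(needs_replanning(50, 10, 0.70, 25, 18))  # -> True
print(needs_replanning(50, 10, 0.98, 25, 18))  # -> False (too loaded)
```

All four conditions must hold at once; failing any one of them skips the costly rescheduling for that subinterval.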
Filter Phase
Once the system has estimated which subintervals need task replanning to reduce
fragmentation, the rescheduling algorithm (Algorithm 13) is performed for each
of them. The rescheduling problem involves a set of n jobs that have to be
processed on a set of m unrelated machines. Each job has to be executed on one
of the machines, taking into consideration that the processing time of a job
depends on the machine where the processing is performed. In addition, a
release time is given for each job, meaning the time at which the job becomes
available for processing. Moreover, each job has its own deadline. Since this
problem is already strongly NP-hard on a single machine, the problem for
multiple unrelated machines is strongly NP-hard too [24]. Even if the jobs are
already distributed to the machines, there is no pseudo-polynomial algorithm
for optimally sequencing the jobs assigned to a machine. For this reason, the
simple way of reallocating the jobs presented in Algorithm 13 has been
implemented. This algorithm takes as input the interval in which to perform
replanning – defined by the start time and end time of the period. Then, the
following process is applied, following Algorithm 13:
Algorithm 13 BoT-R Algorithm
 1: Input: start time and end time of the interval to replan (P_i)
 2: Let R = set of known resources {r_i / i in [1..n]}
 3: Let J = set of already scheduled jobs {J_1, J_2, ..., J_m}
 4: Let P_i = the period to replan
 5: Let ResourcesFilter() = function which obtains the resources to defragment,
    considering their workload
 6: Let JobFilter(R) = function which obtains the jobs scheduled on resources R
 7: Let JobIntervalFilter(P_i, J) = function which obtains the jobs of J whose
    full execution is within period P_i
 8: Let SortJobsByStartTime(J) = function which returns the jobs of J sorted by
    their start times
 9: R′ = ResourcesFilter()
10: J′ = JobFilter(R′)
11: JtoDefrag = JobIntervalFilter(P_i, J′)
12: Jsorted = SortJobsByStartTime(JtoDefrag)
13: for each r_i ∈ R′ do
14:   for each j ∈ Jsorted do
15:     if j can be allocated in r_i then
16:       Schedule j to r_i
17:       Jsorted = Jsorted − j
18:     end if
19:   end for
20: end for

1. The resources involved in the replanning process must be decided (line 9).
To this end, a filtering process is applied over the available resources so
that the resources which present a very high load are not taken into account,
as they do not have usable fragmentation. Only the resources that present
fragmentation in their allocations are studied.
2. The jobs scheduled on the resources obtained in step 1 are selected (line 10).
3. The jobs obtained in step 2 are filtered so that only those whose full
execution is within the defined period are taken into account (line 11).
4. Once the list of jobs to replan is obtained, it is sorted by job start time
(line 12).
5. Finally, for each resource selected to be defragmented (line 13), the sorted
list of jobs is scanned in order (line 14), with the aim of allocating as many
jobs as possible in each resource (line 16), reducing the number of free slots
between contiguous allocations. When it is not possible to allocate any more
jobs, the next resource is used for allocating the jobs which are not yet
allocated, and so on.
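The five steps above can be sketched under a deliberately simplified capacity model (all names are ours, not the SA-Layer's; real allocations involve time slots rather than unit capacities):

```python
# Sketch of the BoT-R filter phase (Algorithm 13) under a deliberately
# simplified model: each resource holds `capacity` unit-length jobs.
# All names are ours, not the SA-Layer's actual identifiers.

def bot_rescheduling(period, resources, jobs):
    # Steps 1-3: keep fragmented resources and the jobs scheduled on them
    # whose full execution lies within the period.
    rs = [r for r in resources if r["fragmented"]]
    ids = {r["id"] for r in rs}
    js = [j for j in jobs
          if j["resource"] in ids
          and period[0] <= j["start"] and j["end"] <= period[1]]
    # Step 4: sort by start time, as in a Bag of Tasks.
    js.sort(key=lambda j: j["start"])
    # Step 5: greedily repack the jobs resource by resource.
    plan = {}
    for r in rs:
        free = r["capacity"]
        for j in list(js):
            if free > 0:
                plan[j["id"]] = r["id"]
                js.remove(j)
                free -= 1
    return plan if not js else None  # cancel if some job does not fit

resources = [{"id": "r1", "capacity": 2, "fragmented": True},
             {"id": "r2", "capacity": 1, "fragmented": False}]
jobs = [{"id": "j1", "resource": "r1", "start": 3, "end": 4},
        {"id": "j2", "resource": "r1", "start": 1, "end": 2}]
print(bot_rescheduling((0, 5), resources, jobs))
```

Note that j2 is now placed before j1 because the jobs are ordered by start time rather than arrival order, and that returning None mirrors the cancellation behavior described later in this section.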
It must be noted that the network transfer times are recalculated with the new
information, which is more up-to-date than when they were previously estimated
– thus, the transfer estimations are more accurate. These times are overlapped
with the execution times of the previous job (in the case of the prolog phase)
and with the next job (in the case of the epilog phase). This reduces the
resource usage fragmentation to a minimum.
As it is possible that this rescheduling technique does not find a suitable
solution for reallocating all the jobs, in that case the whole process is
canceled and the jobs keep their initially scheduled slots.
In addition, this rescheduling process could be performed on another machine,
so that the computational cost is reduced – it could even be executed as
another task submitted to the Grid.
However, measuring the status of the system (the existing fragmentation) is
required to perform this technique. To this end, different metrics aimed at
this issue are detailed in the next section.
5.4 Fragmentation metrics
As stated before, a key point of the rescheduling algorithms is knowing when
there is a need to perform rescheduling of tasks and over which resources it
has to be applied. The most suitable information to trigger the rescheduling
process is a metric capable of measuring the existing fragmentation. However,
how to measure fragmentation in a real Grid system is a challenging task which
still needs to be studied. There are studies in other domains, such as memory,
but they do not map onto the Grid domain, since Grids have more constraints
which must be taken into consideration. For this reason, different ways of
measuring the status of the Grid resources are presented and evaluated in this
chapter.
To overcome the fragmentation problems in the scheduling process, a technique
based on rescheduling already scheduled tasks as a Bag of Tasks has been
developed [140]. Deciding when and over which resources it is going to be
applied is carried out in two phases, named the trigger phase and the filtering
phase. To improve the functionality of those steps, different metrics have been
implemented that measure the fragmentation present in every single resource of
the system, also taking all of them into account as a whole. Those metrics try
to make good estimations of the existing fragmentation. In this way, the
performance of the BoT-R technique will be improved thanks to a better
knowledge of the status of the Grid system.
It must be noted that, for all the metrics detailed next, the first condition
checked is the average occupancy of all the resources within the target time
interval. This occupancy has to be between two thresholds, meaning that there
is enough occupancy to take advantage of performing a rescheduling process, but
it is not so high that the improvement after doing so would be negligible.
5.4.1 Trigger Phase
With the aim of properly estimating whether a subinterval presents high
fragmentation (the Frag(P_i) value) and consequently needs to be replanned
(Frag(P_i) higher than FragThreshold), three different metrics have been
implemented. They are the following:
• Gap: One possible way of estimating how properly the resources are being
used is by counting the number of free time intervals, named gaps, and their
average size (Equation 5.2). Whenever the number of gaps is higher than a
specific threshold and their average size is lower than another threshold
(when gap sizes are too large, this is not considered real fragmentation but
free intervals that may allocate new jobs), the rescheduling process is
started – the execution of Algorithm 13 over the selected interval.
AverageGapsSize = (∑_{j=1}^{n} g_j) / n        (5.2)
• Fragmentation: Another way of weighting the proper use of resources through
the free time intervals generated in the allocation process is the metric
proposed in [69]. In that work, a metric is presented to measure the
fragmentation generated in the scheduling process, using the following
equation:
Frag(r_i) = 1 − (∑_{j=1}^{n} g_j^p) / (∑_{j=1}^{n} g_j)^p        (5.3)
where g_j is the jth gap size in resource r_i, and p is a parameter that makes
the equation resistant to small negligible fragments as long as one large
fragment exists; it is a value that boosts the influence (exponentially) of the
large gaps over the small ones. Nonetheless, in [69] the authors measure
fragmentation resource by resource, not over all resources as a whole. For this
reason, we use the average of the fragmentation measured in each resource as
the fragmentation of the whole Grid system. Then, if the estimated total
fragmentation is greater than FragThreshold (line 14), the algorithm to
reschedule the tasks (Algorithm 13) will be executed for the selected interval
(line 15).
• Max_Fragmentation: Finally, the fragmentation of the whole system is measured
as the highest fragmentation found in any resource (measured following
Equation 5.3), instead of the average. In the same way as in the previous
point, when the estimated fragmentation is greater than the established
threshold (FragThreshold in Algorithm 12), the algorithm to reschedule the
tasks (Algorithm 13) will be executed for the selected interval.
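The three metrics can be sketched as follows (Equations 5.2 and 5.3; the boosting exponent p = 2 and the function names are illustrative choices of ours):

```python
# Sketch of the three trigger metrics. frag() follows Equation 5.3; the
# boosting exponent p = 2 is an illustrative choice, and the function
# names are ours.

def average_gap_size(gaps):
    """Equation 5.2: mean size of the free intervals."""
    return sum(gaps) / len(gaps)

def frag(gaps, p=2):
    """Equation 5.3: Frag = 1 - (sum g_j^p) / (sum g_j)^p."""
    return 1 - sum(g ** p for g in gaps) / sum(gaps) ** p

def max_fragmentation(per_resource_gaps, p=2):
    """Highest per-resource fragmentation instead of the average."""
    return max(frag(g, p) for g in per_resource_gaps)

one_big   = [10]             # one large free block: no real fragmentation
scattered = [2, 2, 2, 2, 2]  # same free capacity, split into small gaps
print(frag(one_big))    # -> 0.0
print(frag(scattered))  # -> 0.8
```

The example shows the intended behavior of Equation 5.3: the same total free capacity yields zero fragmentation when concentrated in one large gap, and a high value when scattered into many small ones.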
All the thresholds established for the metrics explained above aim to correctly
determine whether there is fragmentation in the system, or just low or very
high usage. Moreover, they try to trigger the rescheduling process only when
the chances of successfully completing the process are reasonably high. To this
end, after testing their behavior with different thresholds, we chose the ones
that provided the best behavior most of the time. In this case, for the Gap
metric, when the average gap size is greater than 15 slots, it is observed that
this does not usually indicate fragmentation, but free slots due to low usage.
Also, the number of gaps needs to be greater than the number of resources. As
far as the Fragmentation metric is concerned, it is observed that, following
Equation 5.3, real fragmentation (which poses a problem for the scheduling
process) only appears when the average value is above 10. For values lower than
that, the rescheduling process would be triggered without a real need for it.
Regarding Max_Fragmentation, since the maximum value obtained over all the
resources is chosen, this threshold must be higher in order to avoid performing
useless rescheduling processes. After trying different values, the threshold
was set to 30, since it resulted in a good ratio between triggered rescheduling
actions and successfully committed ones.
5.4.2 Filter Phase
With regard to the second step of the BoT replanning process, whenever it is
estimated that a certain subinterval presents enough fragmentation to trigger
the rescheduling techniques, the system has to figure out which resources
should be involved in the process. Therefore, bearing in mind the
aforementioned ways of measuring the fragmentation in the system, two different
techniques are implemented to discern which resources will take part in the
process:
• Gap: This technique selects the resources by taking into account the time
slots in use in the selected interval. When the number of used slots is above a
specific threshold, the resource is filtered out and not taken into
consideration in the rescheduling process. The reason for this is that when a
resource is very loaded, close to 100 %, the fragmentation present in that
resource is negligible (or at least no advantage can be taken of it), and the
rescheduling process is not going to improve its usage noticeably. Hence,
taking this resource into account in the rescheduling process would not provide
any improvement worth the time needed to carry out the process.
• Fragmentation: This technique uses Equation 5.3 to obtain the fragmentation
present in each resource. Then, the resources which do not have enough
fragmentation are dropped from the list of resources included in the BoT
rescheduling process. Consequently, the jobs already allocated on those
resources are also excluded from the process.
Those metrics are in charge of filtering out the resources (and their jobs) that
present a high usage with a good enough schedule (almost no fragmentation).

116 Chapter 5. Optimizing Resource Utilization through Rescheduling Techniques

To this end, for the Gap metric, the threshold is set to 95 %. Hence, resources
with a usage greater than that are not taken into account in the rescheduling
process. With regard to the Fragmentation metric, another threshold is set for
the value obtained using Equation 5.3 for each resource, and the resources
with a value below that threshold are filtered out. After different experiments,
this threshold was set to 30.
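The two filtering rules above can be sketched as follows. The 95 % usage and 30 fragmentation thresholds are the ones quoted in the text; the data model and function names are illustrative, not the SA-Layer's actual API, and the `fragmentation` field stands in for the score Equation 5.3 would produce:

```python
# Sketch of the filter phase: decide which resources take part in the BoT
# rescheduling process. Per the text, resources above 95 % slot usage (Gap)
# or below a fragmentation score of 30 (Fragmentation) are filtered out.

USAGE_THRESHOLD = 0.95   # Gap technique: skip almost-full resources
FRAG_THRESHOLD = 30      # Fragmentation technique: skip well-packed resources

def filter_by_gap(resources):
    """Keep resources whose fraction of used slots is below the threshold."""
    return [r for r in resources if r["usage"] < USAGE_THRESHOLD]

def filter_by_fragmentation(resources):
    """Keep resources whose fragmentation score (cf. Eq. 5.3) is high enough."""
    return [r for r in resources if r["fragmentation"] >= FRAG_THRESHOLD]

resources = [
    {"name": "R1", "usage": 0.97, "fragmentation": 10},  # too loaded to bother
    {"name": "R2", "usage": 0.60, "fragmentation": 45},  # worth rescheduling
    {"name": "R3", "usage": 0.80, "fragmentation": 20},  # little fragmentation
]

gap_selected = [r["name"] for r in filter_by_gap(resources)]
frag_selected = [r["name"] for r in filter_by_fragmentation(resources)]
```

Note how the two rules can disagree (R3 survives the Gap filter but not the Fragmentation one), which is precisely why their combinations are compared later in Table 5.1.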
All these different ways of measuring the fragmentation are combined in order
to find a good balance between the need to reduce fragmentation and the
computational cost that the rescheduling technique may involve.
5.5 Evaluation
This section describes the experiments conducted to test the usefulness of
the rescheduling techniques, along with the results obtained.
In this section, several implementations of the SA-Layer are compared with a
straightforward implementation of the algorithm presented by Castillo [38]. The
SA-Layer implementations first compared here are:
1. The new implementation with the rescheduling capabilities presented in
this chapter. The Bag of Tasks Rescheduling is labeled as BoT-R, the Replan-
ning Capacity as RC and, finally, the combination of both techniques working
together is labeled as BoT-RC;
2. The previous implementation of the SA-Layer presented in Chapter 4 (la-
beled as ExS in figures), which does not provide rescheduling of jobs. Thus,
ExS does not include any mechanism to deal with fragmentation and poor
resource utilization resulting from the allocation process, apart from the way
the Gap Management subsystem searches for target gaps.
These techniques use Exponential Smoothing functions to better estimate the
time needed to complete the execution of a job on a specific resource – including
network transfers.
After that, the performance of the SA-Layer using the different fragmentation
metrics presented in Section 5.4 and detailed in Table 5.1 is evaluated (they
are labeled in figures as they are named in that table). Their comparison with
Castillo and ExS is also outlined.
To evaluate the performance of those scheduling techniques several statistics
are used. Among the ones related to the user viewpoint:
• Scheduled job rate: the fraction of accepted jobs, i.e., those whose deadline
can be met.
• Rejected job rate: the fraction of jobs that are not accepted for execution,
because the system estimates that it is not possible to execute them while
fulfilling their QoS requirements under the current Grid conditions.
• QoS not fulfilled: the rejected jobs plus the jobs that were initially accepted
but whose executions were eventually delayed, so that their QoS agreements
were not fulfilled (e.g., the deadline was not met).
Recall that in this work, QoS not fulfilled includes rejected jobs since they
are jobs which have not been executed with the requested QoS, and the QoS
specified for each job is always reasonable. The job QoS requirements could be
more or less strict but in all cases the job could be scheduled meeting the QoS
requested if the system is empty.
On the other hand, and from the system point of view, there are statistics
related to the frequency of replanning (how often a rescheduling process is
performed) and to how often those rescheduling actions cannot end success-
fully. They are named Submitted replanned and Aborted replanned, respec-
tively. Finally, Resources Usage is another metric that conveys various pieces
of information, such as the number of resources used, the way they are used,
and for how long they are used.
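The statistics just defined can be sketched as simple ratios over per-job and per-replanning records; the field names and the sample figures below are illustrative, not measured values:

```python
# Sketch of the evaluation statistics, computed from per-job records.
# 'accepted' marks jobs the system committed to, 'met_deadline' whether that
# commitment held; the counters mirror Submitted/Aborted replanned.

jobs = [
    {"accepted": True,  "met_deadline": True},
    {"accepted": True,  "met_deadline": False},  # accepted but delayed
    {"accepted": False, "met_deadline": False},  # rejected at submission
    {"accepted": True,  "met_deadline": True},
]

n = len(jobs)
scheduled_rate = sum(j["accepted"] for j in jobs) / n
rejected_rate = sum(not j["accepted"] for j in jobs) / n
# QoS not fulfilled = rejected jobs + accepted jobs that missed their deadline
qos_not_fulfilled = sum(
    (not j["accepted"]) or (not j["met_deadline"]) for j in jobs) / n

# System-side counters: how often a replanning was actually submitted out of
# those evaluated, and how often a submitted replanning could not be committed.
replans_checked, replans_submitted, replans_aborted = 20, 9, 1
submitted_replanned = replans_submitted / replans_checked
aborted_replanned = replans_aborted / replans_submitted
```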
5.5.1 Testbed
The evaluation of the rescheduling implementation has been carried out in the
same real Grid environment detailed in Chapter 4, but with the addition of more
resources belonging to a new University. The testbed is made of resources
located in two different Spanish Universities and in the University of Umeå
(UmU), Sweden, as depicted in Figure 5.2.

Figure 5.2. Grid testbed topology.
Note that these machines are administered and operated by their respective
owners. Each non-cluster machine is a personal computer (hence not a dedi-
cated machine) belonging to a member of the staff of UCLM, UNED or UmU.
Individual workloads (including network load) of these machines vary greatly
and are not defined in the testbed or the experiment setup. Moreover, they
may fail or leave the Grid at any moment, making our testbed even more
realistic.
5.5.2 Workload
Once more, the selected test for evaluating our proposals is the 3node appli-
cation of the GRASP [115] benchmarks, due to its versatility in making jobs
network- and/or computation-demanding.

For this evaluation, compute_scale takes random values between 0 and 20,
whilst output_scale takes values between 0 and 2, both following a uniform
distribution. So, there are up to 63 different kinds of jobs (21 ∗ 3). The input
file size is 100 MB, and these values of output_scale create output files of up
to 200 MB. The rationale for not setting output_scale between 0 and 20, as
was done in Chapter 4, is that this performance evaluation focuses on mea-
suring the fragmentation present in the resources, which relates only to com-
putational time. Hence, sending bigger or smaller files is not really relevant
to the study carried out in this section.
In this evaluation three different kinds of workload are used. First, a work-
load is generated by submitting the 3node jobs one after another (named
Workload 1): when one job has been submitted, and the system has accepted
or rejected it, the next job is submitted. A more realistic kind of workload
has also been studied, named Workload 2, where the submission rate is var-
ied among 2, 4 and 6 jobs per minute, following a uniform distribution to
generate the random values for each case. In both cases, the total number of
jobs submitted was 500, and each result presented here is the average of 5
executions. Finally, to evaluate the fragmentation metrics, another workload
made of the jobs defined above is used, named Workload 3. This time, 1000
jobs have been submitted for each test, following a uniform distribution with
an average of 8 jobs per minute.
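A minimal generator for jobs and submission times along the lines described above can be sketched as follows; the parameter ranges come from the text, while the generator itself is illustrative and not the actual GRASP tooling:

```python
# Sketch of the workload generation described above: 3node job parameters
# (compute_scale in {0..20}, output_scale in {0..2}, i.e. 21 * 3 = 63 job
# kinds; input fixed at 100 MB) and Workload 2's varying submission rate.
import random

def make_job(rng):
    return {
        "compute_scale": rng.randint(0, 20),
        "output_scale": rng.randint(0, 2),
        "input_mb": 100,
    }

def workload2(rng, n_jobs=500):
    """Workload 2: submission rate drawn uniformly from 2, 4 or 6 jobs/min."""
    t, jobs = 0.0, []
    for _ in range(n_jobs):
        rate = rng.choice([2, 4, 6])   # jobs per minute for this submission
        t += 1.0 / rate                # minutes until the next submission
        jobs.append((t, make_job(rng)))
    return jobs

rng = random.Random(42)
jobs = workload2(rng, n_jobs=500)
kinds = {(j["compute_scale"], j["output_scale"]) for _, j in jobs}
```

Workload 1 would instead submit each job only after the previous one has been accepted or rejected, and Workload 3 would use a fixed average of 8 jobs per minute over 1000 jobs.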
5.5.3 Rescheduling techniques
The aim of this test is to show that the rescheduling techniques improve re-
source usage: RC by modifying unfavorable previous decisions, and BoT-R by
also reducing fragmentation. As a result, more jobs may be accepted. More-
over, when these techniques are used in conjunction (BoT-RC), even better
results are obtained.
The first statistic presented here is the scheduled job rate, which is presented
in Figure 5.3 (a). The x–axis represents the number of submitted jobs and the
y–axis the number of jobs that were actually accepted. So, the progression of
accepted jobs over the submitted jobs is depicted.
This figure highlights the improvement obtained by using the rescheduling
techniques, as they clearly outperform the other two techniques when the load
becomes higher. In fact, with any rescheduling technique, more jobs are ac-
cepted after only 200 submitted jobs than Castillo accepts after 500 submis-
sions. With slightly more than 350 jobs submitted, the RC technique exhibits
more accepted jobs than ExS does after 500 submissions. Moreover, when us-
ing BoT-R or the combination of both rescheduling techniques, with just 300
jobs submitted the number of accepted jobs is already greater than when 500
jobs were submitted using ExS.

Figure 5.3. Comparison between the scheduling techniques for Workload 1:
(a) scheduled job rate (accepted vs. submitted jobs); (b) percentage of QoS not
fulfilled.
Another trend may be seen as load is increased. In the case of medium or low
load (fewer than 250 jobs), both rescheduling techniques exhibit a similar be-
havior; there are not even large differences between using rescheduling tech-
niques or not. At these loads fragmentation is not a big issue, as there are
enough slots to allocate most of the jobs. When the number of jobs is higher
(up to 400 jobs), the BoT-R rescheduling technique presents better results.
The rationale is that there is enough fragmentation to take advantage of.
However, when the load is even higher, resource usage is so high that the
fragmentation is not usable. Therefore, the algorithm in charge of triggering
the Bag of Tasks rescheduling decides that this replanning is not worthwhile,
and it is only triggered over the few resources and time intervals in which
the existing fragmentation may be exploited. When the load is very high
(from 400 to 500 jobs), the RC technique behaves better than BoT-R, as it
tries to improve resource usage by modifying unfavorable previous decisions.
The BoT-RC technique presents the best behavior due to the fact that it takes
advantage of the fragmentation and it is able to modify unfavorable previous
decisions. In this way, BoT-RC presents the best results of all the resched-
uling techniques as load is increased, obtaining an improvement in accepted
jobs of 114 % over Castillo and of around 40.2 % over ExS.

Figure 5.4. Comparison between the scheduling techniques for Workload 2:
(a) scheduled job rate and (b) percentage of QoS not fulfilled, for submission
rates of 2, 4 and 6 jobs/min.

The performance difference between using only one rescheduling technique
and using both together is smaller but still remarkable: the BoT-RC tech-
nique outperforms BoT-R by 19 % and RC by 6.5 %. In spite of there being
no noticeable difference with RC when 500 jobs were submitted, if we con-
sider the whole progression, the difference in accepted jobs between BoT-RC
and RC reaches 19.6 %.
Figure 5.3 (b) depicts the percentage of jobs which did not fulfill the requested
QoS. The improvement obtained by the rescheduling techniques is maintained
and even increased when they are used in conjunction. BoT-RC obtains a
reduction in QoS not fulfilled of 70.2 % over Castillo and of 56 % over ExS.
Moreover, the conjunction of both rescheduling techniques obtains an im-
provement over the cases where only one of them is used: 21.4 % over RC
and 41.7 % over BoT-R.
The results for the Workload 2, which represents a more realistic experiment,
are depicted in Figure 5.4, showing the scheduled job rate and the percentage
of jobs which finally do not meet their QoS requirements. Both graphics are
presented, in spite of being almost the inverse of each other, to highlight
that even a more stressful use of the resources does not lead to a noticeable
increase in the number of jobs that ultimately fail to fulfill their QoS require-
ments due to mispredictions or inaccuracies in the estimates of job durations.
As the plots depict, the Castillo technique is again clearly outperformed by
the other techniques. Comparing ExS and RC, there are small differences at
low loads, since fragmentation and resource usage are not as high as when
the submission frequency is increased, so rejections due to fragmentation or
unfavorable previous decisions are less likely. However, when the system load
increases, the differences are more remarkable and the RC technique outper-
forms ExS.
For the BoT-R and BoT-RC cases, these differences are noticeable for all the
submission frequencies. At low loads (2 jobs/min.), and thanks to the Bag of
Tasks replanning, both BoT-R and BoT-RC can allocate all the jobs, because
the fragmentation between allocations is reduced from time to time and free
slots remain for all the allocations.
When the submission rate rises to 4 jobs/min., jobs start to be rejected in
spite of applying this technique. It must also be noted that a submission rate
of 4 jobs/min. onwards represents a very high load, and there are not enough
computational resources to allocate all the jobs. Note that submitting 4
jobs/min. does not mean executing 4 jobs/min., as jobs may have start time
and deadline constraints. Because of that, there may be periods of time in
which just a few jobs have to be executed and, by contrast, other periods in
which the number of jobs to be executed is much greater than the submission
rate. This can be seen in the Castillo statistics, which show QoS fulfilled for
fewer than 50 % of the submitted jobs. At this submission frequency, BoT-RC
improves the accepted job rate by 101.3 % over Castillo, by 37 % over ExS,
by 10.6 % over RC and by 2.3 % over BoT-R. With regard to QoS not ful-
filled, BoT-RC reduces this statistic by 88.5 % compared with Castillo, by
80 % compared with ExS, by 58 % compared with RC and by 33 % compared
with BoT-R.
From this point, the difference between using rescheduling techniques and
not using them keeps increasing. However, the differences between using only
the Replanning Capacity and using both rescheduling techniques become
smaller. The reason is that resource usage is quite close to full and there is
no usable fragmentation left. Thus, only a few jobs may be moved to try to
avoid rejections due to unfavorable previous decisions for both techniques, and
the Bag of Tasks replanning is hardly committed. Consequently, the differences
between BoT-R and BoT-RC are a bit more remarkable, as BoT-RC has the
chance of moving several jobs to try to avoid some unfavorable previous deci-
sions, whilst BoT-R does not.

Table 5.1. Combination of Fragmentation Metrics.

                              G-G    F-G             F-F             MaxF-F
Intervals to be replanned     Gaps   Fragmentation   Fragmentation   Max_Fragmentation
Resources within the process  Gaps   Gaps            Fragmentation   Fragmentation
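The combinations of Table 5.1 can be read as pairs (metric used to pick the subintervals worth replanning, metric used to pick the resources taking part); a sketch with illustrative names follows:

```python
# The four fragmentation-metric combinations of Table 5.1 as
# (interval metric, resource metric) pairs.

COMBINATIONS = {
    "G-G":    ("gaps",              "gaps"),
    "F-G":    ("fragmentation",     "gaps"),
    "F-F":    ("fragmentation",     "fragmentation"),
    "MaxF-F": ("max_fragmentation", "fragmentation"),
}

def plan(combo, pick_intervals, pick_resources):
    """Apply a combination: first choose the subintervals worth replanning,
    then filter the resources that will take part in the BoT process."""
    interval_metric, resource_metric = COMBINATIONS[combo]
    return pick_intervals(interval_metric), pick_resources(resource_metric)

intervals, resources = plan(
    "MaxF-F",
    pick_intervals=lambda m: f"intervals ranked by {m}",
    pick_resources=lambda m: f"resources ranked by {m}",
)
```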
These results emphasize the value of the rescheduling techniques in increas-
ing resource usage and the QoS received by users. By applying them, more
jobs are accepted, as it is possible to reallocate already scheduled jobs. In
this way, jobs with less restrictive QoS requirements may be reallocated to
make room for a new incoming job with more restrictive requirements, so
that both jobs (the rescheduled one and the new one) can be executed.

Moreover, several jobs may be reallocated in a BoT fashion, with the system
having more information about the jobs than it had when their first allocation
took place.
5.5.4 Fragmentation metrics
A number of experiments have been undertaken in the previously detailed
Grid testbed to evaluate the efficiency and accuracy of the fragmentation
metrics presented above at measuring the real fragmentation of the whole
Grid system. Table 5.1 shows how those metrics are combined for evaluation.
To evaluate those metrics, the Workload 3 (see Section 5.5.2) is used. For
the sake of clarity, this evaluation is based on the performance of the BoT-R
technique when using the different fragmentation metrics. Thus, the results
for RC and BoT-RC are not depicted as they have been studied in the previous
section.
The aim of the next tests is to highlight the importance of properly quanti-
fying the status of a Grid system and of striking a good balance between the
computation time needed to perform the rescheduling actions and the advan-
tages obtained by using them.

Figure 5.5. Percentage of Rejected Jobs for Workload 3 (Castillo, ES, G-G,
F-G, F-F, MaxF-F).
Figure 5.5 depicts the evaluation in terms of the job rejection rate. This is
a metric from the user point of view, as it shapes the user's perception of
system performance. The figure highlights the importance of making good
predictions (all the techniques clearly outperform Castillo's implementation)
and the benefits of using techniques that reduce the fragmentation generated
by the scheduling process. All the BoT techniques outperform ExS, reducing
the percentage of rejected jobs by between 50 % and 70 %. Moreover, it can
be seen that the different ways of measuring fragmentation also result in dif-
ferent performance regarding the number of rejected jobs: the techniques
using Fragmentation as the metric to select the resources included in the
rescheduling process behave better than the ones using Gaps.
On the other hand, regarding performance from the system viewpoint, Fig-
ure 5.6 shows how often the rescheduling techniques are executed and how
successful they are. Thus, this figure represents how much overhead the re-
scheduling techniques imply when using each fragmentation metric.

From these plots, it can be said that MaxF-F presents the best behavior.
When using MaxF-F, the fragmentation is better quantified; as a consequence,
it is much less likely that the rescheduling actions fail to end successfully.
Figure 5.6. Relationship among checked, submitted and canceled reschedul-
ing actions for Workload 3 (% of Submitted Replanned and % of Aborted
Replanned for each fragmentation metric).
In fact, a rescheduling action almost always succeeds when it is submitted
using this combination of fragmentation metrics.
F-G is the technique that entails the least overhead (fewer than 50 % of the
evaluated rescheduling actions are actually executed), but at the expense of
more rejected jobs. Moreover, the chances of a rescheduling process being
aborted are remarkably higher than with MaxF-F; the same holds for the
other two techniques. For instance, F-F presents the best behavior regarding
rejected jobs, though it is the technique that requires the most computational
time to perform the rescheduling actions. The rescheduling process is sub-
mitted more times than with MaxF-F, at a similar cost each time, as both use
Fragmentation to select the resources and jobs involved in the process. This
also leads to greater chances of having the process aborted, which clearly
means a useless waste of time. Hence, MaxF-F is the technique that presents
the best balance between the overhead produced and the number of rejected
jobs.
Finally, Figure 5.7 depicts resource usage without using any rescheduling
technique, and Figure 5.8 presents the results obtained when the reschedul-
ing technique is performed using the different fragmentation metrics ex-
plained in Section 5.4. Those figures show resource usage along time slots,
distinguishing each resource by a different color tone. In these experiments,
for clarity reasons, the number of available resources is set to 12. The aim is
thus to make efficient usage of those resources, achieving nearly 100 % re-
source usage whenever there are enough jobs to do so.

Figure 5.7. Resources Usage without fragmentation metrics for Workload 3
(number of resources used across time slots).
Those figures highlight the advantages of performing rescheduling and the
importance of how the existing fragmentation is measured. Without resched-
uling techniques, the resulting resource usage presents more fluctuations and,
even more importantly, these affect a greater number of resources. This last
fact can be seen in the more frequent and bigger (deeper) alterations of the
colored stripes in Figure 5.7.
Regarding the differences among the fragmentation metrics (plots of Fig-
ure 5.8), the best behavior is again obtained when using MaxF-F. All of them
present a more uniform behavior (fewer fluctuations) than the ExS technique.
However, there are remarkable differences among them. First of all, when us-
ing MaxF-F the fluctuations due to existing fragmentation (discounting the
increase or decrease owing to the total number of jobs in the system rising or
falling) are fewer than in the other cases. Furthermore, when fragmentation
appears, it affects just one resource, whilst in the other cases fragmentation
may affect more than one resource. That is, the peaks are smaller (just one
level) using MaxF-F than using the other fragmentation metrics (hops of up
to 3 levels).
On the other hand, both G-G and F-F have certain periods of time where the
rescheduling process seems not to have been submitted or to have been aborted.
This is shown at time slots 140 to 180 for G-G and between 100 and 120 for F-F.
Figure 5.8. Resources Usage when using fragmentation metrics (BoT) for
Workload 3: (a) G-G, (b) F-G, (c) F-F, (d) MaxF-F (number of resources
used across time slots).
Hence, as far as resource usage is concerned, the metrics that present the best
performance are F-G and MaxF-F. However, taking into account the other
statistics outlined above, the best overall behavior is provided by MaxF-F: it
presents a low rate of rejected jobs with a low computational overhead (not
as low as F-G, but, in contrast, MaxF-F is almost always able to finish the
rescheduling process) and the best performance from the resource usage
viewpoint.
5.6 Summary
Providing QoS in distributed and heterogeneous systems such as Grid envi-
ronments is a challenging task. Advance reservations are usually proposed to
this end, but they are not always possible in real Grids. This Thesis proposes
meta-scheduling in advance as a possible solution to provide QoS to Grid
users. However, this kind of scheduling requires tackling many challenges,
such as developing efficient scheduling algorithms that scale well or studying
how to predict job durations on resources. Another important point that has
to be addressed is the poor utilization of resources due to the fragmentation
generated by the scheduling process.
This chapter proposes an extension to the framework presented in Chapter 4
to overcome the fragmentation and the unfavorable previous decisions which
lead to poor resource utilization. These new features allow the system to
reschedule already scheduled jobs, with the aim of allocating a greater num-
ber of jobs by using resources more efficiently. Hence, the main contributions
are the two rescheduling techniques mentioned above, aimed at improving
resource utilization and QoS provision in Grids by reallocating already sched-
uled jobs (keeping their previous QoS agreements). The reactive approach is
called Replanning Capacity (RC), and it is executed every time a job fails its
allocation. In this case the system tries to reschedule an already allocated
job, maintaining its QoS requirements; the reallocation of that job releases
the time slots needed to accept the incoming job, so that both jobs are exe-
cuted meeting their respective QoS requirements.
The preventive approach, called Bag of Tasks (BoT), reschedules jobs (in a
Bag of Tasks fashion) ordered by their start time instead of by their arrival
time. Hence, the reallocation of those tasks creates less fragmentation in the
resources. Moreover, different ways of measuring the fragmentation gener-
ated in the allocation process are presented, and used in the implementation
to trigger the BoT rescheduling.
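The effect of ordering by start time rather than arrival time can be illustrated with a toy single-resource, first-fit model (illustrative names and parameters; not the actual BoT algorithm):

```python
# Toy illustration of the BoT reordering idea: re-allocating a bag of
# already-accepted jobs ordered by start time (instead of arrival time)
# packs them more tightly, leaving fewer unusable holes between allocations.

def first_fit(jobs, horizon=10):
    """Place each job at the earliest free run of slots of its length,
    starting no earlier than its start time (single-resource model)."""
    free = [True] * horizon
    placement = {}
    for job in jobs:
        for s in range(job["start"], horizon - job["len"] + 1):
            if all(free[s:s + job["len"]]):
                placement[job["name"]] = s
                for t in range(s, s + job["len"]):
                    free[t] = False
                break
    return placement

def makespan(placement, jobs):
    return max(placement[j["name"]] + j["len"] for j in jobs)

bag = [
    {"name": "J1", "start": 2, "len": 2, "arrival": 0},
    {"name": "J2", "start": 0, "len": 3, "arrival": 1},
]
by_arrival = first_fit(sorted(bag, key=lambda j: j["arrival"]))
by_start = first_fit(sorted(bag, key=lambda j: j["start"]))
span_arrival = makespan(by_arrival, bag)
span_start = makespan(by_start, bag)
```

In arrival order, J1 grabs slots 2–3 and forces J2 to slots 4–6, leaving slots 0–1 as an unusable hole; in start-time order the same two jobs pack contiguously into slots 0–4.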
This chapter also presents different metrics to measure the fragmentation
present in a Grid system, which are needed by the BoT rescheduling algo-
rithm. These metrics are used to trigger the rescheduling of tasks, but only
when needed. Apart from that, they are useful for deciding over which re-
sources the rescheduling has to be applied. Consequently, the computational
time needed to complete the rescheduling process may be shorter.
Along with the improved framework, comparisons between different fragmen-
tation metrics, and between using and not using the rescheduling strategies,
are included. This comparison highlights the importance of performing these
rescheduling processes so that a higher number of jobs can be allocated to
resources. Therefore, not only the use of resources but also the QoS per-
ceived by users is improved. The importance of accurately measuring the
status of a Grid system, by using different techniques that measure the frag-
mentation, is also highlighted.
CHAPTER 6

Improving Grid QoS by means of
Adaptable Fair Share Scheduling
Federated Grid resources typically span multiple administrative domains and
utilize multiple heterogeneous schedulers, which complicates not only provi-
sioning of quality of service but also management of end–user resource utiliza-
tion quotas. The system developed and detailed in previous chapters (SA-Layer)
does not have any mechanism to deal with different resource usage policies. To
overcome these problems, this chapter proposes a solution based on the combi-
nation of the predictive SA-Layer meta-scheduling framework and a distributed
fairshare job prioritization system.
The SA-Layer is designed to provide scheduling of jobs in advance by use of
heuristics and prediction methods. The aim of SA-Layer is to ensure resource
availability for future job executions, and as such, the system provides quality
of service to end–users in terms of fulfillment of job deadlines. The fairshare job
prioritization system, FSGrid [4], provides a distributed system for decentral-
ized management of resource allocation policies and an efficient mechanism for
fairshare-based job prioritization.
The integrated architecture presented combines the strengths of both sys-
tems, providing a scheduling solution that improves end–user quality of service
by managing reliable resource allocations adhering to usage allocation policies
whilst also improving the performance of both systems.
6.1 Introduction
Grids are distributed systems that enable coordinated use of dispersed hetero-
geneous resources. The federation of computational resources in Grids enables
large-scale parallel applications in science, engineering and commerce [1]. A
core feature of Grids is that the systems are comprised of resources
shared among several organizations that maintain site independence and auton-
omy [2]. As such, Grids are highly variable systems in which resources may join
or leave the systems at any time. This variability makes QoS regarding job dead-
lines highly desirable but very difficult to achieve in practice. One reason for
this limitation is the lack of central coordination of the system. This is especially
true in the case of the networks that connect the various components of a Grid
system. Thus, achieving good end–to–end QoS is difficult, as without resource
reservations guarantees of QoS are hard to satisfy. In real Grid environments,
reservations are not always feasible as not all Local Resource Management Sys-
tem (LRMS) permit them. In addition, there are types of resources, e.g., network,
which may lack global management entities making the reservation of resource
capacity infeasible.
A key idea to solve the scheduling problem is to ensure that a specific re-
source is available when a job requires it; to this end, the SA-Layer [60] [111]
presented in Chapters 4 and 5 was developed. The SA-Layer is a scheduling
system that performs meta-scheduling of jobs in advance through efficient
allocation techniques and prediction heuristics for job durations and resource
status (including network). This system improves resource utilization and the
QoS provided to Grid users by ensuring that jobs finish on time. However, the
system does not take user priority into account when scheduling jobs. Hence, in
this chapter, improving Grid resource utilization QoS is addressed by combining
our approach to meta-scheduling jobs in advance with a distributed mechanism
for fairshare job prioritization. The resulting system distributes resource ac-
cess according to pre-specified resource allocation policies and thus improves
resource utilization QoS from the end–user point of view.
A number of scheduling systems exist that support fairshare prioritization
of jobs, e.g., Maui [52] and the Simple Linux Utility for Resource Management
(SLURM) [98]. These are, however, typically not designed to support Grid
environments that span multiple administrative domains, utilize heteroge-
neous schedulers, and require support for site autonomy in allocation policies;
they are typically limited to enforcing usage quotas and operating on usage
data from within their ownership domains. Regarding Grid environments,
FSGrid [4] is a system for decentralized fairshare job prioritization that op-
erates on global (Grid-wide) usage data and provides fairshare support to
resource site schedulers operating across ownership domains. In essence,
FSGrid calculates job execution priorities for users, projects, and virtual or-
ganizations, and can thus be used to complement the SA-Layer as a job
scheduling order mechanism.
The integration of these two mechanisms provides improved end-user QoS
not only in terms of jobs finishing in time to meet deadlines, but also in terms
of distributed resource utilization quotas influencing job scheduling order to
improve scheduling fairness. Resource usage is thus improved and balanced
by taking into account pre-defined usage allocation policies.
The rest of the chapter is structured as follows. First, the motivation and a
sample scenario are presented in Section 6.2. In the next section, the FSGrid
(Section 6.3) system is introduced. Then, in Section 6.4 a new architecture with
both systems working together is presented. After that, Section 6.5 details a
performance evaluation investigating the merits of the presented approach.
Finally, Section 6.6 provides a brief summary of the whole chapter.
6.2 Improving end-user QoS: Sample Scenario
As mentioned before, SA-Layer does not have any kind of user prioritization,
so jobs are scheduled in the order they arrive. It must be noted that this fact
does not mean that first scheduled jobs are going to be executed first. This
order depends on the time constraints of jobs, the previous allocated jobs, the
status of the resources, and so forth. This is depicted in Figure 6.1. As
illustrated, the first submitted job is not going to be executed first (due to
start time restrictions). The fact that User 1 has already submitted jobs
before does not have any influence on the order in which jobs are going to be
submitted next: Users 2 and 3 have to wait, as they submitted their job exe-
cution requests after User 1. This results in User 1 having 100 % of jobs allo-
cated whilst the other two have just 50 % success. So, even if User 2 or 3 have
higher priority than User 1, all jobs are scheduled in the order they arrive.
Hence, mechanisms that deal with fairness in scheduling, leading to fair re-
source usage, are desirable.

Figure 6.1. Scheduling Process using SA-Layer.
This is the motivation for including a system in charge of fairshare
scheduling. In this Thesis, the FSGrid system is selected, as it is a scalable
distributed fairshare system capable of prioritizing users at multiple levels
(e.g., among users, projects or virtual organizations). Other approaches exist,
such as Fair Execution Time Estimation (FETE) scheduling [101], a version of
Grid fairshare scheduling where jobs are scheduled based on completion time
predictions. This is similar to scheduling in time-sharing systems, and the
focus of that work lies in minimizing the risk of missed job deadlines.
However, it is evaluated in a simulated environment under the assumption that
tasks get a fair share of the resource's computational power. Additional
algorithms for fair scheduling in Grids are presented in [102].
The allocation process described in Chapter 4, Section 4.2 has been slightly
modified in order to take into account the information provided by the FSGrid
system. The new usage scenario of the allocation process is depicted in
Figure 6.2.
Figure 6.2. Meta-Scheduling in Advance Process.
In this figure, as before, several administrative domains are represented (the three
bubbles), each with several users submitting jobs to resources through several
meta-schedulers. In each administrative domain there is an entity, named
Gap Management, in charge of managing the current and future usage of the
resources of that domain (taking into account Grid users' usage). When there
is more than one meta-scheduler per domain, all of them must communicate with
the same Gap Management entity for that domain. There may be one or more FSGrid
servers holding the information about user, project and virtual organization
prioritizations for the whole system. It must be noted that there may be one
FSGrid server per resource site, covering several administrative domains and
virtual organizations, but there may also be several FSGrid servers (one per
resource site) managing the information about the same virtual organizations.
In this way, resource sites mount global usage policies onto local policies,
transparently propagating allocation updates to the other resource sites.
Hence, the new steps for the SA-Layer meta-scheduling in advance process
(Figure 6.2) are:
1. A user sends a request to the local meta-scheduler providing a tuple with
information on the application and the input QoS parameters. Again, as
outlined in previous chapters, in this approach the input QoS parameters
are just the start time and the deadline. This request waits in the job
pool until it is chosen to be scheduled.
2. FSGrid sorts the job pool by taking into account job ownership (which users
submitted the job requests) and user usage histories and priorities.
3. The meta-scheduler selects the first job of the pool to allocate it in the same
way as in Chapter 4. That is, it communicates with the Gap Management
entity that executes a gap search algorithm to obtain both the resource and
the time interval to be assigned for the execution of the job.
4. If it is not possible to fulfill the user's QoS requirements using the
resources of its own domain, communication with meta-schedulers from other
domains starts. Techniques based on P2P systems (as proposed in [128], [131],
among others) can be used to perform the inter-domain communication
efficiently. For scalability reasons, each meta-scheduler does not have
complete knowledge of all the meta-schedulers in the system.
5. If it is still not possible to fulfill the QoS requirements, a renegotiation
process is started between the user and the meta-scheduler to redefine the QoS
requirements. This renegotiation, as well as the overall interaction with
users, may be conducted by means of Service Level Agreements (SLAs) [35],
[134].
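Sketched in code, the loop formed by the steps above might look as follows. This is a minimal illustration only: all class names, function names, and interfaces below are hypothetical, not the actual SA-Layer or FSGrid APIs.

```python
from dataclasses import dataclass

@dataclass
class Job:
    user: str
    start_time: float   # earliest start time (input QoS parameter)
    deadline: float     # latest completion time (input QoS parameter)

class FairSharePrioritizer:
    """Toy stand-in for FSGrid: a higher value means schedule sooner."""
    def __init__(self, priorities):
        self.priorities = priorities            # user -> current priority

    def sort_pool(self, pool):
        # Step 2: reorder the job pool by owner priority, not arrival order.
        pool.sort(key=lambda j: self.priorities.get(j.user, 0.0), reverse=True)

def schedule_next(pool, prioritizer, local_gap_search, remote_gap_searches):
    """One iteration of the meta-scheduling-in-advance loop (steps 2-5)."""
    prioritizer.sort_pool(pool)
    job = pool.pop(0)                  # Step 3: take the highest-priority job
    slot = local_gap_search(job)       # gap search via the Gap Management entity
    if slot is None:                   # Step 4: ask meta-schedulers elsewhere
        for remote in remote_gap_searches:
            slot = remote(job)
            if slot is not None:
                break
    # Step 5 (QoS renegotiation with the user) would start here if slot is None.
    return job, slot
```

Note that the gap search itself is treated as an opaque callable here, since its algorithm is the one described in Chapter 4.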
A sample scenario illustrating the way SA-Layer submits jobs when supported
by FSGrid (as opposed to Figure 6.1, which shows the original SA-Layer
behavior) is depicted in Figure 6.3. As Figure 6.3 illustrates, the FSGrid user
prioritization system updates user priorities and (re)sorts the job pool
accordingly after each allocation decision.
Hence, when using both systems together, jobs are not executed taking into
account just their arrival time, but also considering the user who sent them.
Thus, resource usage is improved from the user point of view, as users can
execute a different number of jobs depending on previously executed (or
submitted) jobs as well as on user priority (defined in the system usage
allocation policies). These diagrams illustrate end–user fairness for a
scenario where all users have the same priority. With both systems combined,
all users get to schedule the same amount of jobs. However, if the policy were
different (e.g.,
Figure 6.3. Scheduling Process with SA-Layer and FSGrid integrated. Jobs are scheduled (but not executed) in order of user priority.
if User 3 had higher priority than the other two users), then users with
higher priority would execute more jobs than those with lower priority (jobs
of User 3 would be executed before the others). Hence, the integration of the
two systems offers the possibility not only of ensuring job executions within
deadlines, but also of providing different and specific QoS per user, project
and virtual organization. It must be noted that not all jobs need to have time
constraints. If users do not need or want time restrictions for their jobs,
they may submit jobs without them, which will then be scheduled in a
best-effort way. This feature highlights the necessity of a job prioritization
system that decides which job is to be submitted next. A brief description of
FSGrid is given next.
6.3 FSGrid
One of the key mechanisms used to affect how resource capacity is distributed
in scheduling environments is job prioritization. Whilst schedulers like the
SA-Layer system determine when jobs are run, prioritizers determine in what
order jobs should be run (scheduled) to meet a specific objective function. In
fairshare scheduling environments, resource capacity allocations are specified
as quota allocations, and the objective function (fairness) is defined in terms
of resource capacity utilization meeting those quota allocations. In these
environments, schedulers use a fairshare job prioritizer mechanism to ensure
that jobs are scheduled in an order that ensures users receive their allocated
resource capacity.
Figure 6.4. FSGrid Architecture [4].
FSGrid [4] is a decentralized distributed system that extends the concept of
fairshare scheduling to Grid environments, defining mechanisms for fair user
prioritization and scheduling of jobs in federated resource environments (see
[4] for in-depth details).
The FSGrid architecture (depicted in Figure 6.4) is realized as a web service–
based Service Oriented Architecture (SOA) and can be integrated with existing
scheduling environments such as Maui or SLURM with a minimum of intrusion.
Integration points are exposed as services for policy specification, usage data
storage, and fairshare calculation. For integration with local scheduling systems
that lack global (Grid-level) mappings between job owners and jobs, FSGrid also
provides an optional interface for job ownership resolution. The FSGrid archi-
tecture is designed to facilitate distributed load balancing and pre-computation
and caching of all computational states within the system.
The policy model used in FSGrid is based on the organization of end–users in
Virtual Organizations (VOs) [5] that autonomously specify recursive policy
trees defining hierarchies and quotas for users, projects, and organizations.
Thus, the definition of fairness used here requires system–wide resource
utilization to converge to predefined resource capacity allocations over time.
The policy model of FSGrid is illustrated in Figure 6.5, and defines resource
capacity allocations in a tree format that expresses usage allocations as
fractional shares of resource capacity. This construction virtualizes the
current capacity allocation and decouples policy allocations from the actual
resource capacity metrics used in scheduling. FSGrid can use any metric for
resource capacity, e.g., arbitrary combinations of CPU time, wall clock time,
storage requirements, etc., but requires that the metrics used are homogeneous
or comparable between the resource sites contributing resource usage data. To
this end, in this Thesis, computational time has been selected as the
comparable metric on which fairshare usage of resources is based.
Figure 6.5. An FSGrid policy tree.
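As an illustration of this tree format, the first policy level used later in the evaluation (50% for VO1, 20% for LQ, 30% for VO2, see Section 6.5) could be expressed as follows. The class is a hypothetical sketch, and the splits inside VO1 are illustrative values, not taken from the actual policy of Figure 6.5.

```python
class PolicyNode:
    """A node in a toy FSGrid-style policy tree (hypothetical implementation).

    Each node holds the fraction of its parent's capacity it is entitled to,
    so absolute entitlements are products of the shares along a path.
    """
    def __init__(self, share, children=None):
        self.share = share                  # fraction of the parent's capacity
        self.children = children or {}

    def target_share(self, path):
        """Absolute capacity fraction allocated to a path such as '/VO1/P1'."""
        fraction, node = 1.0, self
        for name in path.strip("/").split("/"):
            node = node.children[name]
            fraction *= node.share
        return fraction

# First level as described in Section 6.5: VO1 50%, LQ 20%, VO2 30%.
# The 0.4/0.6 split inside VO1 is purely illustrative.
root = PolicyNode(1.0, {
    "VO1": PolicyNode(0.5, {"U1": PolicyNode(0.4), "P1": PolicyNode(0.6)}),
    "LQ":  PolicyNode(0.2, {"UX": PolicyNode(1.0)}),
    "VO2": PolicyNode(0.3),
})
```

The multiplicative structure is what decouples policy from capacity: changing the total capacity of a site never requires editing the tree, only the shares matter.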
The tree-based format of FSGrid resource allocations allows policy definition
to be recursively delegated to Resource Sites (RS), VOs, and projects within
VOs (PX). Note that projects may contain users (UX) as well as sub-projects,
for instance /VO2/P2/P3. Moreover, it is possible to define local queues and
groups of users that are not defined in VOs, e.g., user /LQ/UX. In FSGrid,
resource site administrators define local policy trees for resource sites and
mount global (distributed) policy component trees for VOs onto branches of the
local policy trees. FSGrid provides mechanisms for distributed access to policy
components, which can further be subdivided and administered by VOs and VO
entities (e.g., project administrators). Policy component updates are
transparently propagated to resource sites and automatically incorporated into
fairshare calculations.
The fairshare calculation algorithm of FSGrid is based on comparing policy
trees to usage trees, which are identical in structure to policy trees but
contain actual usage data rather than usage allocation quotas. A set of tree
comparison operators is combined to produce a customizable mechanism for
fairshare-based job prioritization. To limit and modulate the influence of
historical usage information on fairshare calculations, FSGrid organizes usage
data in time-resolved user–level histograms (e.g., storing all known usage data
for a specific user in a histogram where each bin contains a summary of that
user's resource capacity usage for a specific day). To give site administrators
greater control over the influence of usage data, FSGrid defines individually
configurable usage decay functions that can be used to modulate the impact of
usage data on fairshare. FSGrid supports and automatically adapts to dynamic
updates in usage policies and usage data.
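The histogram-plus-decay structure can be sketched as follows. For illustration an exponential decay with a configurable half-life is assumed; the actual FSGrid decay functions are site-configurable and not necessarily of this form.

```python
import math

class UsageHistogram:
    """Time-resolved usage history for one user, binned per day (a sketch)."""
    def __init__(self, half_life_days=7.0):
        self.bins = {}                          # day index -> summed usage cost
        self.decay = math.log(2.0) / half_life_days

    def record(self, day, cost):
        # Each bin summarizes the user's resource capacity usage for one day.
        self.bins[day] = self.bins.get(day, 0.0) + cost

    def effective_usage(self, today):
        # Older bins contribute exponentially less to fairshare calculations,
        # so long-past consumption does not dominate current prioritization.
        return sum(cost * math.exp(-self.decay * (today - day))
                   for day, cost in self.bins.items())
```

With a 7-day half-life, usage recorded a week ago counts half as much as usage recorded today, which is the kind of modulation the configurable decay functions provide.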
Comparisons between policy allocations and usage data are expressed in
fairshare trees, which also inherit their structure from policy trees and
contain all fairshare information for a resource site in a single structure.
The tree-based FSGrid fairshare load balancing algorithm is very efficient, and
can pre-compute and cache fairshare state data for entire virtual
organizations. For comparison of jobs during scheduling prioritization, FSGrid
uses an algorithm that extracts fairshare vectors from paths in fairshare
trees. Fairshare vectors contain all information pertinent to comparing the
usage status of job owners, and facilitate ranking of jobs on multiple
fairshare levels simultaneously [4]. With this computational structure,
prioritization of jobs reduces to (lexicographic or arithmetic) ordering of the
fairshare vectors associated with jobs.
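A sketch of this idea follows, with a deliberately simplified vector: one component per tree level, each being the owner's allocated share minus its consumed share, so under-served owners sort first. The actual FSGrid comparison operators are configurable and described in [4]; everything below is illustrative.

```python
def fairshare_vector(owner_path, target, usage):
    """Hypothetical fairshare vector: per-level entitlement minus consumption.

    target and usage map tree paths ('/VO1', '/VO1/P1', ...) to fractions of
    total capacity; one vector component is produced per level of the path.
    """
    parts = owner_path.strip("/").split("/")
    prefixes = ["/" + "/".join(parts[:i + 1]) for i in range(len(parts))]
    return tuple(target[p] - usage.get(p, 0.0) for p in prefixes)

def prioritize(jobs, target, usage):
    # Lexicographic ordering of tuples: the largest top-level deficit wins,
    # and deeper levels only break ties, which is how one vector can rank
    # jobs on multiple fairshare levels simultaneously.
    return sorted(jobs,
                  key=lambda job: fairshare_vector(job["owner"], target, usage),
                  reverse=True)
```

Note that Python's built-in tuple comparison is already lexicographic, so the whole prioritization step really does reduce to an ordinary sort once the vectors are extracted.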
However, two major factors impact the speed of FSGrid convergence (the
convergence of usage consumption to policy usage allocations): usage cost
variance (differences in job lengths) and usage update latencies. Usage cost
variance is in general unavoidable: jobs will have different run lengths due to
differences in computations, resource availability, resource capacity, and even
variations in resource capacity (in shared systems, or resource elasticity in
paravirtualized resources such as those in certain Cloud systems). Usage update
latencies stem from the total cost (e.g., run length) of a job being unknown
until the job has been successfully executed and processed. Until the usage
cost of a job is known, and reported to the Usage Statistics Service (USS),
FSGrid is unaware that the job exists and does not factor it into fairshare
usage allocation enforcement calculations. Factors such as scheduling costs,
data transmission overhead, and storage requirements may be factored into usage
costs as well, further complicating the calculations.
FSGrid is a fully decentralized system where resource site administrators
individually determine what VOs to contribute resource capacity to, as well as
the relative amount of resource capacity to contribute to each VO. The system
fully preserves resource site autonomy and is devoid of central coordination.
6.4 Integrated Architecture
As both systems work at different levels and manage QoS from different points
of view, they can work together to enhance the overall QoS perceived by Grid
users. In this way, QoS can be addressed in terms of jobs finishing on time,
taking into account previous and scheduled usage of the system. What is more,
usage policies may be defined (not all users having the same priority) to
account for the different users existing in the system, as well as the
different projects and virtual organizations.
In addition, FSGrid may take advantage of the SA-Layer usage cost predictions
to improve the convergence of resource utilization to allocation quotas. This
means that less time is needed to achieve correct fairshare values. If the
usage policy is changed dynamically, the system adapts itself to the new
fairshare values quickly, even before any job executions have completed.
When scheduling is done in environments with resource queues (where mul-
tiple jobs are queued on resources prior to execution), the time until the cost
of a job is known may vary substantially, leading to significant perturbations of
FSGrid convergence. To provide better QoS in scheduling and resource pool load
balancing, SA-Layer computes predictions of job execution times as part of the
meta-scheduling process. To increase the efficiency of the FSGrid fairshare job
prioritization (i.e., increase FSGrid convergence rates), SA-Layer job execution
time predictions are used as estimates of job usage costs and reported to FSGrid
when individual jobs are scheduled. Predictions are later replaced with actual
job usage costs when they become known (after job completion).
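The reporting protocol between SA-Layer and the USS can be sketched as follows. This is a toy stand-in, and all method names are hypothetical, not the real USS interface; the retraction case covers jobs that are cancelled or fail to meet the agreed QoS, as described later in this section.

```python
class UsageStatisticsService:
    """Toy USS: per-job usage costs in which predictions can be replaced."""
    def __init__(self):
        self.records = {}          # job id -> (user, cost, is_prediction)

    def report_prediction(self, job_id, user, predicted_cost):
        # Sent by SA-Layer as soon as the job is scheduled, so FSGrid can
        # factor the job into fairshare calculations immediately.
        self.records[job_id] = (user, predicted_cost, True)

    def report_actual(self, job_id, actual_cost):
        # Replaces the prediction once the real run length is known.
        user, _, _ = self.records[job_id]
        self.records[job_id] = (user, actual_cost, False)

    def retract(self, job_id):
        # Job cancelled or agreed QoS not met: drop the predicted usage.
        self.records.pop(job_id, None)

    def usage(self, user):
        # Aggregate cost per user, mixing predictions and actual costs.
        return sum(cost for u, cost, _ in self.records.values() if u == user)
```

The key property is that a prediction influences fairshare only until the actual cost (or a retraction) overwrites it, so the system converges on real usage data.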
A snapshot of this new architecture with both systems working together is
depicted in Figure 6.6.
Figure 6.6. SA-Layer and FSGrid systems integrated.
When a user sends a job execution request, it is stored in a job pool. This
pool is sorted by taking into account the current user priorities obtained
through the Fairshare Calculation Service of the FSGrid system. Then, SA-Layer
schedules the first job of the highest prioritized user. During the scheduling
process carried out by SA-Layer, the predicted information for the scheduled
job is sent to the USS of the FSGrid system. This propagates scheduling
information as quickly as possible, facilitating fast convergence of the FSGrid
fairshare values. Finally, when the scheduled job finishes its execution, the
actual information about the usage cost of that execution is sent to the FSGrid
system to replace the previously predicted information. In this way, more
accurate fairshare values may be obtained as, in the end, actual execution
times are used. The predictions are only used whilst the system is waiting for
the actual usage cost. In cases where the user cancels a job, or the job is not
executed fulfilling the agreed QoS, the predicted information sent to the USS
of FSGrid is also updated in order to remove the erroneous prediction.
6.5 Performance Evaluation
To evaluate the performance and characterize the behavior of the proposed sys-
tem, a number of experiments are undertaken in a Grid environment testbed.
6.5.1 Testbed
In this case, the evaluation testbed consists of resources located at three dif-
ferent Universities across Europe, in the same way as in previous chapter (Fig-
ure 6.7). The only difference is that at the University of Umeå (UmU), Sweden,
there is also a FSGrid server in charge of the users prioritization.
Figure 6.7. Grid testbed topology.
6.5.2 Workload
As in previous chapters, the 3node test of the GRASP [115] benchmarks is used
for testing our implementation.
To evaluate the performance of the two systems working together, 3node jobs
are submitted with different input parameters for 7 different users, who also
have different usage policies (see Figure 6.5). To this end, the compute_scale
parameter of the 3node test is set by following a uniform distribution taking
values between 0 and 20. The total amount of bytes to be transferred in each
execution is set to 100 MB. The rationale is that this is an intermediate size
whose transfer is quite fast when using local resources (UCLM resources) and
rather slow when using external resources (UNED or UmU resources).
6.5.3 FSGrid Convergence Rate
The impact of using job usage cost predictions in meta-scheduling environments
with resource queues is illustrated in Figure 6.8, which depicts experiments
using a workload built by submitting 3node jobs in such a way that there are
Figure 6.8. FSGrid convergence rates for an isolated policy tree subgroup:
(a) using actual job costs; (b) combining predicted and actual job costs. Both
panels plot relative usage (%) of /VO1, /LQ and /VO2 against the number of jobs
scheduled.
always jobs belonging to each user in the job pool. Jobs are submitted
continuously until the system converges, since the aim of this test is to
measure the convergence speed. The rationale is that if there are only jobs
from one user in the job pool, the system will schedule those jobs even if that
user's fairshare vector is low (as there are no other users competing for
resource usage).
Figure 6.8 shows the summarized resource usage for the first level depicted
in Figure 6.5. Hence, VO1 should reach a relative resource usage of 50%, whilst
LQ and VO2 should be around 20% and 30%, respectively. In the illustration, the
relative distribution of the number of jobs scheduled is plotted as a function
of the total number of jobs scheduled.
As illustrated in Figure 6.8 (a), usage cost update latencies delay FSGrid
convergence and require the system to process many jobs before it can reach a
balanced usage state. As illustrated in Figure 6.8 (b), the incorporation of
SA-Layer job cost predictions (which are reported to FSGrid during the
scheduling phase) significantly improves the FSGrid convergence rate. It must
be noted that, in this latter figure, the x–axis only extends to 2500 jobs, as
the system converged long before that point. While SA-Layer provides
high-quality estimates of usage costs, it is worth noting that even poor
predictions substantially improve FSGrid convergence rates. This is because
predicted values effectively shift the main source of FSGrid convergence noise
from usage update noise to usage cost variance noise, which has a substantially
lower impact on overall FSGrid convergence. Further treatment of FSGrid
convergence and noise models is available in [4].
6.5.4 Quality of Service
To highlight the impact of using FSGrid to fairly distribute resource usage
(from the end–user point of view), another workload is used. In this case, all
users submit identical jobs simultaneously. This is repeated up to 150 jobs per
user (in this example, a total of 1050 jobs).
In these experiments, the functionality of SA-Layer with and without the
FSGrid job prioritization system is compared. To this end, the metric used is
the percentage of jobs whose allocations fail, i.e., jobs that cannot be
allocated because the system does not have enough free time slots to execute
them within their time constraints. Therefore, it is a metric related to
resource utilization QoS from the end–user point of view.
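For reference, the metric could be computed per user from a log of allocation outcomes, as in the following illustrative helper (not part of the evaluated software):

```python
from collections import defaultdict

def failure_rates(outcomes):
    """Per-user allocation failure rate (%) from (user, allocated) records."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for user, allocated in outcomes:
        totals[user] += 1
        if not allocated:
            failures[user] += 1
    return {user: 100.0 * failures[user] / totals[user] for user in totals}
```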
As Figure 6.9 (a) depicts, when the FSGrid system is not used, all users
experience a roughly equal failure rate, as differences in user priority are
not taken into account. However, when user prioritization is used, the failure
rate is related to the usage policy allocations. Therefore, this figure shows
that users with higher priority (see Figure 6.5) have a lower failure rate than
those with lower priority. For instance, user /LQ/UX has a failure rate below
5%, whilst for user /VO1/P1/U4 it is above 25%. Figure 6.9 (b) highlights the
relationship between usage policy and failure rate: it shows how the failure
rate decreases as the resource usage percentage in the policy increases.
Figure 6.9. Failure rates: (a) failure rate (%) per user, with and without
FSGrid, for users /LQ/UX, /VO1/P1/U2, /VO1/P1/U3, /VO1/P1/U4, /VO1/U1,
/VO2/P2/P3 and /VO2/P2/U10; (b) failure rate (%) as a function of usage
percentage.
To sum up, the graphs in Figure 6.9 show that better QoS is provided to users
when using SA-Layer in conjunction with the FSGrid system, as users with higher
priority have fewer jobs that cannot be executed fulfilling the requested QoS.
Moreover, this fact, along with the fast convergence rate, makes it possible to
provide special QoS to certain users, projects or VOs during a specific time
period. Hence, it is possible to deliver adaptable QoS that fits current
requirements, which represents a great improvement in the QoS provided to
users.
6.6 Summary
In this chapter an infrastructure to manage QoS by improving resource
utilization has been presented. This improvement is based on using the FSGrid
system together with the SA-Layer meta-scheduler presented in previous
chapters. FSGrid improves fairness in scheduling by taking into account usage
policies specifying multi–level resource capacity allocations (i.e.,
allocations for end–users, projects and virtual organizations).
The combination of the SA-Layer and FSGrid systems is shown to improve
end–user resource utilization QoS, as resource usage is balanced taking into
account pre-defined resource allocation policies. In addition, the use of
SA-Layer job duration predictions is shown to improve the convergence rate of
the FSGrid fairshare prioritization system, resulting in fewer jobs being
required to achieve fairshare resource utilization and end–user usage quota
enforcement. In this way, usage policies can be changed dynamically and the
system will reach fair resource usage quickly.
Finally, a performance evaluation of the combined architecture has been
presented. This study highlights the benefits SA-Layer obtains from the
information provided by FSGrid in order to improve the QoS provided to users.
Moreover, the benefits from the FSGrid point of view are also presented: FSGrid
takes advantage of the predictions made by SA-Layer and, by using this
information, achieves a substantial reduction in the time needed to reach fair
resource usage.
Chapter 7
Conclusions, Contributions and Future Work
This chapter presents the conclusions drawn from this Thesis, reviews the con-
tributions obtained from the work developed, and suggests guidelines for future
research.
7.1 Conclusions
A Grid is a highly distributed system where providing QoS is very difficult due to
several reasons, such as the heterogeneity of Grid resources or the different se-
curity policies of the administrative domains that build the whole environment.
There are several works whose objective is to overcome those difficulties. A com-
pilation of those approaches has been presented along with the identification of
their weak points. Apart from that, other approaches similar to our proposals or
relate to the techniques used to get over the found challenges are studied.
With the aim of developing an open–source middleware for the Grid community
that is capable of addressing QoS, a real Grid environment (based on Globus and
GridWay) has been set up and maintained. In this way, the evaluation results
reflect the natural behavior, heterogeneity and dynamism of Grid resources.
Over this infrastructure several modifications have been implemented to make it
network–aware and to provide an autonomic behavior which improves the
scheduling decisions.
In any case, in a Grid environment it is quite difficult to provide any kind of
QoS without reserving resources in advance. However, as organizations sharing
their resources in such a context still keep their independence and autonomy
[2], and due to the fact that not every resource in a Grid environment provides
this functionality, those reservations are not always feasible. In fact, there
are some kinds of resources, such as bandwidth, which may be scattered across
several administrative domains, making their reservation really difficult.
Owing to these facts, our implementation is based on performing scheduling in
advance rather than advance reservations. This means that the resources and
time periods to execute the jobs are selected and taken into account when
making subsequent allocation decisions, but without making any physical
reservation of the resources.
Under that scheduling-in-advance scenario, and based on the idea proposed by
Castillo et al. in [3], [38], efficient algorithms to select a suitable
resource and time period to execute the jobs have been developed, and efficient
data structures have been implemented to store the information needed to
perform this scheduling process. This required developing prediction techniques
to estimate the future status of the resources, network included, as well as
implementing several heuristics to estimate the time needed to complete a job
execution in a resource at a specific time in the future.
However, even making this process as accurate as possible, it has some problems
regarding resource utilization, mainly job rejections due to fragmentation
and/or unfavorable previous decisions. To overcome these issues, two
rescheduling techniques have been developed. The first one is a reactive
technique that moves an already scheduled task in order to also accept a new
incoming job. The second one is a preventive technique that, from time to time,
tries to reduce the fragmentation of the system. To this end, several
heuristics have been implemented to better measure the existing fragmentation
in a Grid system at a specific point, and to decide whether it is useful to
apply this technique or not.
Finally, as the developed software does not provide any mechanism to deal with
different user priorities, and as a result of a research stay at the University
of Umeå, work has also been carried out on the integration of SA-Layer with a
system capable of providing the information needed to manage different levels
of QoS depending on the target usage policy (FSGrid [4]).
In addition to these implementations, a performance evaluation of each of them
has been carried out and presented. The evaluations highlight (1) the
advantages of having a system that autonomously adapts itself to the current
system behavior; (2) the need to schedule resources in advance; (3) the
importance of making predictions about resource status and job durations, and
the improvement obtained by using them; (4) the necessity of rescheduling
techniques that improve resource usage by better allocating the jobs as the
system gains more information about all the jobs to be executed; and (5) the
benefits of being able to deal with different user priorities when the system
is overloaded.
7.2 Contributions
The work carried out during this Thesis has produced the following contribu-
tions:
• Development of an autonomic network–aware meta-scheduler over GridWay.
This is presented in Chapter 3, where two different modifications to GridWay
are presented. First, the methodology followed to include network information
in GridWay by using the Iperf tool is detailed. Subsequently, another proposal
is presented which focuses on the implementation of an autonomic network–aware
meta-scheduling architecture that is capable of adapting its behavior to the
current status of the environment, so that jobs can be efficiently mapped to
computing resources. This proposal uses concepts from autonomic computing to
react to changes in the status of the system in order to perform
meta-scheduling more efficiently. In this way, by using this information, the
GridWay meta-scheduler has knowledge about how trustworthy a resource's
behavior is. Consequently, it can choose a more reliable resource to submit the
job to.
• Development of a meta-scheduler in advance system (SA-Layer) over
GridWay. This is presented in Chapter 4, where the predictive framework built
on top of Globus and the GridWay meta-scheduler, named SA-Layer, is detailed.
It manages QoS by performing meta-scheduling of jobs in advance. SA-Layer
manages idle/busy periods of resources in order to choose the most suitable one
for each job, using red–black trees as the underlying data structure. It uses
heuristics that consider the network as a first-level resource, and makes
different estimations of the time needed to complete the data transfers and the
job executions.
• Development of predictive techniques over SA-Layer. This is also presented
in Chapter 4, where several complementary prediction techniques are detailed. A
predictive module to estimate the future status of resources and the
interconnection network has been implemented. This prediction technique is
based on an exponential smoothing function which takes into account the
previous status of the system resources to estimate their future performance.
An autonomic behavior is added by computing a trust value for each resource,
also taking into account its previous behavior.
• Development of rescheduling techniques over SA-Layer to improve resource
utilization. This is presented in Chapter 5, where the problems related to the
scheduling techniques detailed in Chapter 4 are outlined. Fragmentation appears
as a well-known effect of every allocation process and may become the cause of
poor resource utilization. Apart from that, there may also be job rejections
caused by unfavorable previous decisions due to the inability to foresee the
future. For these reasons, two techniques have been developed to tackle
fragmentation problems, which consist of rescheduling already scheduled tasks
with the aim of reducing fragmentation by having more information in the
allocation process. Under such a scenario, knowing the status of the system is
a must. Accordingly, different metrics aimed at measuring the fragmentation of
the system are developed.
• Improving SA-Layer functionality by using an adaptable fairshare job
prioritization system (FSGrid). This is presented in Chapter 6, where
the integration of the SA-Layer with a fairshare job prioritization system,
named FSGrid, is detailed and evaluated. FSGrid provides a distributed
system for decentralized management of resource allocation policies and
an efficient mechanism for fairshare-based job prioritization. Hence, the
integrated architecture focuses on enhancing resource utilization QoS by
combining both systems: it joins their strengths and improves the QoS
perceived by end users by providing reliable resource allocations which
adhere to usage allocation policies.
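As an illustration of how idle/busy periods can drive the selection of a start time for a job, the following Python sketch (with hypothetical names, not the actual SA-Layer code) keeps each resource's busy periods ordered and searches for the first idle gap large enough to hold a job. The real implementation stores these periods in red–black trees for efficient ordered lookup; a plain sorted list is used here as a stand-in.

```python
import bisect

class ResourceSchedule:
    """Busy periods of a single resource, kept sorted by start time.

    A sorted list stands in for the red-black tree used in the real
    system; both support ordered traversal, and the tree additionally
    keeps insertion at O(log n).
    """

    def __init__(self):
        self.busy = []  # non-overlapping (start, end) tuples, sorted

    def find_start(self, earliest, duration):
        """First start time >= earliest with an idle gap of `duration`."""
        t = earliest
        for start, end in self.busy:
            if end <= t:
                continue              # busy period already in the past
            if start >= t + duration:
                break                 # the gap before this period fits
            t = end                   # gap too small: retry after it
        return t

    def reserve(self, earliest, duration):
        """Book the earliest feasible slot and return its start time."""
        t = self.find_start(earliest, duration)
        bisect.insort(self.busy, (t, t + duration))
        return t
```

For example, with busy periods (0, 5) and (5, 8), a 4-unit job requested from time 2 can only start at time 8, since no earlier gap is wide enough.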
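The prediction and trust ideas summarized above can be sketched as follows. This is a minimal illustration with hypothetical parameter values and function names, not the actual SA-Layer formulas: recent observations of a resource's performance are weighted more heavily than old ones, and a per-resource trust value is updated from the accuracy of past predictions.

```python
def exponential_smoothing(history, alpha=0.5):
    """Predict the next value; higher alpha weights recent samples more."""
    estimate = history[0]
    for observation in history[1:]:
        estimate = alpha * observation + (1 - alpha) * estimate
    return estimate

def update_trust(trust, predicted, observed, beta=0.3):
    """Raise or lower a resource's trust based on prediction accuracy."""
    relative_error = abs(predicted - observed) / max(observed, 1e-9)
    accuracy = max(0.0, 1.0 - relative_error)
    return (1 - beta) * trust + beta * accuracy

# Past execution times on a resource (seconds); predict the next one.
times = [10.0, 12.0, 11.0, 20.0]
prediction = exponential_smoothing(times)   # 15.5 with alpha = 0.5
```

A resource whose observed times repeatedly diverge from the predictions sees its trust value decay, so the scheduler can steer jobs away from it.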
7.3 Publications
The work carried out during this Thesis has produced the following publications:
three international journal papers, several international conference papers, and
several national conference papers.
In addition, two technical reports have been published. These are first drafts
of papers submitted for publication in conferences and journals, released in
order to gain visibility and obtain faster feedback. Moreover, works submitted
for publication are also mentioned, as well as several publications which are
related to or based on the contributions made throughout this Thesis. Finally,
additional contributions regarding the developed software, together with the
supervision of some final degree projects and a Master Thesis, are outlined.
7.3.1 Journal papers
• Luis Tomás, Agustín Caminero, Blanca Caminero and Carmen Carrión.
Network–aware meta-scheduling in advance with autonomous self-tuning
system. Future Generation Computer Systems – The International Journal
of Grid Computing Theory Methods and Applications, Elsevier, Holland,
ISSN: 0167–739X, Volume 27, pages 486–497, 2011.
Impact factor 2010: 2.365. This journal is ranked 9/97 (Q1) in JCR 2010,
in the field of Computer Science, Theory and Methods.
This publication presents SA-Layer, a framework built on top of Globus
and the GridWay meta-scheduler that improves QoS by performing
meta-scheduling of jobs in advance. This framework manages
idle/busy periods of resources in order to choose the most suitable resource
for each job. Moreover, no prior knowledge of job durations is required,
as opposed to other works using similar techniques. SA-Layer uses
heuristics that consider the network as a first level resource and presents
an autonomous behavior so that it adapts to the dynamic changes of the
Grid resources. The autonomous behavior is obtained by means of com-
puting a trust value for each resource and performing job rescheduling.
• Luis Tomás, Agustín Caminero, Omer Rana, Carmen Carrión and Blanca
Caminero. A GridWay-based Autonomic Network–Aware Metascheduler. Fu-
ture Generation Computer Systems – The International Journal of Grid
Computing Theory Methods and Applications, Special Issue on Quality of
Service in Grid and Cloud Computing, In Press. Elsevier, Holland, ISSN:
0167–739X, 2012.
Impact factor 2010: 2.365. This journal is ranked 9/97 (Q1) in JCR 2010,
in the field of Computer Science, Theory and Methods.
This publication presents the implementation of an autonomic network–
aware meta-scheduling architecture which is capable of adapting its behav-
ior to the current status of the environment, so that jobs can be efficiently
mapped to computing resources. The implementation extends the widely
used GridWay meta-scheduler and relies on Exponential Smoothing to pre-
dict the execution and transfer times of jobs. An autonomic control loop
(which takes account of CPU use and network capability) is used to alter
job admission and resource selection criteria in order to improve overall job
completion times and throughput.
7.3.2 International conference papers
• Luis Tomás, Agustín Caminero, Blanca Caminero and Carmen Carrión.
Studying the influence of network-aware Grid scheduling on the performance
received by users. In Proceedings of the International Conference on Grid
computing, high-performAnce and Distributed Applications (GADA). Mon-
terrey, México. November, 2008. ISBN: 978-3-540-88870-3.
Quality indicator: CORE: C; paper referenced in DBLP.
In this paper, an extension to the GridWay metascheduler to perform
scheduling considering the network status is presented.
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. Im-
proving GridWay with Network Information: Tuning the Monitoring Tool. In
Proceedings of the 6th High-Performance Grid Computing (HPGC) Work-
shop, in conjunction with International Parallel and Distributed Processing
Symposium (IPDPS). Rome, Italy. May, 2009. ISBN: 978-1-4244-3750-4.
Quality indicator: CORE: C; paper referenced in DBLP. Acceptance rate of
23% (IPDPS rate).
This paper presents an evaluation and tuning of the overhead produced by
the network monitoring tool used in [112].
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. Ad-
vanced Meta-Scheduling using Red–Black Trees in Heterogeneous Grids En-
vironments. In Proceedings of the 7th High-Performance Grid Comput-
ing (HPGC) Workshop, in conjunction with International Parallel and Dis-
tributed Processing Symposium (IPDPS). Atlanta, USA. April, 2010. ISBN:
978-1-4244-6441-8.
Quality indicator: CORE: C; paper referenced in DBLP. Acceptance rate of
24.1% (IPDPS rate).
This publication presents the first version of the SA-Layer: a
meta-scheduling in advance system based on red–black tree data structures
to manage the idle/busy periods of resources, together with a simple
method to obtain predictions about job durations on each resource.
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. Us-
ing Network Information to Perform Meta-scheduling in Advance in Grids.
In Proceedings of the 16th International Euro–Par Conference (Euro–Par
2010). Lecture Notes in Computer Science, Part I, Volume 6271/2010, pp.
431–443. Ischia, Italy. September, 2010. ISBN: 978-3-642-15276-4.
Quality indicator: CORE: A; paper referenced in DBLP and Scopus. Overall
acceptance rate of 35%. Specific track acceptance rate of 28%.
This article improves the SA-Layer functionality by considering the
network as a first-level resource. A new technique is included in the
SA-Layer to better estimate the time needed to complete job executions,
by making separate estimations for the transfer and execution times.
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. Expo-
nential Smoothing for network–aware meta-scheduler in advance in Grids.
In Proceedings of the International Workshop on Scheduling and Resource
Management on Parallel and Distributed Systems (SRMPDS), in conjunc-
tion with the Intl. Conference on Parallel Processing (ICPP). San Diego,
USA. September, 2010. ISBN:978-0-7695-4157-0.
Quality indicator: CORE: C; paper referenced in DBLP and Scopus. Accep-
tance rate of 32% (ICPP rate).
This paper presents a new version of the SA-Layer that uses exponential
smoothing functions to predict the future status of the resources and in-
terconnection networks when estimating job durations. In this way, the
time needed to complete the jobs is better estimated.
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. Ad-
dressing Resource Fragmentation in Grids Through Network–Aware Meta-
Scheduling in Advance. In Proceedings of the 11th International Sym-
posium on Cluster, Cloud and Grid Computing (CCGrid 2011). Newport
Beach, USA. May, 2011. ISBN: 978-0-7695-4395-6. BEST POSTER AWARD.
Quality indicator: CORE: A; paper referenced in DBLP and Scopus. Accep-
tance rate of 29.1%.
This publication presents a new technique within SA-Layer to tackle the
fragmentation caused by the allocation process, which may lead to poor
resource utilization. This technique consists of rescheduling already
scheduled tasks. To this end, heuristics are implemented to calculate the
intervals to be replanned and to select the jobs involved in the process.
Moreover, another heuristic is implemented to put rescheduled jobs as
close together as possible to minimize the fragmentation.
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. A
Strategy to Improve Resource Utilization in Grids Based on Network–Aware
Meta-Scheduling in Advance. In Proceedings of the 12th IEEE/ACM
International Conference on Grid Computing (Grid 2011). Lyon, France.
September, 2011. ISBN: 978-0-7695-4572-6.
Quality indicator: CORE: A; paper referenced in DBLP.
This paper presents a more in-depth study and evaluation of the Bag of
Tasks rescheduling techniques presented in [139].
• Luis Tomás, Per-Olov Östberg, Blanca Caminero, Carmen Carrión, Erik
Elmroth. An Adaptable In–Advance and Fairshare Meta-Scheduling Archi-
tecture to Improve Grid QoS. In Proceedings of the 12th IEEE/ACM Interna-
tional Conference on Grid Computing (Grid 2011). Lyon, France. Septem-
ber, 2011. ISBN: 978-0-7695-4572-6.
Quality indicator: CORE: A; paper referenced in DBLP.
This work focuses on enhancing resource utilization QoS through the
combination of two systems: our predictive meta-scheduling framework
(SA-Layer) and a distributed fairshare job prioritization system, named
FSGrid. The integrated architecture presented in this work combines the
strengths of both systems and improves perceived end–user quality of ser-
vice by providing reliable resource allocations adhering to usage allocation
policies.
7.3.3 National conference papers
• Luis Tomás Bolívar, Agustín Caminero Herráez, Blanca Caminero Herráez,
Carmen Carrión Espinosa. Incorporando información de red en el meta-
planificador GridWay. In Proceedings of the XIX Jornadas de Paralelismo.
Castellón, Spain. September, 2008. ISBN: 978-84-8021-676-0.
This paper presents the modifications made over GridWay to make it net-
work–aware.
• Luis Tomás Bolívar, Blanca Caminero Herráez, Carmen Carrión Espinosa.
Planificación Avanzada en GridWay. In Proceedings of the XX Jornadas de
Paralelismo. A Coruña, Spain. September, 2009. ISBN: 84-9749-346-8.
This publication introduces the first steps to provide scheduling in advance
within the GridWay meta-scheduler.
• Luis Tomás Bolívar, Blanca Caminero Herráez, Carmen Carrión Espinosa.
Meta-Planificación por Adelantado en Grids Heterogéneos. In Proceedings of
the XXI Jornadas de Paralelismo. Valencia, Spain. September, 2010. ISBN:
978-84-92812-49-3.
This article details the basic implementation of the SA-Layer. It focuses
on the data structures used for storing information about the future usage
of resources and on the way that information is accessed.
7.3.4 Technical reports
• Luis Tomás, Agustín Caminero, Blanca Caminero, and Carmen Carrión,
Grid Metascheduling Using Network Information: A Proof–of–Concept
Implementation. Technical Report DIAB–08–04–2, Computing Systems
Department, University of Castilla–La Mancha, Spain, April 30, 2008.
This report presents an in-depth study of the performance impact of
including network monitoring tools in the GridWay meta-scheduler.
• Luis Tomás, Agustín Caminero, Blanca Caminero, and Carmen Carrión,
Using Network Information to Perform Meta-scheduling in Advance in Grids.
Technical Report DIAB–10–03–2, Computing Systems Department, Uni-
versity of Castilla–La Mancha, Spain, March 25, 2010.
This report presents the basic functionality and performance of one of the
first versions of SA-Layer, which lacked prediction techniques but made
separate estimations for execution and transfer times.
7.3.5 Submitted works
Journal papers
• Luis Tomás, Agustín Caminero, Blanca Caminero and Carmen Carrión.
On the Improvement of Grid Resource Utilization: Preventive and Reactive
Rescheduling Approaches. Submitted to the Journal of Grid Computing,
Special Issue on High Performance Grid and Cloud Computing, ISSN:
0167–739X. Impact factor 2010: 1.556. This journal is ranked 29/97 (Q2)
in JCR 2010, in the field of Computer Science, Theory and Methods.
This article presents two techniques that have been developed to tackle
poor resource utilization. Their main idea consists of rescheduling
already scheduled jobs so that new incoming jobs can be allocated. They
deal with rejections due to fragmentation and to unfavorable previous
decisions. Several heuristics are presented to choose the best job or
jobs to be reallocated and to decide when the reallocation has to be done.
International conference papers
• Luis Tomás, Per-Olov Östberg, Blanca Caminero, Carmen Carrión, Erik
Elmroth. Addressing QoS in Grids through a Fairshare Meta-Scheduling
In-Advance Architecture. Submitted to 12th International Symposium on
Cluster, Cloud, and Grid Computing (CCGrid), Ottawa, Canada, 2012.
This paper extends the study and results of the integrated architecture
(FSGrid plus SA-Layer) presented in [142].
• Luis Tomás, Blanca Caminero, Carmen Carrión. Improving Grid Resource
Usage: Metrics for Measuring Fragmentation. Submitted to 12th Interna-
tional Symposium on Cluster, Cloud, and Grid Computing (CCGrid), Ot-
tawa, Canada, 2012.
This paper studies different techniques for measuring the existing fragmen-
tation with the aim of improving the BoT rescheduling technique presented
in [140].
7.3.6 Related contributions
Journal papers
• Javier Conejero, Luis Tomás, Carmen Carrión, Blanca Caminero. QoS
Provisioning with Meta-Scheduling in Advance within SLA-based Grid Envi-
ronments. Computing and Informatics, In Press.
Impact factor 2010: 0.356. This journal is ranked 100/108 (Q4) in JCR 2010,
in the field of Computer Science, Artificial Intelligence.
This publication presents the mechanisms needed to manage the
communication between the users and the SA-Layer system presented
in [60]. Those mechanisms are implemented through SLA contracts based
on the WS-Agreement specification.
International conference papers
• Javier Conejero, Luis Tomás, Carmen Carrión, Blanca Caminero. An SLA-
based Meta-Scheduling in Advance System to Provide QoS in Grid Environ-
ments. In Proceedings of the 5th Iberian Grid Infrastructure Conference
(Ibergrid 2011). Santander, Spain. Jun, 2011. ISBN: 978-84-9745-884-9.
This article introduces the need for an entity in charge of establishing
the agreements between users and the entities which manage the Grid
resources. The mechanisms presented to this end are implemented through
SLA contracts based on the WS-Agreement specification.
• Javier Conejero, Luis Tomás, Carmen Carrión, Blanca Caminero. Differen-
tiated QoS in Grids supported by SLAs. In Proceedings of the 9th Intl. Work-
shop on Middleware for Grids, Clouds and e–Science (MGC), in conjunction
with the 12th Intl. Middleware Conference. Lisbon, Portugal. December,
2011. ISBN: n/a.
Quality indicator: CORE: C.
This paper presents a framework to negotiate SLAs between users and Grid
service providers, where the QoS expected by users is clearly defined in
three levels. These levels are used to classify the importance of each SLA
and to deal with the confidence that Grid resources can provide.
National conference papers
• Javier Conejero, Luis Tomás, Carmen Carrión, Blanca Caminero. QoS en
Entornos Grid mediante un Sistema de Meta-planificación por Adelantado
basado en SLAs. In Proceedings of the XXII Jornadas de Paralelismo. La
Laguna, Spain. September, 2011. ISBN: 978-84-694-1791-1.
This paper presents the infrastructure proposed to address QoS
provisioning to users through Service Level Agreements following the
WS-Agreement specification.
7.3.7 Additional contributions
• The software developed along this Thesis is available at
http://www.i3a.uclm.es/raap/gridcloud/SA-Layer.
On that web page, the main characteristics of the software are outlined,
with the aim of gaining visibility amongst the Grid community.
• Supervision of the Final Degree Project: “Estudio y Evaluación de la Herramienta
de Monitorización NWS en un Sistema Grid”. Student: D. Francisco Javier
Conejero Bañón. July 2009. Escuela Superior de Ingeniería Informática,
University of Castilla–La Mancha (UCLM).
• Supervision of the Final Degree Project: “Planificación avanzada en Grids: Uso del
árbol rojo–negro para los algoritmos de alojamiento de trabajos”. Student:
D. Angel Codón Ramos. September 2009. Escuela Superior de Ingeniería
Informática, University of Castilla–La Mancha (UCLM).
• Supervision of the Master Thesis: “Desarrollo de técnicas escalables para el des-
cubrimiento de información en sistemas paralelos distribuidos”. Student: D.
Ismael García Pérez. October 2011. National University of Distance Educa-
tion (UNED).
7.4 Funds
The present Thesis has been carried out thanks to the funds received from a
number of projects and grants. They are classified into national and regional
projects.
7.4.1 National projects
Project title: High-performance, Reliable Architectures for Data Centers and Internet Servers
Funding entity: Consolider-Ingenio 2010 Program
Code: CSD2006-46
Participants: University of Castilla–La Mancha, Polytechnic University of Valencia, University of Murcia, University of Valencia
Length: from October 2006 to December 2011
Main researcher: Dr. Francisco J. Quiles Flor (UCLM subproject)
Number of researchers: 80
Total budget of the project: 3,500,000 euros (1,038,000 euros for UCLM)
Project title: Server architectures, applications and services
Funding entity: Ministerio de Ciencia e Innovación (MICINN)
Code: TIN2009-14475-C04-03 (TIN subprogram)
Participants: University of Castilla–La Mancha, Polytechnic University of Valencia, University of Murcia, University of Valencia
Length: from October 2009 to October 2012
Main researcher: Dr. Francisco J. Quiles Flor (UCLM subproject)
Number of researchers: 29
Total budget of the project: 407,800 euros (UCLM only)
7.4.2 Regional projects
Project title: Improvement of the Quality of Service of Grid Applications
Funding entity: UCLM
Code: PBI08-0055-2800
Participants: University of Castilla–La Mancha, MAAT G-Knowledge
Length: from March 2008 to December 2010
Main researcher: Dra. María Blanca Caminero Herráez
Number of researchers: 5
Total budget of the project: 72,000 euros + 70,149 euros (infrastructure FEDER funds)
Project title: MoteGrid: Grid Architecture for Distributed Processing of Information Collected by Wireless Sensor Networks
Funding entity: Junta de Comunidades de Castilla–La Mancha
Code: PII1C09–0101–9476
Participants: University of Castilla–La Mancha, University of Murcia, Complutense University of Madrid
Length: from April 2009 to April 2012
Main researcher: Dra. Carmen Carrión Espinosa
Number of researchers: 12
Total budget of the project: 150,000 euros
7.5 Collaborations with other research groups
To complete this Thesis, the author realized a research stay, which was hosted
by Professor Erik Elmroth, within the research group on Distributed Systems
(Grid & Cloud, www.cloudresearch.se), which belongs to the Department of
Computing Science at the University of Umeå (Sweden). This stay lasted 3
months from 29th March to 1st July of 2011. During this stay, fairness poli-
cies when performing meta-scheduling were studied and applied. To this end,
interaction between the FSGrid and the SA-Layer was implemented and tested.
As a result, a poster was presented at the 12th IEEE/ACM International Con-
ference on Grid Computing (Grid 2011). Moreover, an extended performance
evaluation of the integrated architecture have been submitted to the 12th Inter-
national Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012).
7.6 Future work
The work presented in this Thesis has led to different ideas for further work.
Some of them are presented next.
• Study of more sophisticated methods (apart from exponential smoothing
functions) to predict job execution times.
• Network reservations: in this Thesis, focus is placed on the scheduling
process. In the future, addressing the issues related to network
reservations is planned, i.e., developing a Bandwidth Broker which
collaborates with the meta-scheduler in order to perform network
reservations. Using this approach, the effective bandwidth between
measurements could be better predicted and the estimations would be more
accurate. Hence, the TOLERANCE value would be improved and there would be
fewer chances of choosing a wrong resource due to a misprediction. For
this reason, it is worthwhile to try to reserve network bandwidth whenever
and wherever possible.
• Development of algorithms to schedule data as another resource, with the
aim of improving the time needed for transfers when executing a job, with
the consequent reduction in the execution times. More precisely, jobs may
require multiple pieces of data, which in turn may be replicated on different
storage resources. So, finding an instance of the required pieces of data for
each job, and performing the execution of the job meeting its QoS require-
ments (e.g., the execution deadline), is an interesting research issue. Al-
though some techniques have already been presented (for instance, [143]),
keeping track of the bandwidth between all the storage and computing
resources raises scalability issues which must be addressed.
• Improvement of the rescheduling techniques to make them more so-
phisticated and intelligent. To this end, different information (apart from
the start time constraint) may be used when performing the rescheduling
techniques, such as deadline or laxity.
• Comparison of the SA-Layer approach with an algorithm which applies real
reservations of resources, whenever possible. For example, clusters man-
aged by Maui [52] supporting reservation of CPUs.
• Dealing with workflows by taking into account where each job is located.
In this way, jobs with file dependencies may be placed on the same
resource (or at least, close to it) with the aim of decreasing the time
needed to send those files, or even avoiding transfers altogether in some
cases.
• Implementation of mechanisms for Fault Tolerance Management. This
is another guideline for current and future work, based on trying to
foresee future problems regarding resource availability and performance.
Using this information, some resources could be avoided during specific
time periods to prevent job failures which would lead to unfulfilled QoS
agreements.
• Improvement of FSGrid with network information. To this end, it
may also be useful to develop some kind of fair network usage metric. The
aim would be fairness not only in computational resource usage but also
in network usage, as the latter may influence the performance of many
other job executions.
• Deployment of the implementation on EGEE resources [10], providing
us with a larger Grid testbed, where resources are more distributed and
more users are involved in submitting jobs to the system.
• SA-Layer adaptation to the Cloud: the main idea is to adapt the func-
tionality provided by SA-Layer to cloud infrastructures, with the aim of
taking advantage of the features they provide. On the one hand, the
prediction techniques should be modified to also estimate the time needed
to deploy, resume or stop a virtual machine. On the other hand, the
rescheduling techniques must be adapted to reduce the fragmentation
generated by the scheduling process. To this end, live migration of
virtual machines makes it possible to take advantage of small, otherwise
useless slots: best-effort applications (or even whole virtual machines)
can be moved from one resource to another and from one time period to
another, so as to use those slots scattered in time and/or across
resources.
Appendix A
Acronyms
AD Administrative Domain
API Application Programming Interface
ANM-ExS Autonomic Network-aware Meta-scheduler
BB Bandwidth Broker
BoT Bag of Tasks
BoT-R Bag of Task Rescheduling
CAC Connection Admission Control
CSF Community Scheduler Framework
DB Database
DSRT Dynamic Soft Real Time Scheduler
EDF Earliest Deadline First
EGEE Enabling Grids for E-sciencE
ESII Escuela Superior de Ingeniería Informática
ETTS Execution and Transfer Time Separately
ExS Exponential Smoothing
FCFS First Come First Serve
FETE Fair Execution Time Estimation
GARA Globus Architecture for Reservation and Allocation
GarQ Grid Advanced Reservation Queue
GIS Grid Information System
GNB Grid Network Broker
GNRB Grid Network-aware Resource Broker
GRAM Grid Resource Allocation Management
GridFTP Grid File Transfer Protocol
GRIP Grid Resource Information Protocol
GRMS GridLab Resource Management System
GSI Grid Security Infrastructure
GT4 Globus Toolkit 4
G-QoSM Grid Quality of Service Management
I3A Instituto de Investigación en Informática de Albacete
I/O Input/Output
LAN Local Area Network
LHC Large Hadron Collider
LRMS Local Resource Management System
LSF Load Sharing Facility
MAPE Monitor Analyze Plan Execute
MDS Monitoring and Discovery System
MI Millions of Instructions
MIPS Millions of Instructions Per Second
NGB NAS Grid Benchmarks
NPB NAS Parallel Benchmarks
NRSE Network Resource Scheduling Entity
NWS Network Weather Service
OGSA Open Grid Service Architecture
OSI Open System Interconnection
P2P Peer-to-Peer
PBS Portable Batch System
QBETS Queue Bounds Estimation from Time Series
QoS Quality of Service
RC Replanning Capacity
RS Resource Sites
RT Resource Trust
SA-Layer Scheduling in Advance Layer
SDK Software Development Kit
SJF Shortest Job First
SLA Service Level Agreement
SLURM Simple Linux Utility for Resource Management
SOA Service Oriented Architecture
SOI Service Oriented Infrastructure
TCP Transport Control Protocol
TCT Total Completion Time
UCLM University of Castilla–La Mancha
UmU University of Umeå
UNED National University of Distance Education
USS Usage Statistics Service
VARQ Virtual Advance Reservations Queues
VIOLA Vertically Integrated Optical Testbed for Large Applications
VO Virtual Organization
VP Visualization Pipeline
WSAG4J WS-AGreement for Java
WSLA Web Service Level Agreement
WS-GRAM Web Service Grid Resource Allocation Management
XML Extensible Markup Language
Bibliography
[1] Schwiegelshohn, U., Badia, R.M., Bubak, M., Danelutto, M., Dustdar, S.,
Gagliardi, F., Geiger, A., Hluchy, L., Kranzlmüller, D., Laure, E., Priol,
T., Reinefeld, A., Resch, M., Reuter, A., Rienhoff, O., Rüter, T., Sloot, P.,
Talia, D., Ullmann, K., Yahyapour, R., von Voigt, G.: Perspectives on Grid
Computing. Future Generation Computer Systems 26(8) (2010) 1104 –
1115
[2] Foster, I., Kesselman, C.: The Grid 2: Blueprint for a New Computing
Infrastructure. 2 edn. Morgan Kaufmann (2003)
[3] Castillo, C., Rouskas, G.N., Harfoush, K.: Efficient resource manage-
ment using advance reservations for heterogeneous Grids. In: Proc. of the
Intl. Parallel and Distributed Processing Symposium (IPDPS), Miami, USA
(2008)
[4] Östberg, P.O., Henriksson, D., Elmroth, E.: Decentralized, Scalable, Grid
Fairshare Scheduling (FSGrid). Future Generation Computer Systems
(submitted, 2011)
[5] Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling
Scalable Virtual Organizations. International Journal of High Performance
Computing Application 15 (August 2001) 200–222
[6] CERN. LHC Computing. Web page at http://www.interactions.org/
LHC/computing/index.html (Date of last access: 16th December, 2011)
[7] Krauter, K., Buyya, R., Maheswaran, M.: A taxonomy and survey of Grid
resource management systems for distributed computing. Software – Prac-
tice & Experience 32 (2002) 135–164
[8] Al-Ali, R., Sohail, S., Rana, O., Hafid, A., von Laszewski, G., Amin, K., Jha,
S., Walker, D.: Network QoS provision for distributed Grid applications.
Intl. Journal of Simulations Systems, Science and Technology, Special
Issue on Grid Performance and Dependability 5(5) (2004) 13–28
[9] Foster, I.T.: Globus Toolkit Version 4: Software for Service-Oriented Sys-
tems. In: Proc. of the Intl. Conference on Network and Parallel Computing
(NPC), Beijing, China (2005)
[10] Vázquez, C., Huedo, E., Montero, R.S., Llorente, I.M.: Federation of Ter-
aGrid, EGEE and OSG infrastructures through a metascheduler. Future
Generation Computer Systems 26(7) (2010) 979 – 985
[11] Yeo, C.S., Buyya, R.: A taxonomy of market-based resource management
systems for utility-driven cluster computing. Software – Practice & Expe-
rience 36 (2006) 1381–1419
[12] Huedo, E., Montero, R.S., Llorente, I.M.: A modular meta-scheduling ar-
chitecture for interfacing with pre-WS and WS Grid resource management
services. Future Generation Computing Systems 23(2) (2007) 252–261
[13] Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid informa-
tion services for distributed resource sharing. In: Proceedings of 10th
IEEE International Symposium on High Performance Distributed Com-
puting (HPDC), San Francisco, USA (2001)
[14] Portable Batch System. Web page at http://www.openpbs.org (Date of
last access: 16th December, 2011)
[15] Litzkow, M.J., Livny, M., Mutka, M.W.: Condor - A Hunter of idle work-
stations. In: Proc. of the 8th Intl. Conference on Distributed Computer
Systems (ICDCS), San Jose, USA (1988)
[16] Gentzsch, W.: Sun Grid Engine: Towards creating a compute power Grid.
In: Proc. of the First Intl. Symposium on Cluster Computing and the Grid
(CCGrid), Brisbane, Australia (2001)
[17] Zhou, S.: LSF: Load Sharing in Large-Scale Heterogeneous Distributed
Systems. In: Proc. of the Workshop on Cluster Computing. (1992)
[18] Wei, X., Ding, Z., Yuan, S., Hou, C., Li, H.: CSF4: A WSRF compliant
meta-scheduler. In: Proc. of the Intl. Conference on Grid Computing &
Applications (GCA), Las Vegas, USA (2006)
[19] Legion Project. Web page at http://legion.virginia.edu/ (Date of last
access: 16th December, 2011)
[20] Marco, C., Fabio, C., Alvise, D., Antonia, G., Francesco, G., Alessandro,
M., Moreno, M., Salvatore, M., Fabrizio, P., Luca, P., Francesco, P.: The
gLite Workload Management System. In: Proc. of the 4th Intl. Conference
on Advances in Grid and Pervasive Computing (GPC), Geneva, Switzerland
(2009)
[21] EGEE project (Enabling Grids for E-science in Europe). Web page at
http://public.eu-egee.org/ (Date of last access: 16th December,
2011)
[22] Frey, J., Tannenbaum, T., Livny, M., Foster, I., Tuecke, S.: Condor-G:
A Computation Management Agent for Multi-Institutional Grids. Cluster
Computing 5 (2002) 237–246
[23] Romberg, M.: The UNICORE Grid infrastructure. Scientific Programming
Special Issue on Grid Computing 10 (April 2002) 149–157
[24] Bank, J., Werner, F.: Heuristic algorithms for unrelated parallel machine
scheduling with a common due date, release dates, and linear earliness
and tardiness penalties. Mathematical and Computer Modelling 33(4-5)
(2001) 363 – 383
[25] Foster, I., Kesselman, C., Tsudik, G., Tuecke, S.: A security architecture
for computational Grids. In: Proc. of the 5th ACM conference on Computer
and Communications Security. (1998)
[26] Allcock, W., Bresnahan, J.: GridFTP protocol specification. In: GGF
GridFTP Working Group Document (2002)
[27] GridWay Project. Web page at http://www.gridway.org/ (Date of last
access: 16th December, 2011)
[28] Distributed Systems Architecture (DSA) Research Group at Universidad
Complutense de Madrid (UCM). Web page at http://dsa-research.org/
(Date of last access: 16th December, 2011)
[29] Kurowski, K., Ludwiczak, B., Nabrzyski, J., Oleksiak, A., Pukacki, J.: Dy-
namic Grid scheduling with job migration and rescheduling in the GridLab
resource management system. Scientific Programming 12(4) (2004) 263–
273
[30] Venugopal, S., Buyya, R., Winton, L.J.: A Grid service broker for schedu-
ling e-Science applications on global data Grids. Concurrency and Com-
putation: Practice and Experience 18(6) (May 2006) 685–699
[31] Merlo, A., Clematis, A., Corana, A., Gianuzzi, V.: Quality of Service on
Grid: Architectural and methodological issues. Concurrency and Compu-
tation: Practice and Experience 23 (2011) 745–766
[32] Ali, R.A., Rana, O., von Laszewski, G., Hafid, A., Amin, K., Walker, D.: A
Model for Quality-of-Service Provision in Service Oriented Architectures.
Journal of Grid and Utility Computing (2005)
[33] Foster, I., Kesselman, C., Lee, C., Lindell, B., Nahrstedt, K., Roy, A.: A dis-
tributed resource management architecture that supports advance reser-
vations and co-allocation. In: Proc. of the Intl. Workshop on Quality of
Service, London, England (1999)
[34] Seidel, J., Wäldrich, O., Ziegler, W., Wieder, P., Yahyapour, R.: Using SLA
for resource management and scheduling - A survey. Technical Report
CoreGRID TR-0096, Institute on Resource Management and Scheduling
(2007)
[35] Conejero, J., Tomás, L., Carrión, C., Caminero, B.: Differentiated QoS in
Grids supported by SLAs. In: Proc. of the 9th Intl. Workshop on Middle-
ware for Grids, Clouds and e-Science (MGC), in conjunction with the 12th
Intl. Middleware Conference, Lisbon, Portugal (2011)
[36] Buyya, R., Abramson, D., Giddy, J.: An Economy Driven Resource Man-
agement Architecture for Global Computational Power Grids. In: Proc.
of the Intl. Conference on Parallel and Distributed Processing Techniques
and Applications, (PDPTA), Las Vegas, USA (2000)
[37] Sulistio, A., Čibej, U., Prasad, S.K., Buyya, R.: GarQ: An efficient schedu-
ling data structure for advance reservations of Grid resources. Int. Journal
of Parallel Emergent and Distributed Systems 24(1) (2009) 1–19
[38] Castillo, C., Rouskas, G.N., Harfoush, K.: On the Design of Online Sche-
duling Algorithms for Advance Reservations and QoS in Grids. In: Proc.
of the Intl. Parallel and Distributed Processing Symposium (IPDPS), Los
Alamitos, USA (2007)
[39] Battré, D., Hovestadt, M., Kao, O., Keller, A., Voss, K.: Planning-based
Scheduling for SLA-awareness and Grid Integration. In: Proc. of the 26th
Workshop of the UK Planning and Scheduling Special Interest Group (Plan-
SIG2007), Prague, Czech Republic (2007)
[40] Adami, D., Giordano, S., Repeti, M., Coppola, M., Laforenza, D., Tonel-
lotto, N.: Design and Implementation of a Grid Network-Aware Resource
Broker. In: Proc. of the Intl. Conference on Parallel and Distributed Com-
puting and Networks, Innsbruck, Austria (2006)
[41] Xhafa, F., Abraham, A.: Computational models and heuristic methods for
Grid scheduling problems. Future Generation Computer Systems 26(4)
(2010) 608–621
[42] Guan, D., Cai, Z., Kong, Z.: Provision and analysis of QoS for distributed
Grid applications. In: Proc. of the 5th Intl. Conference on Wireless commu-
nications, networking and mobile computing (WiCOM). (2009) 4191–4194
[43] Chu, H.H., Nahrstedt, K.: CPU service classes for multimedia applica-
tions. In: Proc. of Intl. Conference on Multimedia Computing and Systems
(ICMCS), Florence, Italy (1999)
[44] Mateescu, G.: Extending the Portable Batch System with preemptive job
scheduling. In: SC2000: High Performance Networking and Computing,
Dallas, USA (2000)
[45] Cárdenas, C., Gagnaire, M.: Evaluation of Flow-Aware Networking (FAN)
architectures under GridFTP traffic. Future Generation Computer Sys-
tems 25(8) (2009) 895–903
[46] Caminero, A., Rana, O., Caminero, B., Carrión, C.: Performance eval-
uation of an autonomic network-aware metascheduler for Grids. Con-
currency and Computation: Practice and Experience 21(13) (2009) 1692–
1708
[47] Palmieri, F.: Network-aware scheduling for real-time execution support in
data-intensive optical Grids. Future Generation Computer Systems 25(7)
(2009) 794–803
[48] Wolski, R., Spring, N.T., Hayes, J.: The Network Weather Service: A dis-
tributed resource performance forecasting service for metacomputing. Fu-
ture Generation Computer Systems 15(5–6) (1999) 757–768
[49] Sulistio, A.: Advance Reservation and Revenue-based Resource Manage-
ment for Grid Systems. PhD thesis, Department of Computer Science and
Software Engineering, The University of Melbourne, Australia (May 2008)
[50] Sulistio, A., Schiffmann, W., Buyya, R.: Advanced reservation-based sche-
duling of task graphs on clusters. In: Proc. of the 13th Intl. Conference on
High Performance Computing (HiPC), Bangalore, India (2006)
[51] MacLaren, J.: Advance reservations: State of the art. GWD-I, Global Grid
Forum (GGF) (2003). Web page at http://www.ggf.org (Date of last
access: 16th December, 2011)
[52] Jackson, D., Snell, Q., Clement, M.: Core Algorithms of the Maui Sche-
duler. In Feitelson, D., Rudolph, L., eds.: Job Scheduling Strategies for
Parallel Processing. Volume 2221 of Lecture Notes in Computer Science.
Springer Berlin / Heidelberg (2001) 87–102
[53] Roy, A., Sander, V.: GARA: A Uniform Quality of Service Architecture. In:
Grid Resource Management. Kluwer Academic Publishers (2003) 377–394
[54] Siddiqui, M., Villazón, A., Fahringer, T.: Grid capacity planning with
negotiation-based advance reservation for optimized QoS. In: Proc. of the
Conference on Supercomputing (SC), Tampa, USA (2006)
[55] Waldrich, O., Wieder, P., Ziegler, W.: A meta-scheduling service for co-
allocating arbitrary types of resources. In: Proc. of the 6th Intl. Conference
on Parallel Processing and Applied Mathematics (PPAM), Poznan, Poland
(2005)
[56] Smith, W., Foster, I., Taylor, V.: Scheduling with advanced reservations.
In: Proc. of the 14th Intl. Parallel and Distributed Processing Symposium
(IPDPS), Washington, USA (2000)
[57] Qu, C.: A Grid Advance Reservation Framework for Co-allocation and Co-
reservation Across Heterogeneous Local Resource Management Systems.
In: Proc. of 7th Intl. Conference on Parallel Processing and Applied Math-
ematics (PPAM), Gdansk, Poland (2007)
[58] Elmroth, E., Tordsson, J.: Grid resource brokering algorithms enabling
advance reservations and resource selection based on performance pre-
dictions. Future Generation Computer Systems 24(6) (2008) 585–593
[59] Singh, G., Kesselman, C., Deelman, E.: A provisioning model and its
comparison with best-effort for performance-cost optimization in Grids.
In: Proc. of the 16th Intl. symposium on High Performance Distributed
Computing (HPDC), Monterey, USA (2007)
[60] Tomás, L., Caminero, A.C., Carrión, C., Caminero, B.: Network-aware
meta-scheduling in advance with autonomous self-tuning system. Future
Generation Computer Systems 27(5) (2011) 486–497
[61] Brown, R.: Calendar queues: a fast O(1) priority queue implementation
for the simulation event set problem. Communications of the ACM 31(10)
(1988) 1220–1227
[62] Brodnik, A., Nilsson, A.: A Static Data Structure for Discrete Advance
Bandwidth Reservations on the Internet. In: Proc. of Swedish National
Computer Networking Workshop (SNCNW), Stockholm, Sweden (2003)
[63] Nurmi, D., Brevik, J., Wolski, R.: QBETS: Queue Bounds Estimation from
Time Series. In: Proc. of 13th Intl. Workshop on Job Scheduling Strategies
for Parallel Processing (JSSPP), Seattle, USA (2007)
[64] Nurmi, D., Wolski, R., Brevik, J.: VARQ: Virtual Advance Reservations
for Queues. In: Proc. of 17th Intl. Symposium on High-Performance Dis-
tributed Computing (HPDC), Boston, USA (2008)
[65] Dobber, M., van der Mei, R., Koole, G.: A prediction method for job run-
times on shared processors: Survey, statistical analysis and new avenues.
Performance Evaluation 64(7-8) (2007) 755–781
[66] Dinda, P.A.: The statistical properties of host load. Scientific Programming
7(3-4) (1999) 211–229
[67] Jin, H., Shi, X., Qiang, W., Zou, D.: An adaptive meta-scheduler for data-
intensive applications. Intl. Journal of Grid and Utility Computing 1(1)
(2005) 32–37
[68] Zhang, Y., Sun, W., Inoguchi, Y.: Predict task running time in Grid envi-
ronments based on CPU load predictions. Future Generation Computer
Systems 24(6) (2008) 489–497
[69] Gehr, J., Schneider, J.: Measuring Fragmentation of Two-Dimensional
Resources Applied to Advance Reservation Grid Scheduling. In: Proc. of
the 9th Intl. Symposium on Cluster Computing and the Grid (CCGRID),
Shanghai, China (2009)
[70] Johnstone, M.S., Wilson, P.R.: The memory fragmentation problem:
solved? ACM SIGPLAN Notices 34(3) (1999) 26–36
[71] Wilson, P.R., Johnstone, M.S., Neely, M., Boles, D.: Dynamic storage
allocation: A survey and critical review. In: Proc. of the Intl. Workshop on
Memory Management (IWMM), Kinross, UK (1995)
[72] De Assunção, M.D., Buyya, R.: Performance analysis of multiple site
resource provisioning: effects of the precision of availability information.
In: Proc. of the 15th Intl. Conference on High Performance Computing
(HiPC), Bangalore, India (2008)
[73] Figuerola, S., Ciulli, N., De Leenheer, M., Demchenko, Y., Ziegler, W.,
Binczewski, A.: Phosphorus: single-step on-demand services across
multi-domain networks for e-Science. In: Proc. of the European Confer-
ence and Exhibition on Optical Communication, Berlin, Germany (2007)
[74] Elmroth, E., Tordsson, J.: A standards-based Grid resource brokering
service supporting advance reservations, coallocation and cross-Grid in-
teroperability. Concurrency and Computation: Practice and Experience.
21(18) (2009) 2298–2335
[75] Caniou, Y., Charrier, G., Desprez, F.: Analysis of Tasks Reallocation in a
Dedicated Grid Environment. In: Proc. of the Intl. Conference on Cluster
Computing (CLUSTER), Heraklion, Greece (2010)
[76] Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Com-
puter 36(1) (2003) 41–50
[77] Parashar, M.: Autonomic Grid Computing. Autonomic Computing – Con-
cepts, Requirements, Infrastructures, Editors: M. Parashar and S. Hariri,
CRC Press (2006)
[78] Dobson, S., Denazis, S.G., Fernández, A., Gaïti, D., Gelenbe, E., Massacci,
F., Nixon, P., Saffre, F., Schmidt, N., Zambonelli, F.: A survey of autonomic
communications. ACM TAAS 1(2) (2006) 223–259
[79] Dong, X., Hariri, S., Xue, L., Chen, H., Zhang, M., Pavuluri, S., Rao, S.:
Autonomia: an autonomic computing environment. In: Proc. of the IEEE
Intl. Conference on Performance, Computing and Communications. (2003)
[80] Liu, H., Parashar, M., Hariri, S.: A component-based programming model
for autonomic applications. In: Proc. of the Intl. Conference on Autonomic
Computing (ICAC), New York, USA (2004)
[81] Abawajy, J.H.: Autonomic Job Scheduling Policy for Grid Computing. In:
Proc. of the 5th Intl. Conference on Computational Science (ICCS), Atlanta,
USA (2005)
[82] Nou, R., Julià, F., Hogan, K., Torres, J.: A path to achieving a self-managed
Grid middleware. Future Generation Computer Systems 27(1) (2011) 10–
19
[83] Theilmann, W., Baresi, L.: Towards the Future Internet. In: Multi-
level SLAs for Harmonized Management in the Future Internet. IOS Press
(2009) 193–202
[84] Brandic, I., Music, D., Dustdar, S., Venugopal, S., Buyya, R.: Advanced
QoS Methods for Grid Workflows Based on Meta-Negotiations and SLA-
Mappings. In: Proc. of the 3rd Workshop on Workflows in Support of
Large-Scale Science, Austin, USA (2008)
[85] Ejarque, J., de Palol, M., Goiri, Í., Julià, F., Guitart, J., Badia, R.M.,
Torres, J.: Exploiting semantics and virtualization for SLA-driven resource
allocation in service providers. Concurrency and Computation: Practice
and Experience 22(5) (2010) 541–572
[86] Andrieux, A., Czajkowski, K., Dan, A., Keahey, K., Ludwig, H., Nakata, T.,
Pruyne, J., Rofrano, J., Tuecke, S., Xu, M.: Web Services Agreement Spec-
ification (WS-Agreement). GFD-R-P.192. Technical report (October 2011)
[87] Waeldrich, O., Battré, D., Brazier, F., Clark, K., Oey, M., Papaspyrou, A.,
Wieder, P., Ziegler, W.: WS-Agreement Negotiation Version 1.0. GFD-R-
P.193. Technical report (October 2011)
[88] Lamanna, D.D., Skene, J., Emmerich, W.: SLAng: A Language for Defining
Service Level Agreements. In: Proc. of the Intl. Workshop of Future Trends
of Distributed Computing Systems, Los Alamitos, USA (2003)
[89] WSLA: Web Service Level Agreements. Web page at http://www.
research.ibm.com/wsla/ (Date of last access: 16th December, 2011)
[90] Parkin, M., Badia, R.M., Martrat, J.: A Comparison of SLA Use in Six
of the European Commission's FP6 Projects. Technical Report TR-0129,
Institute on Resource Management and Scheduling, CoreGRID - Network
of Excellence (2008)
[91] SLA at SOI. Web page at http://sla-at-soi.eu/ (Date of last access:
16th December, 2011)
[92] Battré, D., Djemame, K., Gourlay, I., Hovestadt, M., Kao, O., Padgett, J.,
Voß, K., Warneke, D.: AssessGrid strategies for provider ranking mech-
anisms in risk-aware grid systems. In: Proc. of the 5th Intl. Workshop
on Grid Economics and Business Models (GECON), Las Palmas de Gran
Canaria, Spain (2008)
[93] EU-Brein. Web page at http://www.eu-brein.com/ (Date of last access:
16th December, 2011)
[94] WSAG4J - WS-Agreement framework for Java. Web page at http://
packcs-e0.scai.fraunhofer.de/wsag4j/ (Date of last access: 16th De-
cember, 2011)
[95] Snelling, D.F., Anjomshoaa, A., Wray, F., Basermann, A., Fisher, M., Sur-
ridge, M., Wieder, P.: NextGRID Architectural Concepts. In: Proc. of the
CoreGRID Symposium, Rennes, France (2007)
[96] Dumitrescu, C., Foster, I.: GRUBER: A Grid Resource Usage SLA Broker.
In: Proc. of the 11th Intl. Conference on Parallel Computing (Euro-Par),
Lisbon, Portugal (2005)
[97] Kay, J., Lauder, P.: A fair share scheduler. Communications of the ACM 31(1) (1988)
44–55
[98] Yoo, A., Jette, M., Grondona, M.: SLURM: Simple Linux Utility for Re-
source Management. In Feitelson, D., Rudolph, L., Schwiegelshohn, U.,
eds.: Job Scheduling Strategies for Parallel Processing. Volume 2862 of
Lecture Notes in Computer Science. Springer Berlin / Heidelberg (2003)
44–60
[99] Krawczyk, S., Bubendorfer, K.: Grid resource allocation: allocation mech-
anisms and utilisation patterns. In: Proc. of the 6th Australasian work-
shop on Grid computing and e-Research (AusGrid), Darlinghurst, Aus-
tralia (2008)
[100] De Jongh, J.: Share scheduling in distributed systems. PhD thesis, Delft
Technical University (2002)
[101] Dafouli, E., Kokkinos, P., Varvarigos, E.A.: Fair Execution Time Estima-
tion Scheduling in Computational Grids. In Kacsuk, P., Lovas, R., Németh,
Z., eds.: Distributed and Parallel Systems. Springer US (2008) 93–104
[102] Doulamis, N., Varvarigos, E., Varvarigou, T.: Fair Scheduling Algorithms
in Grids. IEEE Transactions on Parallel and Distributed Systems 18
(2007) 1630–1648
[103] Austin, J., Jackson, T., Fletcher, M., Jessop, M., Cowley, P., Lobner, P.:
Predictive Maintenance: Distributed Aircraft Engine Diagnostics. In: The
Grid 2: Blueprint For A New Computing Infrastructure. Elsevier Science
(2004)
[104] Marchese, F.T., Brajkovska, N.: Fostering asynchronous collaborative vi-
sualization. In: Proc. of the 11th Intl. Conference on Information Visual-
ization, Zürich, Switzerland (2007)
[105] Kalekar, P.S.: Time series Forecasting using Holt-Winters Exponential
Smoothing. Technical report, Kanwal Rekhi School of Information Tech-
nology (2004)
[106] Fitzgerald, S., Foster, I., Kesselman, C., von Laszewski, G., Smith, W.,
Tuecke, S.: A directory service for configuring high-performance dis-
tributed computations. In: Proc. of 6th Symposium on High Performance
Distributed Computing (HPDC), Portland, USA (1997)
[107] Massie, M.L., Chun, B.N., Culler, D.E.: The Ganglia distributed monitor-
ing system: Design, implementation, and experience. Parallel Computing
30(5-6) (2004) 817–840
[108] NLANR/DAST : Iperf - The TCP/UDP Bandwidth Measurement Tool. Web
page at http://dast.nlanr.net/Projects/Iperf/ (Date of last access:
16th December, 2011)
[109] Sohail, S., Pham, K.B., Nguyen, R., Jha, S.: Bandwidth Broker Imple-
mentation: Circa-Complete and Integrable. Technical report, School of
Computer Science and Engineering, The University of New South Wales
(2003)
[110] Tomás, L., Caminero, A., Caminero, B., Carrión, C.: Using network in-
formation to perform meta-scheduling in advance in Grids. In: Proc. of
the 16th Intl. Conference on Parallel Computing (Euro-Par), Ischia, Italy
(2010)
[111] Tomás, L., Caminero, A., Carrión, C., Caminero, B.: Exponential Smooth-
ing for network-aware meta-scheduler in advance in Grids. In: Proc. of
the 6th Intl. Workshop on Scheduling and Resource Management on Par-
allel and Distributed Systems (SRMPDS), in conjunction with the 39th Intl.
Conference on Parallel Processing (ICPP), San Diego, USA (2010)
[112] Tomás, L., Caminero, A., Caminero, B., Carrión, C.: Studying the Influ-
ence of Network-Aware Grid Scheduling on the Performance Received by
Users. In: Proc. of the Grid computing, high-performAnce and Distributed
Applications (GADA), Monterrey, Mexico (2008)
[113] Tomás, L., Caminero, A., Caminero, B., Carrión, C.: Improving GridWay
with Network Information: Tuning the Monitoring Tool. In: Proc. of the
High Performance Grid Computing Workshop (HPGC), held jointly with
the Intl. Parallel and Distributed Processing Symposium (IPDPS), Rome, Italy
(2009)
[114] Stevens, W.R.: TCP/IP Illustrated: The Protocols. Addison-Wesley (1994)
[115] Chun, G., Dail, H., Casanova, H., Snavely, A.: Benchmark probes for Grid
assessment. In: Proc. of 18th Intl. Parallel and Distributed Processing
Symposium (IPDPS), Santa Fe, New Mexico (2004)
[116] Frumkin, M., Van der Wijngaart, R.: NAS Grid Benchmarks: a tool for
Grid space exploration. In: Proc. of 10th IEEE Intl. Symposium on High
Performance Distributed Computing. (2001)
[117] The NAS Parallel Benchmark. Web page at http://www.nas.nasa.gov/
Resources/Software/npb.html (Date of last access: 16th December,
2011)
[118] Vazquez-Poletti, J., Huedo, E., Montero, R., Llorente, I.: A Comparison
Between two Grid Scheduling Philosophies: EGEE WMS and GridWay.
Multiagent and Grid Systems 3(4) (2007)
[119] GSA-RG: Grid scheduling use cases. Technical Report GFD-I.064, Global
Grid Forum (2006)
[120] Grimme, C.: Grid metaschedulers: An overview and up-to-date solutions.
Technical report, University of Dortmund (2006)
[121] Tomás, L., Caminero, A.C., Rana, O., Carrión, C., Caminero, B.: A
GridWay-based autonomic network-aware metascheduler. Future Gener-
ation Computer Systems, In Press
[122] Kalantari, M., Akbari, M.K.: Grid performance prediction using state-
space model. Concurrency and Computation: Practice and Experience.
21(9) (2009) 1109–1130
[123] Gong, L., Sun, X.H., Watson, E.F.: Performance modeling
and prediction of non-dedicated network computing. IEEE Transactions
on Computers 51 (2002) 1041–1055
[124] Vadhiyar, S.S., Dongarra, J.J.: A Performance Oriented Migration Frame-
work For The Grid. In: Proc. of the 3rd Intl. Symposium on Cluster Com-
puting and the Grid (CCGrid), Tokyo, Japan (2003)
[125] Wieczorek, M., Siddiqui, M., Villazón, A., Prodan, R., Fahringer, T.: Apply-
ing Advance Reservation to Increase Predictability of Workflow Execution
on the Grid. In: Proc. of the 2nd Intl. Conference on e-Science and Grid
Computing (e-Science), Washington, USA (2006)
[126] Tanwir, S., Battestilli, L., Perros, H.G., Karmous-Edwards, G.: Dy-
namic scheduling of network resources with advance reservations in opti-
cal Grids. Int. Journal of Network Management 18(2) (2008) 79–105
[127] Barz, C., Martini, P., Pilz, M., Purnhagen, F.: Experiments on Network
Services for the Grid. In: Proc. of the 32nd Conference on Local Computer
Networks (LCN), Washington, USA (2007)
[128] Caminero, A., Rana, O., Caminero, B., Carrión, C.: Network-aware heuris-
tics for inter-domain meta-scheduling in Grids. Journal of Computing and
System Sciences 77(2) (2011) 262 – 281
[129] de Assunção, M.D., Buyya, R., Venugopal, S.: InterGrid: A case for inter-
networking islands of Grids. Concurrency and Computation: Practice and
Experience 20(8) (2008) 997–1024
[130] Xiong, Z., Yang, Y., Zhang, X., Zeng, M.: Grid resource aggregation in-
tegrated P2P mode. In: Proc. of the 4th Intl. Conference on Intelligent
Computing (ICIC), Shanghai, China (2008)
[131] Stefano, A.D., Morana, G., Zito, D.: A P2P strategy for QoS discovery
and SLA negotiation in Grid environment. Future Generation Computer
Systems 25(8) (2009) 862–875
[132] Litke, A., Konstanteli, K., Andronikou, V., Chatzis, S., Varvarigou, T.:
Managing Service Level Agreement contracts in OGSA-based Grids. Fu-
ture Generation Computer Systems 24(4) (2008) 245–258
[133] Conejero, J., Tomás, L., Carrión, C., Caminero, B.: A SLA-based Meta-
Scheduling in Advance System to Provide QoS in Grid Environments. In:
Proc. of the 5th Iberian Grid Infrastructure Conference (Ibergrid), San-
tander, Spain (2011)
[134] Conejero, J., Tomás, L., Carrión, C., Caminero, B.: QoS Provisioning
with Meta-Scheduling in Advance within SLA-based Grid Environments.
Computing and Informatics, In Press
[135] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to
algorithms. McGraw-Hill Book Company, Cambridge, London (2001)
[136] Huedo, E., Montero, R.S., Llorente, I.M.: A framework for adaptive execu-
tion in Grids. Software: Practice and Experience 34(7) (2004) 631–651
[137] Dobber, M., van der Mei, R., Koole, G.: A prediction method for job run-
times on shared processors: Survey, statistical analysis and new avenues.
Performance Evaluation 64(7-8) (2007) 755–781
[138] The R Foundation. Web page at http://www.r-project.org/ (Date of
last access: 16th December, 2011)
[139] Tomás, L., Caminero, A., Carrión, C., Caminero, B.: Addressing Resource
Fragmentation in Grids Through Network-Aware Meta-Scheduling in Ad-
vance. In: Proc. of the 11th Intl. Symposium on Cluster, Cloud, and Grid
Computing (CCGrid), Newport Beach, USA (2011)
[140] Tomás, L., Caminero, A., Carrión, C., Caminero, B.: A Strategy to Improve
Resource Utilization in Grids Based on Network–Aware Meta-Scheduling
in Advance. In: Proc. of the 12th IEEE/ACM Intl. Conference on Grid
Computing (Grid), Lyon, France (2011)
[141] Farooq, U., Majumdar, S., Parsons, E.W.: Efficiently Scheduling Advance
Reservations in Grids. Technical report, Carleton University, Department
of Systems and Computer Engineering (2005)
[142] Tomás, L., Östberg, P.O., Carrión, C., Caminero, B., Elmroth, E.: An
Adaptable In–Advance and Fairshare Meta-Scheduling Architecture to Im-
prove Grid QoS. In: Proc. of the 12th IEEE/ACM Intl. Conference on Grid
Computing (Grid), Lyon, France (2011)
[143] Venugopal, S., Buyya, R.: An SCP-based heuristic approach for scheduling
distributed data-intensive applications on global Grids. Journal of Parallel
and Distributed Computing 68(4) (2008) 471–487