Improving Quality of Service in Grids
Through Meta-Scheduling in Advance
A DISSERTATION FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER
SCIENCE TO BE PRESENTED WITH DUE PERMISSION OF THE COMPUTING SYSTEMS
DEPARTMENT FROM THE UNIVERSITY OF CASTILLA–LA MANCHA, FOR PUBLIC
EXAMINATION AND DEBATE
Author: Luis Tomás Bolívar
Advisors: Dr. María del Carmen Carrión Espinosa
Dr. María Blanca Caminero Herráez
Albacete, December 2011
Acknowledgements
Even though these things are not my strong suit, I would like to use these lines to thank everyone who has contributed their grain of sand to making this Thesis possible.
First of all, I would like to thank Carmen and Blanca for all their support, dedication and the time they have invested in this work. Without them this Thesis would not have been possible, since they were the ones who convinced me to take this path and who made sure it was a smooth one. I would also like to thank Agustín for his help throughout the whole process of preparing this Thesis, especially in my early days.
I would also like to make special mention of my family: my father, my mother, my sister and Ainhoa. All of them have helped shape me, not only as a future doctor, but as a person. Once again, without them I would not be who I am, nor would I have come this far. My greatest achievement is having you by my side. Thank you for the unconditional support you have always given me.
I also want to take this opportunity to remember all the new friends I made during my stay in Umeå. I would like to thank all of them for their hospitality and help. I keep fond memories of them all, since they made me feel at home.
Finally, I would also like to thank all my colleagues, friends and workmates for all those moments of laughter and distraction, which are more than necessary to disconnect and recharge one's batteries.
Luis Tomás Bolívar
December 2011, Albacete, Spain.
Abstract
Grids allow the coordinated use of heterogeneous computing resources within large-scale parallel applications in science, engineering and commerce [1]. Since organizations sharing their resources in such a context still keep their independence and autonomy [2], Grids are highly variable systems in which resources may join or leave the system at any time. This variability makes Quality of Service (QoS) highly desirable, though often very difficult to achieve in practice. One reason for this limitation is the lack of a central entity that orchestrates the entire system. This is especially true in the case of the network that connects the various components of a Grid system. Thus, without resource reservations, any QoS guarantee is often hard to achieve. However, in a real Grid system, reservations may not always be feasible, as not all Local Resource Management Systems (LRMSs) permit them. There are also other types of resources, such as the network, which may lack a global management entity, thereby making their reservation impossible. Because of this, the provision of QoS in Grid environments is still an open issue that needs attention from the research community.
One way of contributing to the provision of QoS in Grids is to perform meta-scheduling of jobs in advance, which means that jobs are scheduled some time before they are actually executed. In this way, it becomes more likely that the appropriate resources are available to run each job when needed, so that the QoS requirements of jobs are met (i.e., jobs finish within their deadline).
The main aim of this Thesis is to develop a system capable of managing QoS in real Grid environments. To this end, this Thesis investigates QoS provision to Grid users by means of efficient meta-scheduling. New scheduling metrics have been added to an existing Grid meta-scheduler, and a Grid meta-scheduling layer, named the Scheduling in Advance Layer (SA-Layer), has been developed. The SA-Layer provides the meta-scheduling in advance functionality in a real Grid environment, making it possible to deal with users' QoS requirements.
This software has been developed in an incremental way: different modules have been added and modified to extend and improve the functionality of this layer. Initially, red-black trees were implemented as an efficient data structure to manage resource usage information. Then, different prediction techniques were developed to enable proper scheduling by producing sufficiently accurate estimates of the status of resources and the duration of jobs on them. After that, two rescheduling techniques (one preventive and one reactive) were implemented to deal with the low resource usage caused by the fragmentation generated during the allocation process and by unfavorable previous decisions. Finally, as a result of a research stay, the integration of the SA-Layer with a job prioritization system, named FSGrid, is addressed. This system provides the SA-Layer with the information needed to take into account different usage policies for users, projects and virtual organizations. In this way, different QoS levels may be provided depending on the established policies.
To sum up, this Thesis makes the following contributions:
• Use of network resources as first-level resources, together with computing resources, when scheduling jobs. To this end, network information has been included in an existing meta-scheduler in order to make it network-aware when scheduling jobs.
• Use of autonomic computing ideas to perform efficient meta-scheduling of jobs onto computing resources.
• Development of an architecture that schedules jobs in advance by making predictions about the future status of resources (network included) and about the real duration of jobs on them.
• Use of techniques to cope with the low resource usage caused by the fragmentation generated during the allocation process.
• Collaboration with another system to manage different QoS levels for the different users and virtual organizations, depending on the established policy.
These proposals have been developed, and their performance has been evaluated, in a real Grid environment based on the Globus Toolkit and the GridWay meta-scheduler. To this end, a testbed where the experiments are carried out has been built using non-dedicated machines across several administrative domains.
Summary
Grids enable the coordinated, shared and large-scale use of heterogeneous resources in parallel applications in science, engineering and commerce [1]. However, the organizations that share those resources keep their independence and autonomy [2], which makes these systems highly variable: resources may join or leave the system at any time. This variability makes the provision of Quality of Service (QoS) very difficult in practice. One reason for this limitation may lie in the lack of a central entity that manages the system. This is especially noticeable in the case of the network that interconnects the various components of a Grid environment. On the other hand, resource reservation in a Grid environment is not always possible, since not all resources provide this functionality or the rights to use it may not be available. In addition, other kinds of resources, such as the interconnection network, may not be managed by a single entity, which makes reserving them even harder. All of this further complicates QoS guarantees. This is why QoS provision in Grid environments is still an issue that the scientific community needs to study and solve.
One possible solution for QoS provision in Grid environments is to schedule jobs in advance. This means that a job is scheduled ahead of the time at which it will be executed. In this way, it becomes more likely that the appropriate resource for running the application is available when needed, and therefore that the QoS required by the job is met (the job finishes before the deadline set by the user).
The main aim of this Thesis is the development of a system able to provide a certain level of QoS in real Grid environments through efficient scheduling. To this end, new metrics have been included in an existing meta-scheduler, and a meta-scheduling layer, named SA-Layer, has been developed on top of it. This layer provides the scheduling in advance functionality, making it possible to handle the different QoS levels requested by users.
The SA-Layer has been implemented incrementally, so that different modules have been added and modified to extend and improve its functionality. First, red-black trees were implemented as the data structure used to manage resource usage information efficiently. Next, several prediction techniques were included, whose goal is to produce sufficiently accurate predictions of the status of resources and the duration of jobs on them. Once the scheduling process was accurate enough, two techniques for rescheduling already scheduled jobs (one reactive and one preventive) were implemented to address the fragmentation generated during the scheduling process, which can lead to low resource usage. Finally, as a result of the research stay carried out, this system was integrated with a job prioritization system, named FSGrid. This system provides the SA-Layer with the information needed to take into account the usage policies established for users, projects and virtual organizations.
More specifically, this Thesis makes the following contributions:
• Inclusion of network information in the GridWay meta-scheduler so that it is taken into account when choosing the resource on which to execute each job.
• Use of autonomic computing ideas to perform more efficient job scheduling.
• Development of an architecture for scheduling jobs in advance that makes predictions about the status of resources (network included) and about the duration of jobs on them.
• Implementation of techniques to alleviate fragmentation problems during the scheduling process.
• Collaboration with another system to manage different QoS levels, taking into account the different users, projects and virtual organizations according to the established policies.
This Thesis presents all these proposals, as well as their evaluation in a real Grid environment based on the Globus Toolkit and the GridWay meta-scheduler. This Grid environment comprises non-dedicated machines belonging to different administrative domains.
Contents
1 Introduction
  1.1 Grid computing
    1.1.1 Grid Architecture
    1.1.2 Middleware
  1.2 Motivations
  1.3 Objectives of this Thesis
  1.4 Methodology
    1.4.1 The Globus Toolkit
    1.4.2 The GridWay Meta-scheduler
    1.4.3 Extensions to GridWay
    1.4.4 Integration with other systems
  1.5 Structure of this Thesis
2 QoS Provision and Meta-Scheduling in Grids
  2.1 Introduction
  2.2 Models for addressing Quality of Service in Grids
    2.2.1 Best Effort Model
    2.2.2 QoS Model
    2.2.3 Economic Model
  2.3 Proposals for Addressing Quality of Service in Grids
    2.3.1 Scheduling Techniques
    2.3.2 Advance Reservation
    2.3.3 Data Structures
    2.3.4 Prediction Techniques
    2.3.5 Fragmentation Problems
    2.3.6 Co-allocation and Rescheduling Techniques
    2.3.7 Autonomic Computing
    2.3.8 Service Level Agreements
    2.3.9 Fairness Resource Usage
  2.4 Summary
3 Including Metrics to Improve QoS at the Meta-Scheduling Level
  3.1 Introduction
  3.2 Autonomic Network-aware Meta-scheduling (ANM)
  3.3 Implementation of ANM
    3.3.1 Extending GridWay to be network-aware
    3.3.2 Autonomic scheduler
    3.3.3 Predicting and tuning resource performance
  3.4 Experiments and results
    3.4.1 Experiment Testbed
    3.4.2 Workload
    3.4.3 Performance evaluation
  3.5 Summary
4 Adding Support for Meta-Scheduling in Advance: The SA-Layer
  4.1 Introduction
  4.2 Network-aware meta-scheduling in advance
  4.3 Meta-Scheduling in advance implementation
    4.3.1 Gap Management
    4.3.2 Data Structure
    4.3.3 Job Migration
    4.3.4 Predictor
  4.4 Prediction Techniques
    4.4.1 TCT Technique
    4.4.2 ETTS Technique
    4.4.3 RT Technique
    4.4.4 ExS Technique
    4.4.5 Exponential smoothing predictions
  4.5 Evaluation
    4.5.1 Testbed
    4.5.2 Workload
    4.5.3 Experiments and results
  4.6 Summary
5 Optimizing Resource Utilization through Rescheduling Techniques
  5.1 Introduction
  5.2 Scheduling Problems
  5.3 Tackling fragmentation
    5.3.1 Reactive techniques: Replanning Capacity (RC)
    5.3.2 Preventive techniques: Bag of Task Rescheduling (BoT-R)
  5.4 Fragmentation metrics
    5.4.1 Trigger Phase
    5.4.2 Filter Phase
  5.5 Evaluation
    5.5.1 Testbed
    5.5.2 Workload
    5.5.3 Rescheduling techniques
    5.5.4 Fragmentation metrics
  5.6 Summary
6 Improving Grid QoS by means of Adaptable Fair Share Scheduling
  6.1 Introduction
  6.2 Improving end-user QoS: Sample Scenario
  6.3 FSGrid
  6.4 Integrated Architecture
  6.5 Performance Evaluation
    6.5.1 Testbed
    6.5.2 Workload
    6.5.3 FSGrid Convergence Rate
    6.5.4 Quality of Service
  6.6 Summary
7 Conclusions, Contributions and Future Work
  7.1 Conclusions
  7.2 Contributions
  7.3 Publications
    7.3.1 Journal papers
    7.3.2 International conference papers
    7.3.3 National conference papers
    7.3.4 Technical reports
    7.3.5 Submitted works
    7.3.6 Related contributions
    7.3.7 Additional contributions
  7.4 Funds
    7.4.1 National projects
    7.4.2 Regional projects
  7.5 Collaborations with other research groups
  7.6 Future work
A Acronyms
Bibliography
List of Figures
1.1 Grid layers model.
1.2 Workflow sample.
1.3 Core Services of Globus.
1.4 GridWay usage model.
3.1 Example scenario.
3.2 Conceptual view of the extensions introduced to GridWay.
3.3 Autonomic control loop for adapting the TOLERANCE parameter. The "X" in tX_real and tX_estimate refers to the set {net, cpu}.
3.4 Grid testbed topology.
3.5 Visualization Pipeline (VP) test.
3.6 3node test Average Time.
3.7 3node test. QoS not fulfilled.
3.8 VP Test Average Completion Time.
3.9 Resource Usage.
4.1 Meta-Scheduling in Advance Process.
4.2 Scheduling Order.
4.3 The Scheduler in Advance Layer (SA-Layer).
4.4 Idle periods regions [3].
4.5 Example of a red-black tree.
4.6 Workload characteristic.
4.7 Comparison of the different estimation techniques from the Users' viewpoint.
4.8 QoS not fulfilled per accepted Job.
4.9 Comparison of the different estimation techniques from the System's viewpoint.
5.1 The Scheduler in Advance Layer (SA-Layer).
5.2 Grid testbed topology.
5.3 Comparison between the scheduling techniques for Workload 1.
5.4 Comparison between the scheduling techniques for Workload 2.
5.5 Percentage of Rejected Jobs for Workload 3.
5.6 Relationship among checked, submitted and canceled reschedulings for Workload 3.
5.7 Resources Usage without fragmentation metrics for Workload 3.
5.8 Resources Usage when using fragmentation metrics (BoT) for Workload 3.
6.1 Scheduling Process using SA-Layer.
6.2 Meta-Scheduling in Advance Process.
6.3 Scheduling Process with SA-Layer and FSGrid integrated.
6.4 FSGrid Architecture [4].
6.5 A FSGrid policy tree.
6.6 SA-Layer and FSGrid systems integrated.
6.7 Grid testbed topology.
6.8 FSGrid convergence rates for an isolated policy tree subgroup.
6.9 Failure rates.
List of Tables
3.1 Characteristics of the resources.
3.2 Percentage of resource usage by 3node tests.
3.3 Percentage of improvement by using the autonomic implementation with ExS (ANM-ExS).
5.1 Combination of Fragmentation Metrics.
List of Algorithms
1 Resource selection algorithm used in GridWay
2 CAC algorithm
3 Scheduling algorithm
4 Estimation of Execution Time (tricpu_estimated(j))
5 Total Completion Time (TCT)
6 Execution and Transfer Times Separately (ETTS)
7 Estimation of execution time (ExecT_Estimation)
8 Estimation of transfer time (TransT_Estimation)
9 ETTS extended with Resource Trust (RT)
10 Estimation of Execution Time (ExS Estimation)
11 Replanning Capacity Algorithm
12 BoT-R Trigger executed every L period
13 BoT-R Algorithm
Chapter 1
Introduction
This chapter introduces the topic of interest of this Thesis, the management of
Quality of Service (QoS) in Grids. It presents the objectives to be accomplished,
the methodology used, and finally, the structure of the Thesis.
1.1 Grid computing
The term Grid emerged to denote a highly distributed computing infrastructure for advanced science and engineering [5], which aims at providing computational power in the same way as the power grid does. Consequently, computational Grids try to provide users with reliable, pervasive and low-cost access to computational power.
The main difference between Grid computing and previous distributed computing techniques is that Grid computing focuses on resource sharing across different (worldwide) organizations. The resources that can be integrated within a Grid are, for example, clusters, storage or networks, but also other engineering equipment such as telescopes. Consequently, these resources are heterogeneous, dispersed, and may join or leave the Grid dynamically at any moment [2]. This heterogeneity enables the Grid to handle a large variety of different applications.
The first research motivation was to provide the technologies needed to enable such resource sharing, since their development was driven by the need of the scientific community to collaborate over the network. Scientists needed to share resources such as large data sets, computational resources, software, or scientific instruments (e.g., telescopes). Since then, a wide variety of scientific applications have been developed to leverage the aggregate capability of resources, and they have rapidly increased in complexity and size. These applications are characterized by high demands for data sharing and/or computing power. For this reason, Grid computing emerged as the next generation of parallel and distributed computing. A well-known example of such applications is the Grid-based worldwide data-processing infrastructure deployed for the Large Hadron Collider (LHC) experiments at CERN [6]. This experiment generates around 10 PB of data per year, which have to be processed (some of them several times) and stored in different data centers around the world.
Thanks to this adoption by the scientific community, Grids have become an essential infrastructure for resource-intensive scientific and commercial applications [2] [7], as they enable the sharing and dynamic allocation of distributed, high-performance computational resources. The associated ownership and operating costs are reduced, and hence flexibility and collaboration among diverse organizations are promoted.
More formally, we could say that a Grid is a flexible, secure and coordinated way of sharing resources amongst different organizations, institutions and companies, building what is known as a Virtual Organization (VO). This is where the specific, real problem behind the Grid concept lies: the coordinated sharing of resources across dynamic, multi-institutional VOs [5] in order to solve problems.
A Grid environment is created to control the needed resources. The usage of these resources (CPU cycles, data, software programs, etc.) is usually characterized by their availability outside their local administrative domain. This implies the creation of a new administrative domain, with different policies, which builds a VO. The efficient usage of these resources is the main aim of Grid technology. However, they are scattered and heterogeneous, and have to be operated together as a single system. Apart from that, they need to be available most of the time and to deliver the high performance that applications need.
Nowadays, apart from making possible the usage of heterogeneous resources dispersed worldwide, the Grid must provide some Quality of Service (QoS), as Grid users may require it. This QoS can be measured in terms of response time, number of jobs finished per unit of time [8], jobs finished within their deadlines, or fair-share resource usage. However, the variability of Grid systems makes QoS highly desirable, though very complex to achieve due to the large scale of the interconnected networks. Providing end-to-end QoS is really difficult without resource reservations. However, such reservations are not always possible in a Grid environment, since not all resources provide this functionality or permit it. There are even other types of resources, such as networks, which may be scattered across several administrative domains, making their reservation (e.g., regarding bandwidth capacity) infeasible.
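The QoS metrics named above can be sketched on a small job trace. This is a minimal illustration, not code from this Thesis; the job records and their values are made up for the example:

```python
# Hypothetical job trace: (submit_time, finish_time, deadline), in seconds.
jobs = [
    (0, 50, 60),
    (10, 90, 80),   # this job misses its deadline
    (20, 70, 100),
]

# Response time of each job: time elapsed from submission to completion.
response_times = [finish - submit for submit, finish, _ in jobs]

# Throughput: jobs finished per unit of time over the whole trace.
makespan = max(f for _, f, _ in jobs) - min(s for s, _, _ in jobs)
throughput = len(jobs) / makespan

# Fraction of jobs that met their deadline.
deadline_ok = sum(1 for _, f, d in jobs if f <= d) / len(jobs)

print(response_times)   # [50, 80, 50]
print(deadline_ok)      # 2 of 3 jobs met their deadline
```

Fair-share resource usage, the last metric named above, additionally requires per-user accounting and usage policies, which is the subject of Chapter 6.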
1.1.1 Grid Architecture
The architecture of Grid systems [5] is defined with the aim of providing a set of entities and a nomenclature that properly define each element in the system, clarifying the functionality of each one and the relationships amongst them. In order to define a comprehensible and coherent architecture, it is first necessary to identify the services that every Grid system will need, as well as their main properties and characteristics. Furthermore, the protocols that a Grid needs to enable communication among the different elements must be taken into account.
Interoperability [5] is a cornerstone of Grid environments. This means having common protocols; the Grid architecture is mainly a protocol architecture. These protocols define the basic mechanisms used by VO users to deal with the resources. Therefore, the architecture must also be defined using as much standardization as possible.
On the other hand, an open architecture based on standards facilitates interoperability, portability and code sharing. Thus, standard protocols make it easier to define standard services that provide better capabilities. For instance, Application Programming Interfaces (APIs) and Software Development Kits (SDKs) can be implemented to provide the programming abstractions needed to build a usable Grid. These technologies and architectures are called "middleware".
Figure 1.1. Grid layers model.
Hence, the global architecture of the system may be split into different pieces depending on the level at which each component works. This leads to a layered architecture model, as Figure 1.1 depicts, which may be compared with the Open System Interconnection (OSI) model. The main details of each layer are explained next.
Fabric
At the lowest level there are services for local resource control. This layer is called the Fabric (or infrastructure) layer and may be related to the data link layer of the OSI model. It provides shared access to resources through Grid protocols: for instance, computational resources, data centers, network resources, sensors, laboratory instruments, and so on. A resource may also be a logical entity, such as a distributed file system.
Connectivity
The next level is the Connectivity layer, whose main aim is to provide the communication methods and protocols among resources of the previous layer. Thus, this layer defines the core communication and authentication protocols required for Grid-specific network transactions. Communication protocols enable the exchange of data among Fabric layer resources. Authentication protocols are built on communication services to provide cryptographically secure mechanisms to verify the identity of users and resources.
Resource
On top of the Connectivity layer, the Resource layer builds on the communication and authentication protocols to define protocols (as well as APIs and SDKs) for the secure negotiation, initiation, monitoring, control, accounting, and payment of sharing operations on individual resources. In this way, the Resource layer calls functions of the Fabric layer to access and control local resources. It must be noted that the Resource layer protocols are concerned entirely with individual resources and hence ignore issues of global state and atomic actions across distributed collections. Low-level middleware, such as Globus [9], works at this layer. The next layer addresses the issues related to managing resources as a group.
Collective
The next layer is called the Collective layer and is focused on the coordination of
the multiple accessible resources from the lower layer rather than on interactions
with a single resource. In this way, the Collective layer contains the protocols
and services (APIs and SDKs) that are not associated with any specific resource
but rather are global in nature and capture interactions across collections of
resources. High-level middleware works in this layer, such as meta-schedulers
like GridWay [10].
Applications
On the other hand, another important aspect when working in Grid environments
is the applications themselves and their classification. For instance, Buyya [11]
makes an exhaustive classification of jobs, splitting them into simple jobs and
workflows. Simple jobs are those which perform a single task. Workflows are
made up of many dependent or independent jobs. Independent multijobs are
composed of several simple jobs that may be processed in parallel. Finally,
dependent multijobs are made up of several jobs where some of them have
dependencies on others. For instance, some jobs may need the data that other
jobs have to generate. One example of this kind of task is depicted in Figure 1.2,
where some jobs have to wait until others complete their executions.
Figure 1.2. Workflow sample.
1.1.2 Middleware
Grids are highly variable and heterogeneous systems in which searching for and
using resources is a hard task for users. This is because users are in different
domains and under different access policies from the resources where their jobs
are executed. Moreover, in large-scale Grids, with many potentially available
resources, this process is not feasible manually. Hence, the Grid infrastructure
must provide the services needed for automatic resource brokerage, which takes
care of the resource selection and negotiation process [1]. This infrastructure is
named a meta-scheduler [12] and hides this process from the user. So, the user’s
experience of the Grid is determined by the functionality and performance of
these meta-scheduler systems.
Grid middleware is connectivity software which provides a group of services
that make it possible to execute distributed applications over heterogeneous
platforms. Thus, middleware hides from the user the complexity of managing
heterogeneous and distributed resources over different communication networks.
An important management function is task scheduling [9], which may be
defined as the process that takes the decisions related to resources of multiple
administrative domains. In the case of Grid systems, task scheduling has two
objectives. One objective is the efficient use of resources, similar to the schedulers
found in traditional operating systems. The second objective, no less important,
is related to the VO concept and aims to respond to the requirements stated by
the users in a VO concerning the performance of task execution, such as the
response time. This is why the scheduling function has been split into two
components: one which is closer to the resources (the local scheduler), and a
second one (the meta-scheduler) closer to the application.
Scheduling in distributed systems has been significantly improved owing to
the innovations proposed in Grid systems and VO management. They try to
find a suitable computing resource where each job will be executed – this being
called meta-scheduling. Basically, a Grid middleware is the software layer
which provides access to the resources shared amongst the Grid constellation,
converting a radically heterogeneous system into a virtually homogeneous one.
The main characteristic of Grid meta-schedulers is that they do not own the
resources and, because of that, they cannot directly manage the resources of a
specific site.
Grid meta-schedulers have to make decisions based on the information they
actually have or can obtain. They may have information about the job to be
executed, its characteristics and requirements, as well as other information
helping the meta-scheduler to make better decisions, such as the users’
preferences. Information regarding the Grid itself – its current status, capacity,
and so on – is also needed, since it helps to perform efficient scheduling.
Usually, meta-schedulers obtain information from the Grid Information System
(GIS) [13], which is in charge of gathering information about the individual
resources.
Three different steps may be identified within the scheduling process [9]:
1. Resource discovery: obtains the list of available resources.
2. Information gathering: obtains information about the available resources and
chooses a suitable one (ideally, the best one).
3. Job execution: sends the files needed for the execution of the job, executes the
job, cleans up the temporary files, and retrieves the generated output files.
In a Grid environment, users usually send their jobs through a meta-scheduler
by providing a job template. From that point on, the meta-scheduler is in
charge of interacting with the Grid to make the execution of the job possible.
This means that this entity is the one which selects the resource(s) to execute
the application, sends the input data, deals with security issues (authentication
and authorization), and monitors the job execution and its possible migrations.
Thus, meta-schedulers are a very important component of Grid computing, as
they are in charge of providing QoS to end users. This QoS may be provided
through a good resource selection for each application, with the aim of reducing
the time needed to execute the job, improving the throughput, or meeting its
time constraints. Good scheduling not only makes a better QoS possible from
the users’ point of view, but also a better usage of resources, as well as allowing
a greater number of jobs to be executed. Therefore, a good meta-scheduler
should adjust its scheduling strategies depending on the changing status of the
system and the kind of jobs to be scheduled.
However, scheduling in Grid systems is very complicated. Resource heterogeneity,
the size and number of tasks, the variety of policies, and the high
number of constraints are some of the main characteristics that contribute to
this complexity. The design of scheduling algorithms for a heterogeneous
computing system interconnected with an arbitrary communication network is one
of the current concerns in distributed systems research. A large number of tools
are available for local scheduling: Portable Batch System (PBS) [14], Condor [15],
Sun Grid Engine [16], and Load Sharing Facility (LSF) [17]. These tools are
included in the category of centralized schedulers. Meta-schedulers, instead,
are the subject of projects under development, like GridWay [10] and Globus
CSF [18]. A problem that must be solved for this type of scheduling is scalability.
This aspect is even more important in the context of heterogeneous systems
(that require the simultaneous management of multiple clusters, amongst others)
and of the diversity of middleware tools.
Grid middleware may be classified depending on the services provided and
the layer it works on. Some of the well-known middleware used in Grid
environments are:
• Globus [9]: The Globus® Toolkit is an open source software toolkit used
for building Grids. It is being developed by the Globus Alliance and many
others all over the world. It is the de facto standard due to its adoption
by the scientific community.
• Legion [19]: An open specification and prototype for a worldwide virtual
computer. It is a research project based at the University of Virginia (USA).
• gLite [20]: gLite provides a framework for building grid applications tapping
into the power of distributed computing and storage resources across the
Internet. It was born from the collaborative efforts of more than 80 people in 12
different academic and industrial research centers as part of the Enabling
Grids for E-sciencE (EGEE) Project [21].
• Condor-G [22]: Condor-G provides the grid computing community with a
powerful, full-featured task broker. Used as a front-end to a computational
grid, Condor-G can manage thousands of jobs destined to run at
distributed sites. It provides job monitoring, logging, notification, policy
enforcement, fault tolerance, credential management, and it can handle
complex job interdependencies.
• GridWay [10]: A meta-scheduler that enables large-scale, reliable and
efficient sharing of computing resources, within a single organization or
scattered across several administrative domains. Moreover, GridWay supports
most of the existing Grid middleware. More details are given in Section 1.4.2.
• UNICORE [23]: Uniform Interface to Computing Resources offers a ready-to-run
Grid system including client and server software. UNICORE makes
distributed computing and data resources available in a seamless and secure
way in intranets and the Internet.
The optimization of the scheduling process for Grid systems tries to provide
better solutions for the selection and allocation of resources to current tasks.
Scheduling optimization is very important because it is a main building
block for making Grids more usable by user communities. Moreover, QoS is
a requirement for many Grid applications. The optimization methods for Grid
scheduling are the main subject of this thesis. As the scheduling problem is
NP-hard [24], approximation algorithms are considered, which are expected to
quickly offer a solution, even if it is only near-optimal.
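To illustrate what such an approximation algorithm looks like, the sketch below implements Graham’s classic greedy list scheduling (a textbook example, not one of the algorithms developed in this Thesis): each job is assigned to the currently least loaded machine, which is known to produce a makespan within a factor of two of the optimum:

```python
def list_schedule(job_lengths, n_machines):
    """Graham's greedy list scheduling: assign each job, in order, to the
    machine with the least accumulated load. Returns the job-to-machine
    assignment and the resulting makespan (maximum machine load)."""
    loads = [0.0] * n_machines
    assignment = []
    for length in job_lengths:
        m = loads.index(min(loads))   # least-loaded machine
        assignment.append(m)
        loads[m] += length
    return assignment, max(loads)

assignment, makespan = list_schedule([3, 3, 2, 2, 2], 2)
```

On this sample input the greedy schedule yields a makespan of 7 time units, while the optimum is 6 (grouping {3, 3} and {2, 2, 2}) – a near-optimal answer obtained in linear time.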
1.2 Motivations
There are several research projects whose main aim is to manage QoS in Grid
systems – the main ones are reviewed in Chapter 2. However, most of them have
a common drawback: they usually do not take into account the interconnection
network that links the computational resources when making scheduling
decisions. They focus on the reservation of resources, such as processing,
storage or even network resources, but not on the mapping of jobs to computing
resources taking the network status into account.
Since they usually make their meta-scheduling decisions focusing just on the
computing power (and utilization) of the available resources, the meta-scheduler
might decide that the most suitable computing resource to run a user’s job is the
most powerful one. However, if its interconnecting network is overloaded, then
another less powerful computing resource with a less loaded network connection
might be more suitable to run that job, and would thus be a better choice.
Hence, the network is a key parameter when managing QoS in Grids during the
meta-scheduling of jobs to computing resources – the process of finding a
suitable computing resource to execute each job. Inefficient meta-scheduling
can lead to poor performance. Thus, the aim of this Thesis is to improve QoS
in Grids by means of efficient meta-scheduling which considers the network as
a key parameter.
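The following toy example illustrates this point. The resource names, the figures and the scoring formula are illustrative assumptions, not the model developed in this Thesis: ranking purely by computing power picks the overloaded node, while discounting power by network load picks the better candidate:

```python
def effective_power(resource):
    # Illustrative score: raw computing power scaled by the fraction of
    # the network link that is not already in use.
    return resource["power_mflops"] * (1.0 - resource["link_load"])

resources = [
    {"name": "fast-but-congested", "power_mflops": 1000, "link_load": 0.9},
    {"name": "slower-idle-link",   "power_mflops": 600,  "link_load": 0.1},
]

best_by_power  = max(resources, key=lambda r: r["power_mflops"])
best_effective = max(resources, key=effective_power)
# The most powerful node is no longer the best choice once its
# overloaded link is taken into account (100 vs. 540 effective MFLOPS).
```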
On the other hand, and even more importantly, physical resource reservations
are not always feasible, as not all resources permit them or provide this
functionality. Apart from that, some resources cannot be reserved because they
belong to other administrative domains, and other types of resources, such as
the network, may be scattered across several administrative domains, making
the reservation of their capabilities (such as bandwidth) rather difficult, if not
impossible.
Hence, the proposed meta-scheduling system is based on meta-scheduling
in advance decisions instead of reservations in advance. This means that the
system selects the resource and the time period to execute a job some time
before the job is actually executed, but without making any physical reservation
of the resources. So, the system needs to estimate the future status of the Grid
resources (network included) and the duration of jobs on the resources at some
point in the future (making separate estimations for the network transfers and
the executions themselves), and to keep track of previous scheduling decisions
so that future executions on a resource do not overlap.
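A minimal sketch of the bookkeeping involved may look as follows. This is a simplified illustration, not the SA-Layer implementation (which uses more elaborate data structures): each scheduling decision is recorded as a (start, end) interval per resource, purely in the scheduler’s own state, and a new job is only accepted if its interval overlaps no previous decision:

```python
class AdvanceSchedule:
    def __init__(self):
        self.bookings = {}   # resource name -> list of (start, end) tuples

    def fits(self, resource, start, end):
        """True if [start, end) overlaps no previously scheduled job."""
        return all(end <= s or start >= e
                   for (s, e) in self.bookings.get(resource, []))

    def book(self, resource, start, end):
        """Record a scheduling decision; no physical reservation is made."""
        if not self.fits(resource, start, end):
            return False
        self.bookings.setdefault(resource, []).append((start, end))
        return True

sched = AdvanceSchedule()
a = sched.book("clusterA", 10, 20)   # accepted
b = sched.book("clusterA", 15, 25)   # rejected: overlaps the first decision
c = sched.book("clusterA", 20, 30)   # accepted: starts when the first ends
```

Since no physical reservation backs these records, the accuracy of the whole scheme rests on the quality of the transfer-time and execution-time estimations, which is precisely what the prediction techniques of this Thesis address.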
In a nutshell, this Thesis is based on improving QoS through meta-scheduling
in advance of jobs. To do that, all the challenges that appear when performing
this kind of scheduling are addressed.
1.3 Objectives of this Thesis
The main objective of this Thesis is to equip a Grid system with all the
functionality needed to deliver QoS to Grid users. To this end, it is important
that proposals are network-aware, since the features and behavior of the
underlying interconnection network should be taken into account when making
meta-scheduling decisions. Developing an entity in charge of dealing with users’
QoS requirements is also essential. Hence, this entity manages the time
constraints of jobs and, when making scheduling decisions, tries to ensure that
they will be fulfilled.
The study has been carried out by using a real Grid testbed. An additional
objective was hence setting up the environment and maintaining it. We have
chosen to carry out our research over real Grid environments in order to develop
an open-source middleware for the Grid community (the Scheduling in Advance
Layer (SA-Layer), http://www.i3a.uclm.es/raap/gridcloud/SA-Layer). In
this way, the evaluation results reflect the natural behavior, heterogeneity and
dynamism of Grid resources, which would otherwise be rather difficult to
emulate.
These global objectives can be divided into the following partial objectives:
• Setting up and maintaining a Grid environment by using Globus and Grid-
Way as low and high level middlewares, respectively.
• Compilation of literature approaches aimed at the provision of QoS in Grids,
as well as identification of their weak points that need to be solved. Gather-
ing some other approaches similar to our proposals or related to the tech-
niques used to overcome the challenges found.
• Modification of an existing meta-scheduler to make it network-aware. Study
of other metrics that could improve the scheduling decisions – the mapping
between jobs and resources.
• Development of meta-scheduling in advance proposals for improving QoS
provision, such as efficient algorithms to select a suitable resource to exe-
cute a job or implementing efficient data structures to store the information
about previous scheduling decisions.
• Development of prediction techniques to estimate the future status of re-
sources and interconnection networks.
• Development of heuristics to estimate the time needed to complete the ex-
ecution of a job in a resource at one specific time in the future.
• Development of techniques to deal with the problems found in the meta-scheduling
in advance process, mainly job rejections due to fragmentation
and/or unfavorable previous decisions.
• Addition of different levels of QoS depending on specific resource usage
policies for the users, projects and virtual organizations.
1.4 Methodology
Research on the architecture for addressing QoS in Grids has been carried out
on a real Grid environment. To build that Grid environment we have used two
different middlewares which work at different layers. Globus Toolkit 4 (GT4) [9]
has been used as the low-level middleware, as it is the de facto standard at
this layer. On top of that, we have installed the GridWay meta-scheduler [10],
which works in the Collective layer as high-level middleware.
Once the Grid environment was built based on those middlewares, we decided
to extend the GridWay meta-scheduler to make it network-aware, with
the aim of improving its functionality and QoS provision. Then, we implemented
another layer (named SA-Layer) on top of GridWay to deal with QoS
issues – job time constraints. This layer provides the functionality needed to
make scheduling in advance decisions without physical reservations of resources.
Finally, we have increased the QoS provided by the SA-Layer through
communication with another existing system, named FSGrid [4]. This system
provides the SA-Layer with the information needed to manage different levels of QoS
for the different users, depending on the established policy.
Next, a more in-depth explanation of the Globus and GridWay middlewares
is presented, as they are the software over which this Thesis has been
developed. A brief explanation of the methodology used for developing
these proposals is also presented, namely, the addition of new functionality into
and on top of GridWay, and the interconnection with the FSGrid scheduler.
1.4.1 The Globus Toolkit
In 1995, during the SuperComputing’95 conference, it was demonstrated that it
is possible to execute several distributed applications from different fields among
17 United States centers connected through a high speed network of 155 Mbps.
That experiment was called I-Way and it became the starting point of several
projects whose main aim was the sharing of distributed computational
resources [5]. From that moment on, the book “The Grid: Blueprint for a New Computing
Infrastructure”, written by Ian Foster and Carl Kesselman, was the first step
towards establishing the main ideas of how this new technology should be carried out.
The Globus Toolkit [9] emerged from these ideas. It is an open source project
developed at the Argonne National Laboratory and led by Ian Foster with the
collaboration of Carl Kesselman’s group from the University of Southern California.
Globus is base software for constructing a computational Grid. Thanks to
its evolution and adoption by the scientific community, Globus has become the
de facto standard in Grid technologies.
This toolkit is basically a group of services and software libraries which deal
with the fundamental issues related to security, resource access, resource
management, data movement, resource discovery, and so forth. The Globus
Toolkit was built to remove barriers that prevent collaboration among different
organizations or institutions. Its core services, interfaces and protocols let
users access remote resources as if they were located in their own building,
while maintaining local control over who can use those resources and when.
Figure 1.3. Core Services of Globus.
In a nutshell, Globus is neither a resource intermediary or broker, nor a user
or application tool; it is a group of libraries, services, commands and APIs
which build a low-level middleware that makes it possible to share resources
located in different administrative domains and under different security policies.
The core services offered by Globus are (see Figure 1.3):
• Security: Grid Security Infrastructure (GSI [25]).
• Resource Management: Grid Resource Allocation Management (GRAM and
WS-GRAM [9]).
• Information Services: Grid Resource Information Protocol (GRIP [13]).
• Data Transfers: Grid File Transfer Protocol (GridFTP [26]).
1.4.2 The GridWay Meta-scheduler
The GridWay project [27] started in September 2002. It is a high-level middleware,
which may use Globus or gLite [20], among others, as the low-level
middleware. Due to that fact, GridWay can be used in every infrastructure based
on Globus. The GridWay project is being developed by the Distributed Systems
Architecture Research Group at the Complutense University of Madrid [28].
The GridWay meta-scheduler [10] enables large-scale, reliable and efficient
sharing of computing resources: clusters, supercomputers, stand-alone servers,
etc. It supports different Local Resource Management Systems (LRMSs) (e.g., PBS,
SGE, LSF, Condor, etc.) within a single organization or scattered across several
administrative domains. It also provides a single point of access to all resources
in an organization, from in-house systems to Grid infrastructures and Cloud
providers. As a result, GridWay can be used on the main production Grid
infrastructures and it can dynamically access Cloud resources.
The first version of this meta-scheduler was developed for research on adaptive
and dynamic schedulers, and it was only distributed upon request and in binary
format. The first open source version (GridWay 4.0) and the project web site were
presented in February 2005. The latest version, GridWay 5.8, is the result of the
knowledge and experience gained through years of research and development,
as well as of the feedback from its community of users.
Nowadays, there is a large number of commercial and open source broker
systems, each of them with different computational infrastructures underneath
and with different execution profiles. However, GridWay stands out over the
rest due to the fact that it has been designed to work over Globus services,
providing high functionality and reliability in this kind of infrastructure. As
Figure 1.4 depicts, GridWay over Globus provides a decoupling between applications
and the bottom layer of local management systems. The figure shows that
users send their jobs through GridWay, which is in charge of managing and
mapping them onto the computational resources by using the Globus middleware.
Finally, GridWay returns the results of their executions. Consequently,
the GridWay framework is a component for meta-scheduling in a Grid
environment, addressed to end users and Grid application developers.
GridWay carries out all the scheduling and execution steps in a transparent
way, and it adapts the execution to the changing behavior of Grids. To do
that, GridWay provides mechanisms for failure recovery, dynamic scheduling,
and both on-demand and opportunistic job migration.
As far as job submission is concerned, users who want to send jobs to the
Grid by using the GridWay meta-scheduler have to generate a job template. This
template includes the information needed to execute the jobs, such as the names
and locations of input and output files and executables, as well as other
management parameters related to the scheduling process, performance, fault
tolerance, and so forth.
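As a hedged illustration, a minimal GridWay job template might look like the following. The keyword names follow GridWay’s template syntax, but the exact set of supported keywords depends on the GridWay version, and the values here are purely illustrative:

```
EXECUTABLE   = my_app
ARGUMENTS    = -i data.in
INPUT_FILES  = data.in
OUTPUT_FILES = result.out
STDOUT_FILE  = stdout.${JOB_ID}
STDERR_FILE  = stderr.${JOB_ID}
```

GridWay substitutes variables such as ${JOB_ID} at submission time, so each job’s standard output and error streams are kept in separate files.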
Figure 1.4. GridWay usage model.
1.4.3 Extensions to GridWay
The QoS framework presented in this Thesis has been implemented on top of
GridWay, although some of the capabilities have been implemented within
GridWay itself.
The extension to GridWay consists of making it network-aware. To this end, two
new techniques to choose the resources are included, which are explained in
Chapter 3. The GridWay meta-scheduler has been chosen among others (such
as [18] [29] [30]) for several reasons: (1) the availability of its source code; (2) its
modular structure, which allows the easy addition of new criteria to perform
the filtering and sorting of the candidate resources. This is of great aid when
trying to tailor these criteria to other needs not initially considered by the
GridWay developers.
On the other hand, the implementation made on top of GridWay is in charge
of the meta-scheduling in advance decisions. This is a modular framework
between GridWay and the users, which has been implemented in an incremental
way, providing first the techniques needed for scheduling jobs in the future
without physical reservations. Then, the heuristics needed for predicting the
future status of resources and the duration of jobs on them were implemented
(Chapter 4). Ultimately, the techniques needed to deal with fragmentation and
unfavorable previous decisions in order to improve resource usage were
developed – presented in Chapter 5.
1.4.4 Integration with other systems
The next step is the integration of the above development with other existing
systems, with the aim of managing more advanced QoS. To this end, the
FSGrid system (detailed in Chapter 6) is used, which provides our system with
fair-share resource usage. When both systems work together, it is possible to
address QoS not only in terms of jobs finished within their deadlines, but also
depending on the previous and current resource usage made by users. In this
way, it is possible to provide different QoS levels to the different users (or VOs)
depending on the established policy, which may also be changed dynamically.
1.5 Structure of this Thesis
In order to cover the points mentioned in Section 1.3, this Thesis has been
structured as follows:
• Chapter 1: The introduction chapter briefly describes the topic of interest
of the Thesis. Motivation, objectives, and organization of the document are
also described in this chapter.
• Chapter 2: The provision of QoS in Grids by means of efficient meta-
scheduling in advance is the topic of interest of this Thesis. This chapter
reviews proposals developed for the provision of QoS in Grids. This has
been done paying special attention to those proposals which consider the
network and advance reservations.
On the other hand, other research works related to the techniques developed
for the QoS provision are presented. In this way, this chapter studies
several research works regarding efficient data structures, different prediction
techniques, fragmentation and co-allocation issues, autonomic computing,
service level agreements and the fair usage of resources.
• Chapter 3: In this chapter, a first proposal to improve the performance
provided by GridWay is described, paying special attention to the network
status. The implementation extends the widely used GridWay meta-scheduler
and relies on Exponential Smoothing (ExS) to predict the execution
and transfer times of jobs. An autonomic control loop (which takes into
account CPU usage and network capability) is used to alter job admission
and resource selection criteria so as to improve overall job completion times
and throughput. This Autonomic Network-aware Meta-scheduler (ANM-ExS)
combines concepts from Grid meta-scheduling with autonomic computing,
in order to provide users with a more adaptive job management system.
The architecture considers the status of the network when reacting to
changes in the system – taking into account the workload on computing
resources and the network links when making a meta-scheduling decision.
Thus, the architecture provides meta-scheduling of jobs to computing
resources and connection admission control, so that the network does not
become overloaded. The implementation has been tested using a real testbed
involving heterogeneous computing resources distributed across different
national organizations. The performance evaluation illustrates the ability of
ANM-ExS to schedule heterogeneous jobs onto computing resources more
efficiently than the conventional Grid meta-scheduling algorithms used by
GridWay.
• Chapter 4: In this chapter the meta-scheduling in advance framework,
named SA-Layer, is presented, together with all the prediction techniques
developed to make this kind of scheduling possible, with the aim of
addressing the QoS required by the users. This QoS is defined in terms
of time constraints; in our case, the job start time (the earliest time at which
a certain job can start its execution) and the job deadline (the time by which
the job must have finished its execution). Details on how the scheduling is
performed, and on how to select the resources and the time periods to
execute the jobs, are presented. Moreover, different prediction techniques
are described and evaluated.
• Chapter 5: Once the SA-Layer is capable of managing scheduling in advance
decisions in a correct and accurate way, this chapter presents the
techniques needed to deal with the poor resource utilization due to
fragmentation. Fragmentation is a well-known effect in every scheduling
process: jobs may be rejected even if the remaining capacity is enough to
execute them. On the other hand, other jobs may be rejected owing to the
fact that the meta-scheduling system is not capable of foreseeing the future;
thus, jobs may be rejected due to unfavorable previous decisions. This
chapter presents two techniques which try to avoid both rejection problems,
named Replanning Capacity (RC) and Bag of Tasks Rescheduling (BoT-R).
Finally, a performance evaluation section highlights the benefits of both
techniques from the resource usage viewpoint and, consequently, from the
users’ QoS perspective.
• Chapter 6: In this chapter, the integration between our meta-scheduling in
advance system and another one in charge of fair-share resource usage is
presented. This second system is named FSGrid [4]. The integration of both
systems makes it possible to manage QoS in terms of jobs finished within
their deadlines, and also in terms of the fair-share usage of the resources
from the users’ point of view, by taking into account the established policy.
Moreover, it is possible to define usage policies not only at the user level, but
also at the project or virtual organization levels. Finally, this chapter presents
a performance evaluation which highlights how our system manages different
QoS levels taking into account the policy set when using the FSGrid system.
The improvement in the FSGrid convergence rate obtained thanks to the
predictions made by the SA-Layer is also presented.
• Chapter 7: This chapter summarizes conclusions from the work carried
out in this Thesis, as well as scientific contributions, that is, national and
international publications. Finally, future work guidelines that can be ac-
complished from this Thesis are also presented.
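Among the techniques enumerated above, the Exponential Smoothing (ExS) used in Chapter 3 is simple enough to sketch here. Each new prediction blends the latest observation with the previous prediction; the smoothing factor alpha below is an illustrative value, not the one tuned in the Thesis:

```python
def exs_predict(observations, alpha=0.5):
    """Single exponential smoothing: return the next-value prediction
    after folding in each observation in turn."""
    prediction = observations[0]
    for obs in observations[1:]:
        prediction = alpha * obs + (1.0 - alpha) * prediction
    return prediction

# E.g., predicting the next file transfer time (seconds) from past ones:
transfer_times = [10.0, 12.0, 11.0, 20.0]
next_estimate = exs_predict(transfer_times)   # 15.5 with alpha = 0.5
```

The recent spike pulls the estimate towards 20 seconds without discarding the history; smaller values of alpha give smoother, slower-reacting predictions.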
CHAPTER 2
QoS Provision and Meta-Scheduling in Grids
This chapter reviews several related works in the different fields that had to
be addressed to improve the Quality of Service (QoS) provided to users in a
Grid environment through our network-aware meta-scheduling in advance
system.
2.1 Introduction
The rapid evolution of Grid Computing and the development of new middleware
services have made Grid platforms increasingly used not only for best effort
scientific jobs but also in industrial and business applications [31]. Due to this
fact, the desire for QoS support has been growing. However, providing this QoS
over current Grid infrastructures is rather difficult, as they were originally
designed without any support for QoS. The first aim of Grid computing was to
allow different organizations to share heterogeneous resources connected through
the network for collaborative purposes, building what were called VOs. These
VOs provide an abstraction level to manage large tasks, but without any
requirements on completion time or other constraints. The software infrastructures
required for resource management and other tasks, such as security, information
dissemination and remote access, are provided through Grid toolkits such as
Globus [9] and Legion [19].
As opposed to Cloud computing, where the resources are usually dedicated and
under the control of a single management entity, in a Grid infrastructure
resources are shared and non-dedicated, and the global management policy has to
cooperate with and respect local policies. The management of QoS on the Grid
is therefore a complex problem that spans all aspects of the Grid, involving
different layers of the Grid architecture [31]. Thus, we propose an approach,
based on a number of cooperating modules and on a specific QoS-management
layer in the Grid architecture, called SA-Layer, to address these issues.
Grids are dynamic, inter-domain environments, where assumptions about system
behavior also have to be dynamic and continually updated. Suffice it to say
that resources are heterogeneous and belong to different owners, whilst users
change continuously, as do their job requirements. All these facts make QoS
support much more complex than in other environments. As reliability is
strongly related to QoS provisioning, Grid systems must ensure a good level of
reliability in spite of the complexity, heterogeneity and high dynamism of Grid
resources.
2.2 Models for Addressing Quality of Service in Grids
From a QoS perspective, three main models may be distinguished in Grid
environments [31].
2.2.1 Best Effort Model
The Best Effort model is based simply on providing a universal and common plat-
form to share resources across several administrative domains. Thus, neither an
efficient mapping between jobs and resources nor the provision of any QoS is
considered. Resources offered as best effort do not impose any constraint on the
resource owners, as resources are shared when idle.
Condor-G [22] was one of the first Grid meta-schedulers. It was designed
as a natural evolution of the Condor scheduler [15], working over
VOs and in an inter-domain way. Through the Condor-G agent, Grid users
see the Grid as an entirely local resource. Condor-G acts as a scheduler, dis-
patching the submitted jobs to resources, managing their execution (sus-
pend/resume) and monitoring the state of executing jobs. Nonetheless, Condor-
G shows some design and technical limitations. First, being centralized, it does
not scale well on large Grids and constitutes a single point of failure. Moreover,
it makes scheduling decisions assuming total knowledge of jobs and resources,
which is an unrealistic assumption in dynamic environments like Grids. In general,
such drawbacks are the main cause of the poor efficiency of the first generation
of meta-schedulers.
2.2.2 QoS Model
Although no explicit support for QoS is defined at the Grid level, the introduction
of standards, together with the need for QoS, has led to the definition of the first
effective architectures that provide some QoS functionality, working over the
middleware. Examples of mechanisms dealing with QoS management support
are Grid Quality of Service Management (G-QoSM) [32] and the Globus Architecture for
Reservation and Allocation (GARA) [33].
GARA is one of the seminal works providing support for QoS through making
physical reservations of Grid resources. The G-QoSM framework is aimed at providing
QoS for applications in a computational Grid. As a framework, it works over the
middleware (Globus Toolkit [9]), which is used to obtain basic Grid func-
tionality. G-QoSM presents a scalable and distributed architecture, replicated
in every administrative domain, for managing QoS at different abstraction
levels. G-QoSM supports all kinds of QoS parameters and functionalities for
optimizing the management of SLAs with explicit QoS constraints, and it is able
to manage applications with strict QoS requirements, like computational steer-
ing. However, the support of G-QoSM for QoS can be further enhanced. A first
drawback is the assumption that users are able to indicate their QoS requirements
precisely, and only in terms of quantitative low–level resources. Another
limitation is that it relies only on the middleware for providing QoS at the resource
level. Finally, it does not provide support for managing deadlines on application
executions.
2.2.3 Economic Model
The market–based Grid paradigm arose owing to the growing demand for QoS
by applications. This fact led to the development of signed contracts, e.g.,
by using a Service Level Agreement (SLA) [34]. Under this scenario, owners may
offer their resources at distinct QoS classes (e.g., gold, silver and bronze, as
outlined in [35]), requesting different prices for them. Users who pay for
them acquire guarantees of access to those resources at the requested QoS
level.
On the other hand, there are approaches that take economic issues into
account. For instance, the GRACE infrastructure [36] is a scalable component
that works over a basic middleware like the Globus Toolkit, providing additional
functionality with support for economics. In particular, it provides support for
resource discovery based on cost, and APIs for managing a cost for resources
and a budget for users. GRACE is the first approach that explicitly introduces
a market into the Grid environment. However, its economy management is quite
simple: there is no implementation of advanced economic models, and the only
stakeholders are final users and resource owners. In this sense, GRACE intro-
duces economic parameters, such as cost and price, but it does not yet support
a comprehensive and complex market–based economic system. Even though
the approach could support quantitative QoS parameters as extensions, QoS is
not directly managed.
2.3 Proposals for Addressing Quality of Service in Grids
QoS provisioning is the topic this dissertation focuses on. Having a
good QoS model, which ensures the QoS provided to users, is the basis for
building a strong economic model. Hence, several extensions to the basic func-
tionality of a Grid middleware to enable QoS support are proposed. This QoS-
management layer is aimed at overcoming the limitations of current ap-
proaches to supporting QoS on Grids. It lies between the users and the middle-
ware, and is devoted to explicitly managing QoS support and functionalities. It
constitutes a universal interface both for users/applications with QoS requests
and for owners who offer QoS on their resources.
The next subsections detail a compilation of literature approaches whose main
aim is QoS provisioning in a Grid environment. They try to address QoS
using different methods, such as advance reservations or the co-allocation of
jobs, among others. They are presented together with the weak points that
need to be solved.
In addition, other research works related to the techniques that this Thesis
uses to manage QoS in Grid environments are also presented. For instance,
a suitable data structure is needed to manage the information about the future
usage of resources; to this end, several of them have been studied and presented,
such as Grid Advanced Reservation Queue (GarQ) [37] or red–black trees [38].
Along these lines, research works on efficient data structures, different prediction
techniques, ways of measuring fragmentation in the allocation process,
co-allocation and rescheduling techniques, autonomic computing, service level
agreements and fair usage of resources have been reviewed, as the next sections
detail.
2.3.1 Scheduling Techniques
In most Grid systems, pending jobs are stored in queues until the meta-
scheduler has resources available to execute them. However, each system may
implement different scheduling algorithms to process this queue [39], such as First
Come First Serve (FCFS), Shortest Job First (SJF), Earliest Deadline First (EDF)
or EASY Backfilling. They choose the jobs to be executed taking into account
different parameters, such as their start time, the number of resources requested
or the job execution duration. However, these classic scheduling algorithms are
not prepared to cope with the dynamism of Grids and, because of that, they cannot
provide any guarantee that jobs will be executed. Accordingly, they cannot ensure
that a certain job is executed before a certain deadline. Consequently, no
QoS is provided at all.
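As an illustration, the core of a policy such as EDF fits in a few lines. The sketch below is only a toy model (job names and deadlines are invented, and none of the Grid dynamism discussed above is modelled), which is precisely why such policies alone cannot give QoS guarantees:

```python
import heapq

# Toy sketch of Earliest Deadline First (EDF): the pending queue is
# kept ordered by deadline and jobs are dispatched in that order.
# All job data below is invented for the example.
jobs = [("j1", 50), ("j2", 20), ("j3", 35)]  # (name, deadline)

def edf_order(jobs):
    """Return job names in the order EDF would dispatch them."""
    heap = [(deadline, name) for name, deadline in jobs]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

print(edf_order(jobs))  # deadlines 20 < 35 < 50, so: ['j2', 'j3', 'j1']
```

Note that nothing in this ordering checks whether a deadline is actually met once queue waiting times and resource load vary, which is the limitation discussed above.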
A number of research projects have studied QoS provision in Grid envi-
ronments, such as GARA [33], G-QoSM [32], the Grid Network-aware Resource Bro-
ker (GNRB) [40], [41] or [42]. They use several schedulers to map users'
jobs to resources. For instance, the schedulers used by GARA and G-QoSM are
Dynamic Soft Real Time Scheduler (DSRT) [43] and PBS [44]. However, these
schedulers only pay attention to the load of the computing resources, making it
possible to choose a powerful, unloaded computing resource attached to an
overloaded network. This may lead to a deterioration in the performance received
by users, especially when the job is network-demanding. The network is thereby
a key component within Grid systems that needs attention when performing tasks
such as scheduling, migration or monitoring [45].
Surprisingly, many of the above efforts do not take network capability into
account when scheduling tasks. Therefore, other meta-schedulers available to-
day, such as GridLab Resource Management System (GRMS) [29], Community
Scheduler Framework (CSF) [18], Grid Service Broker [30], Grid Network Bro-
ker (GNB) [46] or GridWay [10] need to be analyzed. However, they also have
some drawbacks. GRMS is obsolete, as the last Globus version it deals with is
2.4. CSF provides reservations based on a number of features, including the
network, but it is a centralized engine and is not intended for bulk data trans-
fer [47]. Grid Service Broker already includes network information provided by
Network Weather Service (NWS) [48] to perform the meta-scheduling, but it re-
quires information on the effective bandwidth between all the data hosts and all
the compute hosts in the system to perform network–aware scheduling, which
makes the proposal difficult to scale. GNB [46] is an autonomic network–aware
meta-scheduling framework, but it has only been tested by means of simulations.
Finally, GridWay is straightforward to install and use, and its modular architecture
allows extensions to be implemented easily. However, the network is not among
the parameters it uses to perform meta-scheduling.
2.3.2 Advance Reservation
On the whole, both Grid resources and interconnection networks may be un-
stable. This means that their performance may vary over time, which may lead
to jobs failing or to very long execution times. Due to this fact, a Grid system
needs an efficient scheduling algorithm. For that purpose, an interesting approach
is to try to ensure that a specific resource is available when an application needs
it. In this way, some QoS may be provided. To this end, as several researchers
state [49] [50], it is necessary to reserve resource usage.
In the main, the resources that may be reserved or requested are computational
ones, storage systems, bandwidth, etc., or a combination of some of
them. Moreover, these reservations can be classified into two different kinds,
immediate and in advance, depending on the start time requested.
Thus, an advance reservation is defined as a “possibly limited or restricted del-
egation of a particular resource capability over a defined time interval, obtained by
the requester from the resource owner through a negotiation process” [51]. More-
over, this process may be split into two steps:
1. Scheduling decision: the phase in which the resource and the time
period to execute the job are selected. However, in this step there is no
physical reservation of the resources.
2. Negotiation of the reservation: the phase in which the physical reserva-
tion of the resource takes place. To this end, the LRMS must provide this
functionality, as Maui [52] does.
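The two steps above can be sketched as follows. This is a hedged toy model: the free-slot table, the resource names and the always-accepting LRMS stub are invented, and in a real deployment the negotiation (e.g. with an LRMS such as Maui) could still refuse the request:

```python
# Hedged sketch of the two-step advance reservation process described
# above. The free-slot table, resource names and the LRMS stub are all
# invented for illustration.
free_slots = {  # resource -> list of (start, end) free periods
    "res_a": [(0, 40), (60, 100)],
    "res_b": [(10, 80)],
}

def schedule(duration, earliest_start):
    """Step 1 (scheduling decision): pick a resource and time period.

    Nothing is physically reserved yet."""
    for res, slots in free_slots.items():
        for start, end in slots:
            begin = max(start, earliest_start)
            if begin + duration <= end:
                return res, begin, begin + duration
    return None

def negotiate(reservation):
    """Step 2 (negotiation): ask the LRMS to physically reserve the slot."""
    res, start, end = reservation
    return True  # stub: a real LRMS could accept or refuse here

r = schedule(duration=30, earliest_start=20)
print(r)  # ('res_a', 60, 90): first free period long enough after t=20
if r and negotiate(r):
    print("reservation confirmed")
```

Separating the two phases is what makes the later move to *scheduling* in advance possible: step 1 can be kept while step 2 is dropped when the underlying resources do not support reservations.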
Several projects have aimed at exploring advance reservation of resources
(among others, GARA [53], Grid Capacity Planning [54], or Vertically Integrated
Optical testbed for Large Application (VIOLA) [55]).
The Globus Architecture for Reservation and Allocation (GARA) [53] provides one
of the seminal works on Grid resource management systems, with improved features
like advance reservation of resources and a first support for QoS. Reservation
becomes important after the signature of an SLA contract, to prevent the resources
involved in the contract from being allocated to other users. More specifically,
GARA provides a uniform way to allow users and applications to manage both
reservations and allocations. Since then, advance reservations have been studied
in numerous contexts, such as clusters (Maui Scheduler [52]).
Among the systems that allow resource reservations in a Grid we can find Grid
Capacity Planning [54], which provides users with reservations of Grid resources
through negotiations, co-allocations and pricing. Another important system is
VIOLA [55], which includes a meta-scheduling framework that provides co-allocation
support for both computing and network resources. However, all these
techniques share the same main drawback: not all resources can be reserved
in a Grid environment, since not all of them belong to the same administrative
domain (they are under the ownership of different administrators), and not all
resources provide this functionality.
In addition to this main drawback, support for reservation in the under-
lying infrastructure is currently limited, in spite of being a feature required
to meet QoS guarantees in Grid environments, as several contributions con-
clude [53] [54]. Moreover, there is a performance penalty imposed by the us-
age of advance reservations (typically decreased resource utilization), which has
been studied in [56]. By contrast, Qu [57] describes a method to overcome this
shortcoming by adding a Grid advance reservation manager on top of the local
scheduler(s), which is similar to the way our proposals are implemented.
Some of the most recent major works on advance reservations in Grids are
[54], [58] and [59]. In [54] a cost–aware resource model is presented in which
reservation for each application task is performed separately by negotiating with
the resource provider. In [58], Elmroth et al. present a resource selection al-
gorithm in which the computational resource is selected based on the results
of computing several benchmarks in each computational resource and network
performance predictions. The authors of [59] propose a multiobjective genetic
algorithm formulation for selecting the set of resources to be provisioned that
optimizes application performance while minimizing costs.
2.3.3 Data Structures
Owing to the limitations that reservation in advance presents, this work aims
at performing scheduling in advance rather than reservation in advance of re-
sources [60]. Using this kind of scheduling requires an underlying infrastructure
capable of managing all the information in an efficient way. To this end, several
data structures detailed in the literature present the strengths we need. A
survey can be found in [37]. It is worth mentioning Grid Advanced Reservation
Queue (GarQ) [37], a combination of the Calendar Queue [61] and the Segment
Tree [62], for administering reservations efficiently. However, in this Thesis,
red–black trees are used, as they provide efficient access to the information about
resource usage, as demonstrated in [38]. One of their advantages is that this
data structure stores only the free time periods of each resource, in contrast to
GarQ, which stores all the information about the reservations made. In this way,
when a large number of jobs is submitted, the number of free time periods to
store is lower.
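The idea of keeping only the free periods can be illustrated with a small sketch. Python has no built-in red–black tree, so a plain ordered list of disjoint gaps stands in for it here; the interval values are invented:

```python
# Sketch of storing only the FREE time periods of a resource, in the
# spirit of the red-black tree approach: allocating a job consumes a
# gap and leaves the (possibly empty) leftover pieces. The tree itself
# is replaced by a sorted list for brevity.
free = [(0, 30), (50, 120)]  # sorted, disjoint (start, end) gaps

def allocate(duration, earliest):
    """Place a job in the first gap that fits; return its time slot."""
    for i, (start, end) in enumerate(free):
        begin = max(start, earliest)
        if begin + duration <= end:
            pieces = [(start, begin), (begin + duration, end)]
            free[i:i + 1] = [(s, e) for s, e in pieces if s < e]
            return (begin, begin + duration)
    return None  # no gap is large enough: the request is rejected

print(allocate(20, 10))  # (10, 30): the job fits inside the first gap
print(free)              # [(0, 10), (50, 120)]: only free periods remain
```

With many accepted jobs, the list of remaining gaps stays short, which is the advantage over storing every reservation; the red–black tree additionally keeps lookups logarithmic.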
However, in order to perform scheduling in advance, two main issues arise.
First, predictions of the level of use of resources are needed. Second, realloca-
tion techniques are needed to modify job allocations when new jobs cannot be
accepted. Related approaches to these two topics are reviewed next.
2.3.4 Prediction Techniques
Regarding job durations on resources and their waiting times in queues, there
are also several works whose main aim is to estimate those times. For instance,
Queue Bounds Estimation from Time Series (QBETS) [63] tries to estimate the
probability of a job waiting no longer than startDeadline minutes if it is submit-
ted at time T. Another interesting work, based on this, is the Virtual Advance
Reservations Queues (VARQ) [64], which implements a reservation by determin-
ing when (according to predictions made by QBETS) a job should be submitted
to a batch queue so as to ensure it will be running at a particular point in the
future.
In contrast to [3] [38], where the authors assume that users have prior knowl-
edge of job durations, such prior knowledge is not considered in the present
work. Thus, estimations of the completion times of jobs need to be calculated.
With the aim of making accurate meta-scheduling in advance decisions, the pro-
posed system needs to perform predictions about the future resource status and
about job durations on resources. A survey of prediction techniques can
be found in [65]. Examples include applying statistical models to previous execu-
tions [66] and heuristics based on job and resource characteristics [67]. In [66],
it is shown that although load exhibits complex properties, it is still consistently
predictable from past behavior. In [67], an evaluation of various linear time se-
ries models for the prediction of future CPU loads is presented. In this work,
a technique based on historical data is used, since it has been demonstrated to
provide better results than linear functions [60].
The prediction information can be derived in two ways [68]: application–
oriented and resource–oriented. In application–oriented approaches, the
running time of Grid tasks is directly extrapolated using information about
the application, such as the running time of previous similar tasks. In
resource–oriented approaches, the future performance of a resource, such as
CPU load and availability, is predicted using historical information. These
data are then used to forecast the running time of a task, given information
on its resource requirements. In this work, a combination of these two
approaches is used. First, application–oriented approaches are used to estimate
the execution times of the applications. After that, resource–oriented approaches
calculate the time needed to perform the network transfers, and modify the es-
timations of the application execution time considering the predicted status of
the resource where the application is going to be executed.
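A toy sketch of such a combination is shown below. The history, the load-scaling model and the transfer-time formula are invented stand-ins, not the actual predictors used in this Thesis:

```python
# Hedged sketch of combining application-oriented and resource-oriented
# predictions, as described above. The history, load model and transfer
# formula are invented illustrations, not the SA-Layer's actual predictors.
history = {"appX": [100.0, 110.0, 105.0]}  # past run times in seconds

def app_estimate(app):
    """Application-oriented: mean running time of previous similar runs."""
    runs = history[app]
    return sum(runs) / len(runs)

def resource_adjust(base, predicted_load, bandwidth_mb_s, input_mb):
    """Resource-oriented: scale by predicted CPU load, add transfer time."""
    transfer = input_mb / bandwidth_mb_s      # network transfer, seconds
    return base / (1.0 - predicted_load) + transfer

base = app_estimate("appX")                   # 105.0 s
total = resource_adjust(base, predicted_load=0.3,
                        bandwidth_mb_s=12.5, input_mb=125.0)
print(round(total, 1))  # 105/0.7 + 10 = 160.0
```

The split mirrors the text: the first function only looks at the application's past, while the second folds in the predicted state of the chosen resource and its network path.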
2.3.5 Fragmentation Problems
On the other hand, fragmentation is a well-known effect in every resource allo-
cation process, which decreases resource utilization, as studied in [56]. Thus,
whenever a resource allocation fails even though enough free capacity is avail-
able, fragmentation is easily identified as the cause. However, how fragmentation
can be quantified in a system requiring continuous allocations, such as time
schedulers or memory, is not a trivial issue. Owing to the fact that the problem
presents similarities with memory management, different approaches focused on
memory allocation have been reviewed, as such information could help to compare
the effects of a scheduling decision [69].
In [70] [71], the characteristics of dynamic memory allocators were studied.
They have to deal with problems such as finding a free block to satisfy a
malloc() request, choosing one block out of many possible ones, splitting a
block which is larger than the requested one, coalescing two or more adjacent
freed blocks, and demanding more memory from the operating system, e.g. with
sbrk(), to serve a malloc() request.
However, the domain of memory management does not map onto the do-
main of Grid resources very well, because there is no match for sbrk() (which
increments the data segment size) within the Grid domain. In addition, main
memory can be considered homogeneous, which is not true for Grid
resources. Moreover, a Grid environment combines two dimensions: time
and resources. For main memory, apart from locality effects, it
does not matter whether object 1 is in cell 1 and object 2 is in cell 2, or vice
versa. However, for Grid resources it does matter whether reservation 1 is in
time interval 1 and reservation 2 is in time interval 2, as time interval 2 could be
too late for reservation 1. Analogously, for the other dimension, it does matter
whether reservation 1 is assigned to resource 1 and reservation 2 is assigned to
resource 2, as resource 2 may not be capable of handling reservation 1.
To summarize, a drawback inherent to all the approaches studied so far is
their limitation to one dimension. Grid resources have two dimensions: time
and resource capacity (e.g., number of CPUs, their power, or bandwidth). To
this end, the correlation between the measured fragmentation of a schedule
and the future rejection rate was analysed in [69]. That paper presents a new
way to measure the fragmentation of a system and shows that the proposed
fragmentation measure is a good indicator of the state of the system. However,
as they measure the fragmentation of single resources and not of the system
as a whole, further research is needed to address fragmentation issues in the
Grid meta-scheduling domain. To this end, we have studied several metrics to
quantify the existing fragmentation in Grid systems. This information can be used
to decide when certain scheduling actions (e.g., rescheduling of already scheduled
tasks) have to be triggered in order to reduce fragmentation when and where
necessary. Thus, resource usage may be improved, and the QoS offered improves
as a consequence.
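To make the notion concrete, one very simple, hypothetical per-resource metric (not the measure from [69], nor the ones studied in this Thesis) relates the largest contiguous free gap to the total free time:

```python
# Hypothetical fragmentation metric for one resource's schedule: 0 when
# all free time forms one contiguous gap, approaching 1 as the free time
# shatters into many small gaps. Invented for illustration only.

def fragmentation(free_gaps):
    """free_gaps: list of (start, end) free periods of a resource."""
    sizes = [end - start for start, end in free_gaps]
    total = sum(sizes)
    if total == 0:
        return 0.0
    return 1.0 - max(sizes) / total

print(fragmentation([(0, 100)]))                      # 0.0: one big gap
print(fragmentation([(0, 10), (20, 30), (40, 50)]))   # 1 - 10/30, about 0.667
```

Such a per-resource value illustrates exactly the limitation noted above: summing or averaging it over resources still ignores the interplay between the time and resource dimensions.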
2.3.6 Co-allocation and Rescheduling Techniques
In the literature there are also techniques related to the co-allocation of jobs, as
well as to their rescheduling. The first kind sends jobs to more than one compu-
tational resource; in this way, the system tries to ensure that the requested QoS
is fulfilled, at the expense of a higher computational cost. The second kind of tech-
nique is based on changing the computational resource a job was sent to,
with the objective of moving it to a better one, or at least one more appropriate
at that moment. Through this last action, fragmentation may be avoided
or minimized, and the number of jobs that may be accepted increases.
One related work dealing with the reallocation of jobs is [72]. This work inves-
tigates how the precision of the available information affects resource provisioning
in multi-site environments, and uses backfilling to perform that provisioning of
resources. However, their scenario does not fully map onto ours, since we take
into consideration not only deadline but also start time constraints for jobs.
Thus, in our case, backfilling is implicit. Every time the system receives a job
execution request with a start time earlier than those of other already scheduled
jobs, if there are enough free time slots to allocate the job, it will be allocated and
consequently executed before the previously scheduled jobs.
The Phosphorus Project [73] is another interesting approach to the provi-
sion of QoS in Grids. Its routing and scheduling algorithms aim
at satisfying two or more QoS requirements by co-allocating resources either
concurrently or successively, taking into account the dependencies between commu-
nication and computational tasks.
Another interesting work is presented in [74], where an algorithm to perform
resource selection based on performance predictions is proposed, providing sup-
port for advance reservations as well as the co-allocation of multiple resources. That
work also presents an algorithm for displacing already made reservations based
on the co-allocation of jobs, but not among jobs belonging to different users,
which is our case. Apart from that, their performance prediction techniques are
based on benchmark comparisons, which have to be provided by the users,
while in our system these performance predictions are calculated transparently,
so users do not need to know this information about the jobs they submit.
Regarding the analysis of task reallocation in Grids, another piece of research is
presented in [75]. The authors present different reallocation algorithms and study
their behavior in the context of a multi-cluster Grid environment. However,
unlike our work, that work is centered on a dedicated Grid environment.
2.3.7 Autonomic Computing
Apart from moving jobs from one resource to another and trying to submit jobs
to more than one resource, it is very important that systems are capable of
adapting their behavior to the current status of the environment in an autonomic
way, so that jobs can be efficiently mapped to computing resources and the
overall behavior of the system can be improved. This is known as autonomic
computing.
An autonomic system requires sensor channels to detect changes in the
internal state of the system and in the external environment where the system is
situated. On the other hand, mechanisms to react to and counter the effects
of changes in the environment, by changing the system and maintaining
equilibrium, are also needed. Sensing, Analyzing, Planning, Knowledge and Exe-
cution are thus the keywords used to identify an autonomic computing system.
A common model based on these ideas was identified by IBM Research and de-
fined as Monitor Analyze Plan Execute (MAPE) [76]. A number of other models
for autonomic computing also exist [77], [78]. Consequently, significant work has
already been undertaken towards autonomic Grid computing, such
as [79] [80] [81] [82].
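The MAPE control loop can be sketched in a few lines. The sensor, the threshold and the corrective action below are all invented stand-ins for whatever a real autonomic middleware would monitor and actuate:

```python
# Hedged sketch of a MAPE-style control loop; the sensor, threshold and
# corrective action are stand-ins, not an actual autonomic middleware.

def monitor(state):
    """Monitor: collect a symptom from the (simulated) environment."""
    return state["load"]

def analyze(load, threshold=0.8):
    """Analyze: decide whether the symptom requires action."""
    return load > threshold

def plan(state):
    """Plan: choose a corrective action (here: shed half the load)."""
    return {"load": state["load"] / 2}

def execute(state, change):
    """Execute: apply the planned change to the environment."""
    state.update(change)

state = {"load": 0.9}
for _ in range(3):  # a few iterations of the loop
    if analyze(monitor(state)):
        execute(state, plan(state))
print(state["load"])  # 0.9 is shed once to 0.45, then stays below threshold
```

The knowledge component of MAPE would sit behind all four functions (shared models and policies); it is omitted here for brevity.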
In [79], an architecture to achieve automated control and management of net-
worked applications and their infrastructure based on Extensible Markup Lan-
guage (XML) format specification is presented. Liu and Parashar [80] present
an environment that supports the development of self-managed autonomic com-
ponents, with the dynamic and opportunistic composition of these components
using high–level policies to realize autonomic applications, and provide runtime
services for policy definition, deployment and execution. In [81], an autonomic
job scheduling policy for Grid systems is presented. This policy can deal with
the failure of computing resources and network links, but it does not take the
network into account in order to decide which computing resource will run each
user application. Only idle/busy periods of computing resources are used to
support scheduling. Finally, in [82], a simple but effective policy was formulated,
which prioritizes the completion and acceptance of jobs over their response
time and throughput. It was determined that due to the dynamic nature of the
problem, it could be best resolved by adding self-managing capabilities to the
middleware. Using the new policy, a prototype of an autonomous system was
built and succeeded in allowing more jobs to be accepted and finished correctly.
2.3.8 Service Level Agreements
In this QoS context, Service Level Agreements (SLAs) may act as formal containers
of the QoS information agreed between the user and the owner. More specifically,
SLAs are documents that define a contract between a service requester (the
Grid user) and a service provider (the resource, as a Grid Service), constituting
a basis for the definition of QoS.
Nowadays, SLAs are a hot topic. Many efforts have been made in several
fields, such as their management [83], their QoS implications [84], semantic and
virtualization exploitation [85], and especially their standardization. The most impor-
tant development regarding SLAs has been the WS-Agreement specification [86],
which is considered the de facto standard. It describes, from a global point of view,
the structure and mechanisms needed to deploy SLAs over a system. In the
recent revision of the WS-Agreement specification [87], a new negotiation
protocol has been defined, introducing the concept of renegotiation as a multiple-
message interaction between user and service provider to achieve better agree-
ments. However, WS-Agreement is not the only available specification: SLAng [88]
and the Web Service Level Agreement (WSLA) [89] are alternatives to it.
Owing to the importance of Service Level Agreements, many projects are inter-
ested in their implementation [90]. Most of them implement WS-Agreement,
like SLA@SOI [91], AssessGrid [92] or Brein [93]. The first one focuses on
the introduction of SLAs into Service Oriented Infrastructures (SOIs) [91] from a
generic point of view. AssessGrid and Brein share a common purpose, which
is to bring Grid computing environments into business environments
and society. However, AssessGrid focuses on risk assessment for trustworthy
Grids, whilst Brein focuses on the efficient handling and management of Grid
computing based on artificial intelligence, the semantic web and intelligent sys-
tems. Another important project in this area is WS-Agreement for Java
(WSAG4J) [94], a generic implementation of the WS-Agreement specifi-
cation developed by the Fraunhofer SCAI Institute as a development framework.
It is designed for the quick development and debugging of services and applications
based on WS-Agreement.
It should be noted that not all projects implement WS-Agreement for their
SLA management. An example is NextGrid [95], which is focused on business
Grid exploitation.
An example of a middleware acting as an SLA manager is GRUBER [96]. It can
be seen as an over–middleware application, implemented both in pre-WS and in
WS versions. Whilst Condor-G [22] works on a very simplified model of the Grid
structure (as a simple set of resources and jobs), GRUBER considers the Grid as
a three–level hierarchy of users, groups and VOs, so that each user belongs to
only one group, and a group is part of only one VO. The first GRUBER version
is centralized like Condor-G, whereas a distributed version (DI-GRUBER) grants
higher scalability. GRUBER works like previous meta-schedulers, discovering
and matchmaking resources and jobs independently of user and owner needs,
without any QoS support. However, GRUBER is important for QoS-based Grids
as an SLA manager; in fact, it can be implemented under a QoS framework,
providing more effective services, particularly for SLA negotiation, than basic
middlewares or previous architectures like GARA [33].
2.3.9 Fair Resource Usage
Finally, if a Grid system is to manage QoS through SLA contracts, it has to
ensure the fair usage of resources, with the aim of being able to deliver this
QoS to all users while being aware of the previous usage of the system's resources
made by each user.
To this end, a number of mechanisms for fairshare-based job prioritization
exist, for instance the Fair Share scheduler [97], which extends the concept of
resource allocation fairness to the user level in uni-processor environments. Exist-
ing resource management and scheduling systems such as the Simple Linux Utility
for Resource Management (SLURM) [98] and Maui [52] incorporate their own ver-
sions of fairshare mechanisms, but these are typically limited to enforcing usage
quotas and operate on usage data from within their own ownership domains.
Surveys of Grid fairshare scheduling and resource allocation mechanisms
are presented in [99] and [100]. The former provides a classification of allocation
mechanisms based on categories such as volunteer, agreement-based, and eco-
nomic mechanisms, whilst the latter provides a study based on mathematical
analysis of different strategies for share scheduling in uniprocessor, multipro-
cessor, and distributed systems.
Fair Execution Time Estimation (FETE) scheduling [101] constitutes a version
of Grid fairshare scheduling where jobs are scheduled based on completion time
predictions, as in our case (similar to scheduling in time-sharing systems). However,
that work focuses on minimizing the risk of missed job deadlines, while in
our case the predictions are used both to make an accurate scheduling of jobs
in advance and to try to reach fairshare resource usage as soon as possible,
even before any job has finished its execution. Moreover, the FETE
proposal is evaluated in a simulated environment, assuming that tasks
get a fair share of the resource's computational power. Additional algorithms for
fair scheduling focused on Grid environments are presented in [102].
Finally, there is another fairshare job prioritization system, named FSGrid [4],
which is used in this Thesis with the aim of providing different levels of QoS to
different users. FSGrid is a distributed system for decentralized fairshare job
prioritization that operates on global (Grid-wide) usage data and provides
fairshare support to resource site schedulers operating across ownership
domains. Moreover, it calculates job execution priorities not only for users
but also for projects and virtual organizations.
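As a rough illustration of the idea behind fairshare prioritization, the sketch below boosts jobs from users whose historical consumption lies under their target share. This is a toy model, not FSGrid's actual algorithm; the target shares, usage figures and the priority scale are invented for illustration.

```python
# Toy fairshare prioritization: users whose historical usage is below
# their target share get higher priority. This is NOT FSGrid's actual
# algorithm; shares and usage figures are purely illustrative.

def fairshare_priority(target_share, actual_usage, total_usage):
    """Return a priority in [0, 2]: >1 means under-served, <1 over-served."""
    if total_usage == 0:
        return 1.0                       # no history yet: neutral priority
    used_fraction = actual_usage / total_usage
    if used_fraction == 0:
        return 2.0                       # never used the system: max boost
    return min(2.0, target_share / used_fraction)

# Two users with equal 50% target shares but unequal past usage:
usage = {"alice": 300.0, "bob": 100.0}   # CPU-hours consumed so far
total = sum(usage.values())
prio = {u: fairshare_priority(0.5, h, total) for u, h in usage.items()}
# bob has used less than his share, so his jobs are prioritized
assert prio["bob"] > prio["alice"]
```

The same ratio could be computed per project or per virtual organization, which is the hierarchical generalization that FSGrid provides.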
2.4 Summary
The improvement of Quality of Service (QoS) in Grids by means of efficient meta-
scheduling is the topic of interest of this Thesis. This chapter has reviewed the
proposals developed over time for addressing QoS in Grids, paying special
attention to those which have something in common with the issues that we had
to face in the development of our system.
Among them, GARA [33] and G-QoSM [32] can provide QoS on a variety of
resources (namely, computing resources, storage, and network), though neither of
them uses the network as a parameter to perform the mapping of jobs to com-
puting resources. To this end, techniques for autonomic computing have been
developed to take the network information into consideration, as well as other
information regarding previous resource status, when making the mapping be-
tween jobs and resources.
On the other hand, the fact that advance reservations are not always fea-
sible in the resources of Grid environments was the reason for developing a
meta-scheduling in advance system which does not make physical reservations
of resources. Due to the needs of this scheduling technique, related work
regarding efficient data structures and prediction techniques has been detailed. The
efficient data structures are needed to provide scalable and efficient mapping
between resources and jobs, taking into account the previous scheduling de-
cisions. The prediction techniques are needed to know the future status of re-
sources and the duration of jobs when they are going to be executed on them.
The remaining proposals are centered on different techniques used for man-
aging the QoS at different levels. They are related to the techniques used in the
development of our meta-scheduling in advance system. There are proposals for
measuring the fragmentation generated in the scheduling process, for managing
SLAs, or for improving the fairness in the resource usage among users, projects
and virtual organizations.
CHAPTER 3

Including Metrics to Improve QoS
at the Meta-Scheduling Level
One of the key motivations of computational and data Grids is the ability to
make coordinated use of heterogeneous computing resources which are geo-
graphically dispersed. Consequently, the performance of the network linking all
the resources present in a Grid has a significant impact on the performance of
an application. It is therefore essential to consider network characteristics when
carrying out tasks such as scheduling, migration or monitoring of jobs. This
chapter focuses on an implementation of an autonomic network–aware meta-
scheduling architecture that is capable of adapting its behavior to the current
status of the environment, so that jobs can be efficiently mapped to computing
resources.
3.1 Introduction
Computational and data Grids allow the coordinated use of heterogeneous com-
puting resources within large–scale parallel applications in science, engineering
and commerce [1]. Since organizations sharing their resources in such a context
still keep their independence and autonomy [2], Grids are highly variable sys-
tems in which resources may join/leave the system at any time. This variability
makes QoS highly desirable, though often very difficult to achieve in practice.
One reason for this limitation is the lack of a central entity that orchestrates the
entire system. This is especially true in the case of the network that connects
the various components of a Grid system.
Achieving an end-to-end QoS is often difficult, as without resource reserva-
tion any guarantees on QoS are often hard to achieve. Furthermore, in a real
Grid system, reservations may not be always feasible, since not all the LRMS
permit them. There are also other types of resource properties, such as band-
width, which lack a global management entity thereby making their reservation
impossible.
However, for applications that need a timely response (e.g., distributed en-
gine diagnostics [103] or collaborative visualization [104]), the Grid must provide
users with some assurance about the use of resources – a non-trivial subject
when viewed in the context of network QoS. In a Grid, entities communicate
with each other using an interconnection network – resulting in the network
playing an essential role in Grid systems [33].
In [46], the authors proposed an autonomic network–aware Grid meta-schedu-
ling architecture as a possible solution. This architecture takes into account the
status of the system in order to make meta-scheduling decisions, paying spe-
cial attention to the network capability. It is a modular architecture in which
each module works independently of the others, so that the architecture can
be adapted to new requirements easily. It should also be noted that this archi-
tecture was originally proposed from a formal point of view and was only
validated by means of simulation. In this Thesis, the aforementioned architecture has
been implemented into a real Grid environment, as an extension to the Grid-
Way meta-scheduler [10], with case studies and performance results provided
to demonstrate how it can be used. A scheduling technique that makes use of
ExS [105] to calculate predictions on the completion times of jobs is also devel-
oped. Thus, the main contributions of this chapter are: (1) an implementation of
an architecture to perform autonomic network–aware meta-scheduling based on
the widely used GridWay system; (2) a scheduling technique that relies on ExS
to predict the completion times of jobs; (3) a performance evaluation carried out
using a testbed involving workloads and heterogeneous resources from several
organizations.
The chapter is structured as follows: Section 3.2 discusses a scenario in
which an autonomic scheduler can be used and harnessed. Section 3.3 contains
details about the implementation based on an extension to the GridWay meta-
scheduler. Section 3.4 presents a performance evaluation of our approach, and
a summary of the chapter is presented in Section 3.5.
3.2 Autonomic Network–aware Meta-scheduling (ANM)
The availability of resources within a Grid environment may vary over time –
some resources may fail whereas others may join or leave the system at any time.
Additionally, each Grid resource must execute a workload that combines locally
generated tasks with those that have been submitted from external (remote) user
applications. Hence, each new task influences the execution of existing applica-
tions, requiring a resource selection strategy that can account for this dynamism
within the system. This is the reason why the provision of QoS in a Grid sys-
tem has been explored by a number of research projects, such as GARA [33] or
G-QoSM [32], which use as schedulers DSRT [43] and PBS [44], respectively.
However, these schedulers only pay attention to the load of the computing re-
source, thus a powerful unloaded computing resource with an overloaded net-
work could be chosen to run jobs, which decreases the performance received by
users, especially when the job requires a high network Input/Output (I/O). As
the network is a key component within a Grid system due to the coordinated use
of distributed resources, attention should be paid when carrying out tasks such
as scheduling, migration, or monitoring [45].
Under those conditions, developing an autonomic system that reacts (adapting
its behavior) to the system status is a must. Conceptually, an auto-
nomic system requires: (a) sensor channels to sense the changes in the internal
state of a system and the external environment in which the system is situated,
and (b) motor channels to react to and counter the effects of the changes in the
environment by changing the system and maintaining equilibrium.
Figure 3.1. Example scenario.

Sensing, Analyzing, Planning, Knowledge and Execution are thus the
keywords used to identify an autonomic computing system, as stated by IBM
Research in its definition of MAPE [76]. There has been significant work already
undertaken towards autonomic Grid computing. For instance, an architecture to
achieve automated control and management of networked applications and their
infrastructure based on XML format specification is presented in [79]. Another
example is [81], where an autonomic job scheduling policy for Grid systems is
presented. This policy can deal with the failure of computing resources and
network links, but it does not take the network into account in order to decide
which computing resource will run each user application.
In contrast, our autonomic approach uses a variety of parameters to make
a resource selection, such as network bandwidth, CPU usage or resource good-
ness, amongst others. A motivating scenario for an autonomic network–aware
meta-scheduler architecture is depicted in Figure 3.1 and includes the following
entities [46]:
• Users, each with a number of jobs/tasks to run.
• Computing resources, which may include clusters running a LRMS, such
as PBS [44].
• GNB (Grid Network Broker), an autonomic network–aware meta-scheduler.
• GIS (Grid Information System), such as [106], which keeps a list of available
resources.
• Resource monitor(s), such as Ganglia [107] or Iperf [108], which provide
detailed information on the status of the resources.
• BB (Bandwidth Broker), such as [109], which is in charge of the adminis-
trative domain and has direct access to routers, mainly for configuration
and topology discovering purposes.
• Interconnection network, such as a Local Area Network (LAN) or the In-
ternet.
The interaction between components within the architecture is as follows:
1. Users ask the GNB for a resource to run their jobs. Users provide the fea-
tures of their jobs (the “job template”), which include the input/output files,
the executable file and a deadline, amongst other parameters.
2. The GNB performs two operations for each job:
(a) It performs Connection Admission Control (CAC) by both (i) filtering
out the resources that do not have enough capacity to accept the job
and (ii) verifying whether the required QoS can be fulfilled by execut-
ing the job on the selected resource – i.e. whether the execution can
be finished before the deadline set by the user. An estimation of job
execution time is undertaken to support this [110] [111] (explained in
the next section).
(b) It chooses the most appropriate resource for its execution, i.e. one that
exhibits the best tolerance (explained in the next section).
Choosing a tolerance parameter and supporting CAC constitute the main
autonomic capabilities of the system. The tolerance parameter is dynam-
ically adjusted based on estimated and real execution time of jobs on a
particular resource, and admission control is subsequently used to limit
allocation of jobs to particular resources. The autonomic control loop in-
volves a dynamic adjustment of the tolerance parameter to improve job
completion times and resource utilization. Hence, if the selected resource
is either not available or has excessive workload, the resource with the next
best tolerance is checked. This process is repeated until a suitable resource
is found or until a certain number of resources are checked. Finally, if it is
not possible to allocate the job, it will be dropped, since its QoS cannot be
fulfilled given current resource availability.
To achieve this, the GNB first obtains a list of resources from the GIS and
subsequently gets the current load on each of these from the resource mon-
itor.
3. When found, the GNB submits the job to the selected computing resource.
4. Finally, after job completion, the GNB will get the output sent back from
the resource, and will forward it to the user.
Once the GNB has selected a resource to execute the job (step 2), the value
of tolerance is updated. This value indicates how accurately completion times of
jobs can be predicted. To achieve this, the GNB first estimates the job completion
time (prior to actual execution), taking into account data transfer time and CPU
time. Subsequently on job completion, the estimated value is compared with the
real value of job completion time. The difference between these represents the
accuracy with which such estimation can be achieved – in practice often limited
due to some sites and administrative domains not sharing information on the
load of their resources. Resource contention provides another obstacle causing
host load and availability to vary over time, making the completion time estima-
tion difficult [68]. Consequently, it is necessary to estimate how trustworthy a
specific resource is likely to be, or even if it will be available to execute the job.
The resource that is selected to execute a job is the one that has provided the
most predictable behavior up to the point the schedule is generated. Information
on the status of already scheduled jobs is used to obtain network and CPU
tolerances. The CPU execution time is calculated by using Exponential
Smoothing functions (ExS) [111] to tune resource status estimations, which are
in turn calculated by using information about past executions of similar tasks.
The information on network latency is obtained by means of Iperf [108], and
information on jobs already scheduled is obtained by means of the GIS – in our
case, from Globus Grid Resource Allocation Management (GRAM) [106].
The monitoring of the network and computing resources is carried out with
a given frequency referred to as the monitoring interval. As the GNB performs
scheduling of jobs to computing resources in between two consecutive moni-
toring intervals, it must take into account the jobs already scheduled on those
resources – i.e., calculate the effective bandwidth taking account of the existing
workload.
3.3 Implementation of ANM
The GNB has been implemented as an extension to the GridWay meta-scheduler.
To achieve this, it was first necessary to make GridWay network–aware, as well
as performing the needed adaptations to develop a scalable and suitable solu-
tion for real Grid environments. Details about how this has been undertaken
along with a description about predictions of network and CPU performance are
provided in this section.
3.3.1 Extending GridWay to be network–aware
GridWay [12] has been modified to take into account the status of the network
when ordering resources in the meta-scheduling process [112]. This value is
calculated by using the Iperf tool [108]. Similarly, monitoring data from com-
puting resources is obtained by Ganglia (already present in GridWay) and the
GIS provided by the Globus Toolkit [9].
GridWay performs meta-scheduling by requiring the user to provide a job
template which specifies the features of the job, including the executable file
and the input and output files, amongst others. The job template has two tags
that specify the criteria used by GridWay for selecting resources to run the job,
namely, REQUIREMENTS and RANK. With the REQUIREMENTS tag, the user can set
the minimal requirements needed to run the job, thus applying a filter on all the
resources known to GridWay. Once the REQUIREMENTS tag is processed, the set
of resources that fulfill the REQUIREMENTS are sorted according to the criteria
posed by the RANK tag. The process is depicted in Algorithm 1. For both tags,
several characteristics such as CPU type & speed, operating system, memory
available, etc. can be specified. Many of these values are gathered through
the Globus GIS module, while others (dynamic ones, such as amount of free
memory) are monitored through Ganglia [107].
Algorithm 1 Resource selection algorithm used in GridWay
1: R: set of resources known to GridWay
2: R_Req = {r ∈ R / r fulfills the REQUIREMENTS condition}
3: return r′ ∈ R_Req / ∀r ∈ R_Req, RANK(r′) ≥ RANK(r) & r′ ≠ r
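The two-step selection of Algorithm 1 can be sketched as follows. This is a simplified illustration: the resource attributes, the example REQUIREMENTS predicate and the RANK expression are invented, and GridWay actually evaluates the template expressions internally rather than taking Python callables.

```python
# Sketch of GridWay's two-step selection: filter resources with a
# REQUIREMENTS predicate, then pick the one maximizing RANK.
# Attribute names and values are illustrative, not GridWay's schema.

resources = [
    {"name": "r1", "cpu_mhz": 2400, "free_mem_mb": 1024, "bandwidth": 10.0},
    {"name": "r2", "cpu_mhz": 3000, "free_mem_mb": 512,  "bandwidth": 90.0},
    {"name": "r3", "cpu_mhz": 2800, "free_mem_mb": 2048, "bandwidth": 55.0},
]

requirements = lambda r: r["free_mem_mb"] >= 1024      # REQUIREMENTS tag
rank = lambda r: r["bandwidth"]                        # RANK tag

candidates = [r for r in resources if requirements(r)] # filtering step (line 2)
best = max(candidates, key=rank)                       # selection step (line 3)
assert best["name"] == "r3"   # r2 fails REQUIREMENTS despite best bandwidth
```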
Figure 3.2. Conceptual view of the extensions introduced to GridWay
For this implementation, the BANDWIDTH attribute has been introduced into
GridWay. It refers to the effective network bandwidth in the path between the
GridWay node and each computing resource, i.e. the path traversed by the data
(I/O files) needed by the job to run. If an application has a large amount of input
data, it must correspondingly choose an appropriate network path based on the
value of this attribute. Both the REQUIREMENTS and the RANK expressions can
utilize the BANDWIDTH attribute. Thus, the user can filter and/or sort resources
by also taking into account the effective bandwidth from the GridWay node to
each resource. Figure 3.2 illustrates the extensions introduced in GridWay to
be network–aware. The elements in gray shade have been added in this Thesis.
Details about these extensions can be found in [112], where a performance eval-
uation that highlights the improvement obtained by making GridWay network
aware is included. Besides, an evaluation of the tuning of the network tools is
presented in [113].
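For illustration, a job template using the new attribute might look like the following sketch. The file names, the bandwidth threshold of 50 and the exact attribute syntax are only indicative, not taken from the actual GridWay template grammar.

```
# Illustrative GridWay job template using the new BANDWIDTH attribute
# (file names and the threshold value are examples only)
EXECUTABLE   = my_app
INPUT_FILES  = input.dat
OUTPUT_FILES = output.dat
REQUIREMENTS = BANDWIDTH > 50        # filter out poorly connected resources
RANK         = BANDWIDTH             # prefer the best-connected resource
```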
3.3.2 Autonomic scheduler
Autonomic behavior in GridWay has been implemented by means of (1) per-
forming Connection Admission Control (CAC); (2) adding a new attribute named
TOLERANCE that GridWay uses to perform the filtering and sorting of re-
sources, reacting to changes in the state of the system; and (3) using Exponential
Smoothing (ExS) [105] to tune the predictions on the duration of jobs. This sec-
tion presents details on how these have been implemented.
Algorithm 2 CAC algorithm.
1: R: set of resources known to GridWay {r_i / i in [1..n]}
2: R_CAC: set of resources that fulfill the CAC algorithm
3: CPUfree(r_i): the percentage of free CPU of resource r_i
4: j: a job
5: deadline(j): deadline of job j
6: t^ri_completion(j): estimated completion time of job j on resource r_i
7: MaxRes: maximum number of resources to check
8: R_CAC = ∅
9: i = 1
10: while (CPUfree(r_i) ≥ threshold_CPU) AND (i ≤ MaxRes) do
11:   if t^ri_completion(j) < deadline(j) then
12:     R_CAC = R_CAC + r_i
13:   end if
14:   increment(i)
15: end while
Connection admission control (CAC)
The Connection Admission Control Algorithm (see Algorithm 2) checks resources
(with R being the set of computing resources of the same VO) to identify those
with enough CPU capacity (thresholdCPU ) on which the job can be executed
within its deadline (line 11). If the predicted completion time for the job is lower
than the deadline, the resource is chosen (line 12). Otherwise, the next resource
is checked. This process is repeated until all the resources are checked (line 10).
Not all the known resources have to be checked; for efficiency and scalability, an
upper limit (MaxRes) may be defined. If R_CAC is empty, then the job is rejected,
or alternatively a negotiation process is started.
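The admission-control step can be sketched in Python as follows. This is a simplified illustration: the free-CPU and completion-time predictors are placeholders for the mechanisms described in Section 3.3.3, and resources failing the CPU check are simply skipped rather than stopping the scan.

```python
# Sketch of the CAC step (Algorithm 2): keep resources with enough free
# CPU on which the job is predicted to finish before its deadline.
# `free_cpu` and `est_completion` stand in for the real predictors.

def cac(resources, job, free_cpu, est_completion,
        cpu_threshold=0.2, max_res=10):
    admitted = []
    for r in resources[:max_res]:          # check at most MaxRes resources
        if free_cpu(r) < cpu_threshold:
            continue                       # not enough spare CPU capacity
        if est_completion(job, r) < job["deadline"]:
            admitted.append(r)             # job fits within its deadline
    return admitted                        # empty => reject (or negotiate)

job = {"deadline": 100.0}
free = {"r1": 0.9, "r2": 0.1, "r3": 0.5}   # predicted free-CPU fractions
eta  = {"r1": 120.0, "r2": 40.0, "r3": 80.0}  # predicted completion times
ok = cac(["r1", "r2", "r3"], job, lambda r: free[r], lambda j, r: eta[r])
assert ok == ["r3"]   # r1 misses the deadline, r2 lacks free CPU
```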
Scheduling Algorithm
Once the target set of resources (RCAC ) has been calculated by the CAC algo-
rithm, the scheduling algorithm sorts them by taking into consideration their
TOLERANCE (the way of estimating this value is explained in next subsection),
from the lowest to the highest one, as outlined in Algorithm 3.
As discussed next, resources with a high TOLERANCE value are less pre-
dictable, hence even though a job execution time on that resource may be within
the deadline, this does not mean that the deadline will actually be met. It must
be noted that predictions about durations of jobs have to be used, such as
in [110] [111].
Algorithm 3 Scheduling algorithm.
1: j: a job
2: R_CAC: set of resources that fulfill the CAC criteria
3: r_exe: resource where j will be submitted
4: for all r_i in R_CAC do
5:   if Tolerance_ri < Tolerance_rexe then
6:     r_exe = r_i
7:   end if
8:   increment(i)
9: end for
On the other hand, and due to performance and scalability reasons, the CAC
and scheduling algorithms are tightly coupled. This means that they are exe-
cuted together in such a way that when a resource which could execute the job
within its deadline is found, the job is submitted to that resource and the pro-
cess stops. Hence, when the CAC is filtering resources, the list of resources to
check has already been sorted by the scheduling algorithm taking into account
their TOLERANCE values.
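The coupled behavior described above can be sketched as a first-fit scan over tolerance-sorted resources. This is an illustrative simplification; the tolerance and completion-time functions are placeholders for the real predictors.

```python
# Sketch of the coupled CAC + scheduling step: scan resources in
# increasing TOLERANCE order and submit to the first one predicted to
# meet the job's deadline. Predictors are illustrative placeholders.

def schedule(job, resources, tolerance, est_completion, max_res=10):
    for r in sorted(resources, key=tolerance)[:max_res]:
        if est_completion(job, r) < job["deadline"]:
            return r          # submit here and stop the scan
    return None               # no suitable resource: drop or negotiate

job = {"deadline": 60.0}
tol = {"a": 0.3, "b": 0.1, "c": 0.2}          # lower = more predictable
eta = {"a": 50.0, "b": 90.0, "c": 40.0}       # predicted completion times
chosen = schedule(job, ["a", "b", "c"], tol.get, lambda j, r: eta[r])
assert chosen == "c"   # b is most predictable but misses the deadline
```

Stopping at the first suitable resource is what makes the coupled scheme cheaper than filtering the whole resource set and sorting afterwards.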
Including the TOLERANCE attribute in GridWay
In order to filter and order the list of resources known to GridWay, considering
the accuracy of previous scheduling decisions, a new attribute has been added
to the job template. TOLERANCE has been implemented as a new attribute
that can be used both in the RANK and REQUIREMENTS tags, in the same way
as BANDWIDTH. The TOLERANCE attribute reflects the accuracy of predicting
job completion times for each resource. On performing scheduling of jobs to
computing resources and to address QoS of users (e.g. finish jobs before a
deadline), predictions on the completion time of jobs must be calculated. This
includes predictions on the transfer times (transfer of input and output files,
along with the executable file) and execution time.
The calculation of the TOLERANCE attribute is motivated by [46], where
each time a job has to be scheduled both transfer and execution latency of that
job are calculated for each resource known to the meta-scheduler. However,
this is a time-consuming process and therefore not scalable, so it should only be
carried out after choosing a resource (and not for all resources). Additionally,
the calculation of the TOLERANCE attribute in [46] relies on the number of millions of
instructions of a job, which is a measure of job size provided
by the simulator but very hard to obtain in practice. This term has been sub-
stituted by the average execution time of jobs of the same type, which is a more
realistic metric to measure in practice (Section 3.3.3 explains this).
In our approach, the GNB performs scheduling for each job request and uses
the value of the TOLERANCE attribute to filter and sort the set of resources
known to GridWay. TOLERANCE values are calculated (as outlined in Equa-
tion 3.1) after every job completion and associated with the resource on which
the job has been executed. Subsequently, the GNB orders resources based on
their TOLERANCE, from the lowest to the highest, i.e. from the most pre-
dictable to the least predictable one.
TOLERANCE(r_i) = TOLERANCE^ri_cpu + TOLERANCE^ri_net    (3.1)

TOLERANCE^ri_net = (t^ri_net_real − t^ri_net_estimated) / MB    (3.2)

TOLERANCE^ri_cpu = (t^ri_cpu_real − t^ri_cpu_estimated) / t^ri_cpu_real(j)    (3.3)
The terms TOLERANCE^ri_x, x = {net, cpu}, represent the accuracy of the pre-
vious predictions carried out by the GNB for the resource r_i, with i ∈ [1, n].
For TOLERANCE^ri_net, the last measurement of network bandwidth between the
GridWay node and r_i is considered, collected from the last update of this mea-
sure before the execution of the job. With this information, along with the total
amount of data to be transferred (MB), an estimation of the transfer time of the job
(t^ri_net_estimated) is calculated. After job execution, the actual time needed to com-
plete the transfers (t^ri_net_real) can be obtained. Finally, with these two times, the
updated network tolerance for the resource where the execution took place (r_i)
is calculated. The value of TOLERANCE^ri_net reflects how accurate the prediction
on the transfer time has been for the given job. Similarly, TOLERANCE^ri_cpu is
calculated for each job after its completion.
Equations 3.2 and 3.3 show the actual formulas used for the last completed
job, where MB represents the size of the job data in megabytes, and t^ri_cpu_real(j) rep-
resents the average execution time of a certain job (j) on a specific resource
(r_i). To estimate future values of TOLERANCE, an approach similar to the one
used by the Transmission Control Protocol (TCP) for computing retransmission time-
outs [114] may be used. Hence, we can consider:

D = TOLERANCE(r_i) − Tolerance_ri(t)    (3.4)

Tolerance_ri(t+1) = Tolerance_ri(t) + D × δ    (3.5)
where δ reflects the importance of the last sample in the calculation of the next
TOLERANCE value (Tolerance_ri(t+1)). TOLERANCE is only considered for those
resources known to GridWay to have enough available capacity to accept more
jobs. The GNB keeps a TOLERANCE value for the network and CPU capacity
of each computing resource, and modifies these in response to changes in the system.
Figure 3.3 illustrates the autonomic control loop for modifying the TOLERANCE
parameter, as outlined above. The ∆ in the figure denotes the difference between
the predicted times and the real times; hence, it is related to Equations 3.2 and 3.3.
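Equations 3.1–3.5 can be combined into a small update routine, sketched below. The smoothing factor δ and all sample values are purely illustrative.

```python
# Sketch of the TOLERANCE update (Equations 3.1-3.5): compute the
# prediction error of the last completed job and fold it into the
# running value with a TCP-like smoothing step. delta and the sample
# figures are illustrative, not values used in the actual system.

def updated_tolerance(tol_prev, t_net_real, t_net_est, mb,
                      t_cpu_real, t_cpu_est, delta=0.2):
    tol_net = (t_net_real - t_net_est) / mb          # Eq. 3.2
    tol_cpu = (t_cpu_real - t_cpu_est) / t_cpu_real  # Eq. 3.3
    sample = tol_net + tol_cpu                       # Eq. 3.1
    d = sample - tol_prev                            # Eq. 3.4
    return tol_prev + d * delta                      # Eq. 3.5

tol = updated_tolerance(tol_prev=0.1,
                        t_net_real=12.0, t_net_est=10.0, mb=100.0,
                        t_cpu_real=50.0, t_cpu_est=45.0)
# sample = 2/100 + 5/50 = 0.12; new value = 0.1 + 0.02 * 0.2 = 0.104
assert abs(tol - 0.104) < 1e-9
```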
3.3.3 Predicting and tuning resource performance
Two types of predictions are necessary, namely (1) predictions on the transfer
times, and (2) predictions on the execution times. These are explained next.
Calculating the network performance
Once the scheduler has sorted the available resources by using their toler-
ance values, it is necessary to estimate the effective bandwidth between two
Figure 3.3. Autonomic control loop for adapting the TOLERANCE parameter. The “X” in t_X_real and t_X_estimated refers to the set {net, cpu}.
end points in the network – these being between the GNB and the computing
resource where the job will be executed. This prediction is used for the CAC
algorithm. Estimation of link bandwidth was implemented in [112], in which the
Iperf tool monitors the available bandwidth from the GNB to all the computing
resources it knows with a given frequency. However, this only provides the ef-
fective bandwidth at the moment when monitoring is performed, which may not
be the bandwidth available when a schedule needs to be defined (as other jobs may have
been scheduled and are being transferred at that moment). It is therefore neces-
sary to infer the effective bandwidth between two monitoring intervals – similar
to [46].
We achieve this by considering the number of jobs that are being submitted
to the selected resource at the point at which a schedule needs to be defined.
Hence the effective bandwidth of the path between the GNB and a computing
resource r_i at time t + x can be calculated as follows:

eff_bw(r_i)_{t+x} = Bw(r_i)_t / (#PrologJobs + 1)    (3.6)
where Bw(ri)t is the last measured value (at time t) of the available bandwidth be-
tween the GNB and the resource ri selected to execute the job; and #PrologJobs
is the number of jobs that are submitting data from the GNB to that resource. In
order to take into account all the data being transferred, this number is updated
with the new incoming connection (the “+1” in Equation 3.6). Note that x must
lie in (0, 1), which means that the estimation is made between two real
measurements.
Once we have the effective bandwidth of a network path, the latency of the
data transfers for the job over a network path can be calculated by dividing the
I/O file size of the job in MB (megabytes) by the effective bandwidth. These
values (I/O file sizes) are known since I/O files are specified in the job template.
In this way, the estimated time to complete the transfers is obtained by using
Equation 3.7.
t^ri_net_estimated = sizeFilesIn / eff_bw(r_i)_{t+x} + sizeFilesOut / Bw(r_i)_t    (3.7)
It must be noted that Bw(ri)t is used for calculating the time needed to com-
plete the output transfers (epilog step) since we know the number of jobs that are
sending input files but we cannot ensure how many jobs will be sending back the
output files when the job being submitted is completed. Additionally, the des-
tination resource of these output file transfers does not have to be the same for
all of them. Thus, a Grid meta-scheduler cannot have complete knowl-
edge of the network structure, making it necessary to make assumptions about
effective bandwidth available in the future. These assumptions are obtained by
using an Exponential Smoothing function, which is explained in Section 4.4.5.
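Equations 3.6 and 3.7 can be sketched together as follows. All figures are illustrative; in the real system the measured bandwidth Bw(r_i)_t comes from the periodic Iperf monitoring.

```python
# Sketch of the transfer-time estimate (Equations 3.6 and 3.7): the
# last measured bandwidth is shared among the jobs currently staging
# input files to the same resource. All figures are illustrative.

def estimated_transfer_time(size_in_mb, size_out_mb, bw_last, prolog_jobs):
    eff_bw = bw_last / (prolog_jobs + 1)   # Eq. 3.6: shared effective bw
    t_in  = size_in_mb / eff_bw            # prolog: input file staging
    t_out = size_out_mb / bw_last          # epilog uses the last measure
    return t_in + t_out                    # Eq. 3.7

# 100 MB in, 20 MB out, 10 MB/s measured, 3 jobs already staging input:
t = estimated_transfer_time(100.0, 20.0, bw_last=10.0, prolog_jobs=3)
assert abs(t - 42.0) < 1e-9   # 100 / (10/4) + 20 / 10 = 40 + 2 seconds
```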
Calculating CPU latency
Predictions of job execution time are quite difficult to obtain since there are per-
formance differences between Grid resources and their performance character-
istics may vary for different applications (e.g. resource A may execute an appli-
cation P faster than resource B, but resource B may execute application Q faster
than A). With the aim of estimating as accurately as possible the time needed
to execute the jobs on the selected resources, we apply the techniques developed
in [110] [111]. These techniques use application–oriented prediction techniques
to estimate the execution time of the application, together with resource–oriented
approaches to recalculate the execution time of the job depending on the
predicted CPU status of the resource.

Algorithm 4 Estimation of Execution Time (t^ri_cpu_estimated(j))
1: R: set of resources known to GridWay {r_i / i in [1..n]}
2: j: the job to be executed
3: t^ri_cpu_real(j)_k: the k-th execution time of application j on resource r_i
4: DB_Resources_ri: the filtered database with the information about the status of resource r_i
5: CPUfree(r_i): the mean percentage of free CPU in resource r_i between now and the deadline of j, calculated by using the Exponential Smoothing (ExS) function
6: Overload: the extra time needed due to the CPU usage at the chosen resource r_i
7: t^ri_cpu_estimated(j) = (Σ_{k=1..n} t^ri_cpu_real(j)_k) / n
8: Overload = t^ri_cpu_estimated(j) × (1 − CPUfree(r_i))
9: t^ri_cpu_estimated(j) = t^ri_cpu_estimated(j) + Overload
10: return t^ri_cpu_estimated(j)
Predictions on execution times are performed as explained in Algorithm 4
and are based on the average of previous executions of an application on a
particular resource (line 7) – this estimation takes into account the different
input parameters. This average is calculated for each job type, and information
related to previous executions of a specific job is used to determine an average
execution time. After that, the prediction on the future status of the CPU of each
resource is calculated by means of an Exponential Smoothing function. Finally,
the mean execution time is adjusted using the predicted future CPU status of each
resource (line 9). More detailed information about this process is presented in
Section 4.3.4.
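Lines 7–9 of Algorithm 4 can be sketched in Python as follows (a simplified illustration; the actual implementation works over the filtered GridWay database):

```python
def estimate_execution_time(past_times, predicted_cpu_free):
    """Algorithm 4, lines 7-9: average the previous execution times of the
    application on this resource, then add the overload expected from the
    predicted CPU usage (predicted_cpu_free in [0, 1] comes from ExS)."""
    base = sum(past_times) / len(past_times)        # line 7: mean of past runs
    overload = base * (1.0 - predicted_cpu_free)    # line 8: extra time due to load
    return base + overload                          # line 9
```

For instance, with past runs of 90, 100 and 110 seconds and a predicted 80 % of free CPU, the estimate is the 100-second mean plus a 20 % overload, i.e. 120 seconds.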
3.4 Experiments and results
This section describes the experiments conducted to test the usefulness of this
work, along with the results obtained.
3.4.1 Experiment Testbed
The evaluation of the autonomic implementation has been carried out in a real
Grid environment. The testbed consists of resources located at two different
Universities, as illustrated in Figure 3.4.

Figure 3.4. Grid testbed topology.

At the University of Castilla–La Mancha
(UCLM, Albacete, Spain) there are resources located in two different buildings.
In one building, named Instituto de Investigación en Informática de Albacete (I3A),
there is one machine which performs the scheduling of tasks and several com-
putational resources (10 desktop computers belonging to other users). In a sec-
ond building, named Escuela Superior de Ingeniería Informática (ESII), there is
a cluster machine with 88 cores and with the PBS [14] scheduler, which is also
shared with other users. All these machines belong to the same administrative
domain (University of Castilla–La Mancha (UCLM)) but they are located within
different subnets.
On the other hand, there is another computational resource at the National
University of Distance Education (Universidad Nacional de Educación a Distancia,
UNED, Madrid, Spain), which is also a desktop computer. Thus, the network
which links this computational resource with the UCLM resources is the Inter-
net. Table 3.1 outlines the main characteristics of these computing resources.
Note that these machines belong to other users, so they have their own local
background workload (including the network load). Each non-cluster machine
Domain       Machine             Hardware (CPU)                              RAM   Globus (version)
UCLM (I3A)   GridWayI3A.uclm.es  2 Intel Pentium 4 CPU 3.00 GHz              2 GB  v. 4.0.5
UCLM (I3A)   R1                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.3
UCLM (I3A)   R2                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.3
UCLM (I3A)   R3                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.3
UCLM (I3A)   R4                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.4
UCLM (I3A)   R5                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.8
UCLM (I3A)   R6                  2 Intel Pentium 4 CPU 3.20 GHz              3 GB  v. 4.0.7
UCLM (I3A)   R7                  Intel Core 2 Duo CPU 2.66 GHz               2 GB  v. 4.0.8
UCLM (I3A)   R8                  2 AMD Opteron 244, 1.80 GHz                 1 GB  v. 4.0.7
UCLM (I3A)   R9                  2 Intel Pentium 4 CPU 3.00 GHz              2 GB  v. 4.0.4
UCLM (I3A)   R10                 2 Intel Pentium 4 CPU 3.00 GHz              1 GB  v. 4.0.8
UCLM (ESII)  Cluster1            22 AMD bipro dual core Opteron CPU 2.4 GHz  4 GB  v. 4.0.8
UNED         Uned R1             Intel Core 2 Duo CPU 2.80 GHz               4 GB  v. 4.0.8

Table 3.1. Characteristics of the resources.
is the desktop computer of a member of the staff (and not a dedicated machine)
at UCLM or National University of Distance Education (UNED), so they may have
different CPU and network background workloads, which are not defined in the
testbed.
3.4.2 Workload
To evaluate our implementation we use one of the GRASP [115] benchmarks,
named 3node. The 3node test consists of sending a file from a source node to
a computation node, which performs a search pattern, generating an output file
with the number of successes. The output file is sent to the result node. This test
is meant to mimic a pipelined application that obtains data at one site, computes
a result on that data at another, and analyses the result on a third site.
Furthermore, this test has parameterizable options to make it more com-
pute intensive (compute_scale parameter), which means that the run time is
increased, and/or it can become more network demanding (output_scale param-
eter), which means that the files to be transferred are bigger. This versatility
is the reason why we have chosen this test to measure the performance of our
approach. With these two parameters, it is possible to generate different types of
jobs. Therefore, in order to emulate the workload used in [46], the compute_scale
parameter takes the value 10 and the output_scale parameter takes the value 1.
Besides, the input file size is 48 MB, and this value of output_scale creates output
files whose size is the same as the input file size.
56 Chapter 3. Including Metrics to Improve QoS at the Meta-Scheduling Level
Figure 3.5. Visualization Pipeline (VP) test.
To better validate and evaluate this implementation, one of the NAS Grid
Benchmarks (NGB) [116], named Visualization Pipeline (VP), has also been used.
This test has different workflow dependencies. Some jobs are more computation-
ally intensive whilst others are network demanding. Therefore, the VP allows us
to explore a big spectrum of running conditions. Figure 3.5 shows the workflow
of this test. VP represents chains of compound processes, like those encoun-
tered when visualizing flow solutions as the simulation progresses. It comprises
three NAS Parallel Benchmarks (NPB) [117] problems, namely BT, MG, and FT,
which fulfill the role of flow solver, post processor, and visualization module, re-
spectively. This triplet is linked together into a logically pipelined process, where
subsequent flow solutions can be computed while postprocessing and visual-
ization of previous solutions are still in progress [116]. The red circular nodes
depict BT jobs, blue square nodes are MG jobs and black trapezium nodes are
FT jobs. All of them (BT, MG and FT ) are defined in the NGB benchmarks.
3.4.3 Performance evaluation
This section compares the following meta-scheduling schemes: (1) the original
GridWay meta-scheduler [10], which chooses the first discovered resource with
enough free CPU to execute a job (labeled as GW in the figures), (2) the GridWay
meta-scheduler using the CPU power to select resources (labeled as GW-MHZ), (3)
the network–aware GridWay extension presented in [112] (labeled as GW-Net),
(4) the autonomic network–aware meta-scheduler with ExS disabled (labeled as
ANM), and (5) the autonomic network–aware meta-scheduler with ExS enabled
(labeled ANM-ExS). For the last two schemes, the CAC functionality has been
disabled in order to make a fair comparison with the other scheduling techniques
which do not have this feature; this means that jobs are accepted regardless of
whether their QoS requirements can be met.
GridWay has been used in several research articles to compare meta-sche-
duling techniques. Among others, Vazquez-Poletti et al. [118] present a com-
parison between EGEE and GridWay over EGEE resources. This comparison is
both theoretical and practical through the execution of a fusion physics plasma
application on the EGEE infrastructure, and shows the better performance of
GridWay over LCG-2 Resource Broker. Several theoretical comparisons, among
others [34] [119] [120] compare GridWay with other meta-scheduling techniques,
highlighting the fact that this is a valuable and versatile tool to manage dynamic
and heterogeneous Grids.
To evaluate the performance of the aforementioned scheduling techniques in
our environment, we emulate a workload similar to [46] by using the 3node test.
To do this, we simulate 5 different users. Each of them submits its jobs with
one type of scheduling technique. User requests consist of 1000 jobs of 3node
type with the parameters set as explained in Section 3.4.2. Results of these
submissions are presented in Figure 3.6. Figure 3.6 (a) represents the average
completion time of the 3node test with each scheduling technique. Figure 3.6 (b)
represents a boxplot of the execution times of the 1000 3node executions.
A boxplot is a convenient way of graphically depicting groups of
numerical data through their five-number summaries: the smallest observation
(minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest ob-
servation (maximum).
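The five-number summary behind such a boxplot can be computed, for instance, as follows (quartiles are taken as medians of the lower and upper halves, which is one common convention among several):

```python
from statistics import median

def five_number_summary(data):
    """Minimum, lower quartile (Q1), median (Q2), upper quartile (Q3) and
    maximum -- the five values a boxplot depicts."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])            # lower half, excluding the median if n is odd
    q3 = median(xs[half + n % 2:])    # upper half
    return xs[0], q1, median(xs), q3, xs[-1]
```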
As we can see on both figures, the average completion time is lower when
using ANM and ANM-ExS, in spite of having the CAC functionality disabled. The
best results are obtained for ANM-ExS. This scheduling technique achieves a
time reduction of 26.11 % over GW, of 17.47 % over GW-MHZ and of 15.75 % over
GW-Net. Moreover, using the Exponential Smoothing predictions also results in
a gain of 9.35 % over the completion times obtained by ANM. Furthermore, as
Figure 3.6 (b) depicts, there are other important results that highlight the better
behavior of ANM-ExS technique. These are:
Figure 3.6. 3node test average completion time (in seconds) for each scheduling
technique (GW, GW-MHZ, GW-Net, ANM, ANM-ExS): (a) Average Completion Time;
(b) Average Completion Time Boxplot.
• Median time reduction: 33.92 % compared with GW, 24.16 % compared with
GW-MHZ and 26.62 % compared with GW-Net.
• Maximum time reduction: there exists a clear reduction obtained on this
metric, since ANM-ExS selects a resource whose behavior is more predictable.
Because of that, the probability of choosing a bad resource which delays
the execution is quite low. Thus, the maximum time reduction obtained is
of 53.94 % over GW, of 66.75 % over GW-MHZ and of 55.53 % over GW-Net.
Moreover, comparing ANM and ANM-ExS, the latter performs better since the
completion time predictions are more accurate when using Exponential Smooth-
ing and this makes the TOLERANCE value more reliable. Thus, ANM-ExS ob-
tains a reduction of 9.96 % over ANM for the median time, and of 53.61 % for
the maximum time. The worst case is therefore clearly improved by using the
ANM-ExS compared to other techniques.
It must be noted that for the GW scheduling technique, the box is narrower
since the selected resource is the first discovered resource (whenever possible).
Consequently, most of the jobs are executed on the same resource, so the time
needed to complete the executions is more uniform.
On the other hand, the resource usage is also improved by using autonomic
behavior. Information about resource usage is presented in Table 3.2. The first
line represents the percentage of resources used for executing the 3node test
with each type of scheduling technique. The second line shows the usage of the
                               GW     GW-MHZ   GW-Net   ANM    ANM-ExS
% of used resources            25 %   38.4 %   50 %     75 %   62.5 %
Maximum % of resource usage    79 %   42.5 %   32 %     42 %   46 %
Minimum % of resource usage    21 %   1 %      12 %     1 %    8 %

Table 3.2. Percentage of resource usage by 3node tests.
most saturated resource – the one which each scheduling technique sends more
jobs to. Finally, the last line represents the load submitted to the least used
resource. There is a higher number of hosts used when ANM and ANM-ExS are
running (as the first row depicts). Also, the load is spread over the resources
in such a way that there are not overloaded resources (as second row depicts).
Finally, those resources whose behavior is not predictable are less used. From
these results it can be seen that the use of Exponential Smoothing improves the
predictability of the resources.
Despite the fact that ANM uses more resources than ANM-ExS (which may
suggest that ANM balances load more efficiently), this does not necessarily
mean that ANM uses resources more efficiently. On the one hand, it could be
better to submit more jobs to a resource having a better behavior, even if the
load is not totally balanced. It may also not be advantageous to keep balancing
the load since it may mean that worse resources (i.e. those that do not have the
exactly desired capability) are used – rather than focusing our load on the best
resources. Furthermore, in some cases ANM may not be accurate enough about
its predictions and an unsuitable resource may be selected. This fact can be
deduced from the minimum percentage of resource usage for ANM, as Table 3.2
shows. In that case, the resource was selected because of its TOLERANCE.
However, the resource did not present such a predictable behavior since the
Exponential Smoothing function was not used.
Regarding the QoS perceived by the users, measured as the number of jobs
that are executed fulfilling the deadline set by the users, another experiment
has been conducted taking into account the previous results. In this case, we
enable the CAC system for the ANM-ExS (labeled ANM-ExS CAC) and compare its
results against the previous ones by setting different deadlines for the submitted
jobs. Figure 3.7 depicts the number of jobs that would not have fulfilled the QoS
if the deadline had been set to 300, 180 and 120 seconds, respectively. As the
other techniques do not have CAC capability, we use the information obtained
Figure 3.7. 3node test. QoS not fulfilled: percentage of jobs which do not fulfill
the deadline (300, 180 and 120 seconds) for each scheduling technique (GW,
GW-MHZ, GW-Net, ANM, ANM-ExS, ANM-ExS CAC).
in the previous test to know how many jobs would have finished on time. Hence,
we count the number of jobs for which the execution time was lower than the
deadline. It must be noted that in ANM-ExS CAC, all the jobs rejected by
the CAC algorithm are counted as jobs which do not fulfill the QoS requirements.
The same information for both ANM techniques is also presented to highlight the
improvement obtained by using the CAC algorithm. As this figure depicts, for
a 300 seconds deadline, the differences are negligible, since almost all the jobs
can finish their execution before the deadline. In this case, the worst behavior is
presented by GW-MHZ due to the way in which resources are selected. Sometimes
resources with low network connectivity are selected, hence the time needed to
complete the transfers is high, which leads to missing the established deadline.
For a deadline of 180 seconds, it is again not useful to focus purely on CPU
speed. Moreover, the results also highlight that the autonomic behavior performs
better, considering the three techniques that use it (ANM, ANM-ExS and ANM-ExS CAC). However,
ANM-ExS CAC seems to work worse than when the CAC is disabled. This is due
to the fact that there may be jobs that are not accepted since it is estimated that
their deadline cannot be met, which is not happening when CAC is disabled.
Additionally, there may be the case of jobs whose estimations for completion
time are a bit larger than the deadline (e.g., the estimation says 181 seconds
and the deadline is 180 seconds). Such jobs are rejected when CAC is enabled.
However, when CAC is disabled, they are executed, and their execution times may
fall within the deadline.
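The QoS accounting described above can be sketched with a hypothetical helper (with CAC enabled, rejected jobs count as unfulfilled):

```python
def pct_qos_not_fulfilled(completion_times, deadline, rejected=0):
    """Percentage of jobs that miss their QoS: jobs finishing after the
    deadline plus (when CAC is enabled) jobs rejected by the CAC algorithm.
    Illustrative helper, not part of the actual implementation."""
    late = sum(1 for t in completion_times if t > deadline)
    total = len(completion_times) + rejected
    return 100.0 * (late + rejected) / total
```

Note that rejecting a job whose estimate barely exceeds the deadline increases the count even if that job might have finished on time; this is exactly the effect observed for the 180-second deadline.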
Nonetheless, these scenarios do not involve deadlines that are hard to fulfill.
However, as shown in Figure 3.7, for a 120 seconds deadline the behavior is
Figure 3.8. VP test average completion time (in seconds) for each submission
frequency (35 min., 20 min., T_VP) and each scheduling technique (GW, GW-MHZ,
GW-Net, ANM, ANM-ExS).
different. In this case, it is really important to reject jobs when it is clear that
their QoS cannot be fulfilled. This way, there will be fewer executions, and it is
more likely that the remaining jobs finish on time. For these reasons, ANM-ExS
CAC shows an improvement of 34.7 % over ANM-ExS.
Next, an evaluation of the performance received by users emulating a more
realistic situation is presented. In this test, several jobs were submitted to the
Grid testbed during a long time interval. Moreover, jobs are submitted at the
same time for all the meta-scheduler configurations. Thus, there is a compet-
itive behavior among all the jobs submitted in all the tests. Furthermore, the
duration of this interval is not fixed and depends on the way tests are submitted.
Different cases, more or less demanding on the Grid environment, have been
analyzed. This test illustrates how different Grid workloads affect the
user performance. Three different user behaviors which imply different stresses
on the Grid system have been used, namely, case 1, in which VP tests (from the
NGB suite [116]) are submitted every 35 minutes; case 2, which consists of
submitting VP tests every 20 minutes; and case 3, where one VP test is submitted
as soon as the previous one has finished (labeled as T_VP). Hence, case
1 is the least stressing, and case 2 is the most stressing. For all the cases, 5 VP
tests were submitted.
In this experiment, the metric used for evaluating the performance obtained
by the user is the average completion time of all VPs. Figure 3.8 depicts the
results for each submission frequency and each scheduling technique, and
Table 3.3 presents a summary of such information.

                      35 Min.   20 Min.   T_VP      Average
ANM-ExS vs GW         31.53 %   25.19 %   18.37 %   25.03 %
ANM-ExS vs GW-MHZ     17.09 %   21.72 %   5.21 %    14.67 %
ANM-ExS vs GW-Net     10.92 %   9.03 %    6.97 %    8.98 %
ANM-ExS vs ANM        8.6 %     4.8 %     3.27 %    5.56 %

Table 3.3. Percentage of improvement by using the autonomic implementation
with ExS (ANM-ExS).

As can be seen in Figure 3.8,
the best performance for all the submission frequencies is again obtained by
ANM and ANM-ExS, even taking into account that the CAC is disabled for a fairer
comparison. Hence, the stability of a resource's behavior is the best criterion for
choosing the resource to run a job.
On the other hand, these results also highlight the usefulness of using the
Exponential Smoothing function to estimate the time needed to complete the
execution of a job. This leads to predictions which are more accurate and it is
possible to obtain an improved value of the resource TOLERANCE. Hence, the
resource selection process is better and the time needed to complete a job is
decreased. For these reasons, the ANM-ExS technique obtains the best results.
The main differences arise when ANM-ExS is used, although the largest dif-
ference is between using network–aware (GW-Net, ANM, and ANM-ExS) and non
network–aware techniques (GW and GW-MHZ) due to the fact that VP is very net-
work demanding. Moreover, the largest differences are obtained for 35 minutes
submission frequency, since at this rate there is less load in the Grid. There are
more free resources and it is possible to select a better resource owing to the fact
that the system has more idle resources to choose from.
To sum up, these results highlight the usefulness of using the TOLERANCE
parameter to perform a better selection of resources, and consequently, to improve
the QoS delivered to users. Moreover, the benefits of using Exponential Smoothing
to predict the future status of resources are also illustrated. This way, we ob-
tain, on average, around 25 % completion time reduction compared with GW (as
presented in Table 3.3), around 15 % compared with GW-MHZ and almost 10 %
compared with the network–aware GridWay implementation. Furthermore, the
improvement provided by the use of Exponential Smoothing function means a
completion time reduction of more than 5 % over ANM.
Figure 3.9. Resource Usage.
Finally, from a system point of view, the autonomic techniques (ANM-ExS and
ANM) also present better behavior as the workload is better balanced over the
resources. This can be seen in Figure 3.9, which depicts the percentage of jobs
submitted to each resource when using each technique. It must be noted that
when using GW or GW-MHZ, the resource usage is not balanced as resources are
selected based on the order in which they are discovered or their CPU speed
(which are static parameters), rather than just taking into account available ca-
pacity (such as percentage of free CPU) to be able to execute the incoming job.
Moreover, if GW-Net is used, the resource usage is also not balanced since only
the resources with better bandwidth are selected. However, when using auto-
nomic techniques (ANM-ExS and ANM), the scheduling of jobs is more balanced
since almost all the resources are used.
Additionally, some resources are used more than others due to the fact
that their performance differs. This means that from the TOLERANCE
point of view, they have a better behavior because they present a more
predictable performance. This is especially true when ExS is used, since the time
needed to complete the jobs is better estimated, as it takes into account
the predicted status of the resource. This makes the resource usage slightly
more balanced when using ExS. Hence, it is better to choose the more predictable
resources with the aim of reducing the time needed to complete the jobs.
3.5 Summary
This chapter presents a working implementation of an architecture which com-
bines concepts from Grid scheduling with autonomic computing [121] in order
to provide users with a more adaptive job management system. The architecture
involves consideration of the status of the network when reacting to changes in
the system – taking into account the load on computing resources and the net-
work links when making a scheduling decision. This architecture was originally
presented and tested by means of simulations in [46]. This work presents an
implementation based on GridWay [10] – an open source Grid meta-scheduler.
The architecture provides scheduling of jobs to computing resources so that
the network does not become overloaded. In order to implement the autonomic
network–aware meta-scheduler, a first step was the extension of the GridWay
meta-scheduler to make it network–aware. Subsequently,
the autonomic behavior is implemented by means of (1) adding a new attribute
TOLERANCE that GridWay uses to perform the filtering and sorting of re-
sources; and (2) performing Connection Admission Control (CAC). The term
TOLERANCE and the CAC were originally introduced and tested by means of
simulations in [46], but both had to undergo adaptation when used in the Grid-
Way implementation. Moreover, the use of Exponential Smoothing alongside the
CAC algorithm is a novelty of this work.
The main contributions of this chapter are: (1) an implementation of the
architecture to perform autonomic network–aware meta-scheduling based on
GridWay; (2) a scheduling technique that relies on ExS to predict the completion
time of jobs; (3) a performance evaluation carried out using a real testbed involv-
ing several workloads and heterogeneous resources from several organizations.
Several ways of performing the scheduling of jobs to computing resources
are evaluated, namely GW, GW-MHZ, GW-Net (presented in [112]), ANM (presented
in [46]) and ANM-ExS (novelty of this work). This evaluation uses different work-
loads and heterogeneous resources belonging to different organizations, showing
that the autonomic behavior based on Exponential Smoothing improves the per-
formance received by users and yields a better load balance among resources.
CHAPTER 4

Adding Support for Meta-Scheduling in Advance:
The SA-Layer
As has been stated throughout this dissertation, the provision of Quality of
Service (QoS) in Grid environments is still an open issue that needs attention
from the research community. One way of contributing to the management
of QoS in Grids is by performing meta-scheduling of jobs in advance, that is,
jobs are scheduled some time before they are actually executed. In this way, it
becomes more likely that the appropriate resources are available to run the job
when needed, so that QoS requirements of jobs are met (i.e. jobs are finished
within a deadline).
This chapter presents a framework built on top of Globus and the GridWay
meta-scheduler to improve QoS by means of performing meta-scheduling in
advance. This framework manages idle/busy periods of resources in order to
choose the most suitable resource for each job, and uses red–black trees for this
task. Furthermore, no prior knowledge on the duration of jobs is required, as
opposed to other works using similar techniques.
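The idle/busy bookkeeping mentioned above can be illustrated with a minimal sketch, in which a sorted list with binary search stands in for the red–black tree (the class and method names are illustrative, not the SA-Layer's actual API):

```python
import bisect

class ResourceCalendar:
    """Sketch: track a resource's non-overlapping busy periods, as the
    SA-Layer does with red-black trees; a sorted list plus bisect gives
    the same O(log n) search (though O(n) insertion)."""
    def __init__(self):
        self._busy = []  # sorted list of (start, end) tuples, end exclusive

    def is_free(self, start, end):
        # Index of the first stored period starting strictly after `start`.
        i = bisect.bisect_right(self._busy, (start, float('inf')))
        before_ok = i == 0 or self._busy[i - 1][1] <= start
        after_ok = i == len(self._busy) or self._busy[i][0] >= end
        return before_ok and after_ok

    def book(self, start, end):
        if not self.is_free(start, end):
            return False
        bisect.insort(self._busy, (start, end))
        return True
```

A self-balancing tree keyed on the period start time makes both the free-slot search and the insertion logarithmic, which matters when many scheduled-in-advance periods accumulate per resource.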
This framework uses heuristics that consider the network as a first level re-
source, and presents an autonomous behavior so that it adapts to the dynamic
changes of the Grid resources. The autonomous behavior is obtained by means
of computing a trust value for each resource and performing job rescheduling in
case of resource failure. All this set of features make this framework suitable for
real Grids.
4.1 Introduction
The heterogeneous and distributed nature of the Grid along with the different
characteristics of applications complicate the brokering problem. To further
complicate matters, the meta-scheduler typically lacks total control and even
complete knowledge of the status of the resources. This poses a heavy challenge
for the provision of QoS.
Current scheduling systems adopt three different approaches to tackle these
problems [122]: scheduling based on just–in–time information [10] from Grid In-
formation System (GIS) [13], performance prediction [123], and dynamic resche-
duling at run time [124]. These approaches are not exclusive. For instance,
it is possible to use a mixture of several approaches like doing performance
prediction and dynamic rescheduling at run time. Getting resource static infor-
mation, such as CPU frequency, memory size, network bandwidth or file sys-
tem is feasible. But runtime information, such as CPU load, available memory,
and available network bandwidth, is more difficult to obtain. This is because
of performance fluctuation which in turn is due to contention among shared
resources.
One key idea to solve the scheduling problem is to ensure that a specific re-
source is available when a job requires it. This is the reason why reserving or
scheduling the use of resources in advance becomes essential. Reservation in
advance can be defined as a restrictive or limited delegation of a particular re-
source capacity for a defined time interval [51]. The objective of such reservation
in advance is to provide Quality of Service (QoS) by ensuring that a certain job
uses the resources it needs when they are requested. However, incorporating
such mechanisms into current Grid environments has proven to be a challenging
task due to the resulting resource fragmentation [38], in spite of enabling QoS
agreements with users and increasing the predictability of a Grid system [58].
Our work is based on meta-scheduling in advance in Grids rather than reser-
vations in advance, as reservations may not always be possible. The meta-sche-
duling in advance algorithm can be defined as the first step of the reservations
in advance algorithm, in which the resources and the time periods to execute
the jobs are selected (and the system keeps track of the decisions already made
and the usage of resources) but making no physical reservation.
This chapter presents the following main contributions. First, a framework built
on top of Globus [9] and the GridWay meta-scheduler [10] to manage QoS by
means of performing meta-scheduling in advance is presented. The use of this
framework allows jobs to be executed within their deadlines. Second, a new
autonomic network–aware algorithm is proposed to tackle the scheduling in advance
problem. Thereby, the heuristics presented are concerned with the dynamic be-
havior of the Grid resources, their usage, the variable availability of resources,
and the characteristics of the jobs. Hence, no prior knowledge of the job duration
is assumed, as opposed to [3]. Thus, estimations on the completion times
of jobs need to be calculated. The autonomous behavior is obtained by means of
computing a trust value for each resource and performing rescheduling of failed
jobs. The resource trust value reflects the accuracy of the previous estimations
made for jobs executed on each resource. Third, heuristics to calculate predictions on
the completion time of jobs are presented, which consider the network as a first
level resource.
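The trust idea can be illustrated as follows; the exact formula is defined later in the dissertation, so this sketch simply assumes trust as one minus the mean relative prediction error:

```python
def resource_trust(estimated_times, actual_times):
    """Illustrative trust value for a resource: 1.0 means past execution-time
    estimations were perfect; the value drops as estimations miss by more.
    (Assumed formula, for illustration only.)"""
    errors = [abs(e - a) / a for e, a in zip(estimated_times, actual_times)]
    return max(0.0, 1.0 - sum(errors) / len(errors))
```

Under such a definition, resources with predictable behavior keep a trust close to 1.0 and are preferred by the scheduler, while resources whose runtimes fluctuate see their trust decay.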
The chapter is organized as follows. In Section 4.2 a brief overview of the
meta-scheduling in advance problem is presented. Section 4.3 explains the
framework to perform meta-scheduling in advance, which is the main contri-
bution of this chapter, paying special attention to the blocks that implement the
key functionalities mentioned in the paragraph above. The prediction techniques
developed are detailed in Section 4.4 whilst Section 4.5 presents the experiments
carried out for evaluating these proposals. Finally, the summary of the chapter
is outlined in Section 4.6.
4.2 Network–aware meta-scheduling in advance
Grid resources may vary dynamically as they may fail, join or leave the Grid
at any time. Moreover, this dynamism is also affected by the fact that every Grid
resource needs to execute local tasks as well as tasks from Grid applications.
It must be noted that from the Grid applications point of view, all the tasks
from both local users and Grid users are loads on the resource. Therefore,
everything in the system has to be evaluated by its influence on the execution of
the applications.
Owing to this fact, several projects have aimed at exploring advance reser-
vation of resources (among others, GARA [53], Grid Capacity Planning [54],
or VIOLA [55]), as they have been shown to increase the predictability of the
system [125]. A Grid reservation in advance process can be divided into two
steps [51]:
1. Meta-scheduling in advance: Selection of a resource to execute the job,
and the time period when the execution will be performed, but without any
physical reservation.
2. Negotiation for resource reservation: Consists of the physical reservation
of the resources needed for the job, which may not always be possible.
There are two concepts: requesting a reservation and committing a reserva-
tion. A reservation request contains the start time and the requested length
of the reservation. For committing a reservation, the meta-scheduler up-
loads a commit message containing the job id. At this moment, the job
starts its execution in the previously selected computing resource.
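The request/commit split in step 2 could look like the following sketch (a hypothetical interface, for illustration only):

```python
import itertools

class ReservationService:
    """Sketch of the two-phase protocol described above: a reservation is
    first requested (start time + length) and later committed, at which
    point the job may start on the selected resource. Hypothetical API."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._reservations = {}  # job_id -> [start, length, committed]

    def request(self, start, length):
        job_id = next(self._ids)
        self._reservations[job_id] = [start, length, False]
        return job_id

    def commit(self, job_id):
        # The commit message carries only the job id; the start time and
        # length were fixed by the earlier request.
        self._reservations[job_id][2] = True
        return True
```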
Nevertheless, support for reservation in the underlying infrastructure is cur-
rently limited [53] [54] as they impose some performance penalty [56], typically
decreased resource utilization. Owing to these limitations that reservations in
advance present, this Thesis is based only on the first step, meta-scheduling
in advance, since reservations in advance may not always be possible
in a real Grid environment. Many resources cannot be reserved, since
not all Local Resource Management Systems (LRMS) permit them. Apart
from that, there are other types of resources, such as bandwidth (e.g. the Internet),
which belong to several administrative domains, making their reservation
more difficult or even impossible. This is the reason to perform meta-scheduling
in advance rather than advance reservations in order to address QoS in Grids.
This means that the system keeps track of the meta-scheduling decisions al-
ready made in order to make future decisions without overlapping executions
but also without making any physical reservation. So, assuming a stable
situation in which all the resources are available, if only Grid load existed, this would
be enough to manage QoS since the meta-scheduler would not overlap jobs on
resources. As many resources have a local load besides Grid load, monitoring
and prediction techniques are needed.
The algorithms for meta-scheduling in advance need to be efficient so that
they can adapt themselves to dynamic changes in resource availability and user
demand without affecting system and user performance. Moreover, they must
take into account resource heterogeneity as Grid environments are typically
highly heterogeneous. For this reason, it could be useful to employ techniques
from computational geometry to develop an efficient heterogeneity–aware
scheduling algorithm [3]. In our research, the techniques proposed in [38],
which represent the information about resource usage geometrically, are used to
select efficiently the resource and the time period suitable for meeting the
QoS requirements of the job. On the other hand, the network is also taken into
account in the meta-scheduling process, since it has a major impact on job
performance, as studied in [46] [112] [126] [127], among others. Hence, our
research focuses on low–cost computational heuristics to perform
meta-scheduling in advance that consider the network as a first–level resource.
In this work we focus on applications where jobs do not have workflow
dependencies [11]. In this type of application the user provides both the input
files and the application itself. Nevertheless, by setting the start time and
the deadline of each job, a specific workflow among jobs can still be expressed.
Taking into account these assumptions, a scheduling in advance process
follows these steps (see Figure 4.1):
1. First, a user sends a request to the meta-scheduler at his local
administrative domain. Every request must provide a tuple with information
on the application and the input QoS parameters: (in_file, app, t_s, d).
in_file stands for the input files required to execute the application, app.
In this approach the input QoS parameters are just specified by the start
time, t_s (the earliest time the job can start to be executed), and the
deadline, d (the time by which the job must have been executed) [38].

Figure 4.1. Meta-Scheduling in Advance Process
2. The meta-scheduler communicates with the Gap Management entity and
executes a gap search algorithm. This algorithm obtains both the resource
and the time interval to be assigned for the execution of the job. The heuris-
tic algorithms presented here take into account the predicted state of the
resource (both for computational resources and interconnection networks),
the jobs that have already been scheduled and the QoS requirements of the
job.
3. If it is not possible to fulfill the user’s QoS requirements using the resources
of its own domain, communication with meta-schedulers from other do-
mains starts.
4. If it is still not possible to fulfill the QoS requirements, a renegotiation
process is started between the user and the meta-scheduler in order to
define new QoS requirements.
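The four steps above can be walked through with a short sketch. Every function
here is a hypothetical stub standing in for the gap search and the inter-domain
communication; the names are ours, not those of the SA-Layer.

```python
# Illustrative walk through the four steps: a request tuple (in_file, app,
# t_s, d) is first tried against the local domain, then against remote
# domains, and finally renegotiated. All callables are hypothetical stubs.

def schedule_in_advance(request, local_search, remote_domains):
    # Steps 1-2: the local Gap Management entity runs the gap search.
    slot = local_search(request)
    if slot is not None:
        return ("local", slot)
    # Step 3: ask meta-schedulers of other domains.
    for domain in remote_domains:
        slot = domain(request)
        if slot is not None:
            return ("remote", slot)
    # Step 4: no domain can meet the QoS, so renegotiation with the user starts.
    return ("renegotiate", None)

request = {"in_file": "data.in", "app": "app", "t_s": 0, "d": 20}
outcome = schedule_in_advance(request,
                              local_search=lambda r: None,
                              remote_domains=[lambda r: ("res_B", 4)])
```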
As can be seen in Figure 4.1, there may be one or more meta-schedulers in each
domain. However, all of them have to communicate with the same Gap Management
entity, which is in charge of managing the resource usage of that domain. This
entity should be replicated in order to avoid the single-point-of-failure
problem. It is also possible to split the resources of the local domain into
several subdomains when the number of resources in a domain grows too high.
Thus, this model is scalable, as it is possible to assign some resources to one
Gap Management entity and other resources to another one. Each meta-scheduler
asks its local Gap Management entity to allocate the jobs. In this way, the Gap
Management entity of each domain only has to maintain the status of its own
resources, whilst the meta-scheduler is the entity in charge of asking other
meta-schedulers when it is not possible to allocate the job in the local domain.
In the case that the local domain lacks the resources to allocate the job, a
communication with meta-schedulers of other domains starts, as the third step
depicts. In order to perform the inter-domain communications efficiently, tech-
niques based on Peer-to-Peer (P2P) systems (as proposed by [128] [129] [130] [131],
among others) can be used. In this way, the meta-scheduler at each domain
knows some of the meta-schedulers at other domains, and can forward jobs to
them when needed.
In the case that the QoS required by the user cannot be addressed (not even
in other domains), a renegotiation process may start (fourth step). As a result of
the renegotiation process, the user could resubmit the job with less strict QoS
requirements, or try again later with the same requirements, or just quit the
execution of the job.
This renegotiation, as well as the overall interaction with users, can be
conducted by means of Service Level Agreements (SLAs). A scheme for advancing
and managing the QoS attributes contained in Grid SLA contracts, following the
Open Grid Services Architecture (OGSA), can be implemented. For instance, in
[132] an Execution Management Service is introduced which collaborates with
both the application services and the network services in order to provide an
adjustable quality of the requested services. Hence, in the proposed framework,
an implementation where the components that manage and control the job
submissions interact with an SLA-related service can be used for negotiating
(and renegotiating when needed) the desired QoS with the user who submitted the
job. A proposal based on these ideas, with the aim of providing this capability
to our system, has been presented in [35] [133] [134].

Figure 4.2. Scheduling Order.
It must be noted that, under this scenario, jobs are not executed in the order
they are scheduled, as Figure 4.2 depicts. This order depends on several
factors, such as the time constraints of the jobs, the previously allocated
jobs, the status of the resources, and so forth. As may be seen in Figure 4.2,
the first submitted job is not going to be executed first due to its start time
restrictions. Those restrictions lead to Job 1 being allocated to "resource A"
at slots 7 to 12. After that, Job 2 arrives, but its start time restriction is
"slot 1". So, as it just needs 5 slots to be executed on "resource A", it may
be executed before "Job 1", using slots 1 to 5. The same reasoning applies to
"Job 5".
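The Figure 4.2 behaviour can be reproduced with a toy first-fit placement that
respects start-time restrictions. The helper name and the slot layout are
illustrative; the slot numbers follow the text.

```python
# A toy run reproducing the behaviour described for Figure 4.2: jobs are
# placed first-fit subject to their start-time restrictions, so the first
# submitted job need not execute first.

def first_free_run(busy, earliest, length):
    """First index >= earliest that starts a run of `length` free slots."""
    start = earliest
    while start + length <= len(busy):
        if all(not busy[s] for s in range(start, start + length)):
            return start
        start += 1
    return None

busy = [False] * 20               # timeline of "resource A"
alloc = {}
# Job 1 is submitted first but cannot start before slot 7 and needs 6 slots;
# Job 2 arrives later, can start at slot 1 and needs 5 slots.
for job, (earliest, length) in [("Job 1", (7, 6)), ("Job 2", (1, 5))]:
    start = first_free_run(busy, earliest, length)
    alloc[job] = (start, start + length - 1)
    for s in range(start, start + length):
        busy[s] = True
```

Although Job 1 was submitted first, it occupies slots 7 to 12, while Job 2 runs
earlier, at slots 1 to 5.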
On the other hand, in this meta-scheduling in advance process, the accu-
racy of available information regarding the resources is very important for the
scheduling tasks to be performed efficiently. However, the independence and
autonomy of domains is an obstacle due to the fact that some domains may
not want to share private information, such as information on the load of their
resources. Another obstacle is resource contention in Grid environments, which
causes host load and availability to vary over time. Hence, predicting this
information is a key issue, even though it is quite difficult due to the
dynamic behavior of the Grid [68].
The prediction information can be derived in two ways [68]: application–
oriented and resource–oriented. For the application–oriented approaches, the
running time of Grid tasks is directly predicted by using information about
the application, such as the running time of previous similar tasks. For the
resource–oriented approaches, the future performance of a resource such as the
CPU load and availability is predicted by using the available information about
the resource, and then such predictions are used for predicting the running time
of a task, given the information on the resource requirement of the task.
In our case we use a mixture of these two approaches: application–oriented
techniques to estimate the execution time of the application, and
resource–oriented techniques to calculate the time needed to perform the
network transfers and to tune the estimations made about job execution times.
4.3 Meta-Scheduling in advance implementation
One of the main contributions of this Thesis is the implementation of a frame-
work for network–aware meta-scheduling in advance, which is detailed in this
section. It is implemented as a layer on top of the GridWay meta-scheduler, in
the same way as Qu [57] described a method to overcome the performance penalty
of reservations by adding a Grid advance reservation manager on top of the
local scheduler(s). First, the structure of the framework is presented. Next,
the data structures used for managing this information are described, followed
by the policies for allocating jobs to resources. Subsequently, the fault
tolerance support is explained. Finally, the prediction techniques are
discussed, together with details on how the autonomic behavior of the framework
has been implemented by means of the aforementioned resource trust.
Our proposal is implemented as an extension to the GridWay meta-scheduler [10],
called the Scheduling in Advance Layer (SA-Layer). This is an intermediate
layer between the users and the on-demand Grid meta-scheduler, as Figure 4.3
depicts. The SA-Layer is a modular component that uses functions provided by
GridWay in terms of resource discovery and monitoring, job submission and
execution monitoring, etc., and allows GridWay to perform network–aware
meta-scheduling in advance decisions. As the original parameters supported by
GridWay do not consider the network condition, GridWay has been extended to
integrate network status information into the meta-scheduling process [112], as
explained in Chapter 3.

Figure 4.3. The Scheduler in Advance Layer (SA-Layer).
The SA-Layer stores information in databases concerning previous application
executions (called DB Executions) and the status of resources and network over
time (called DB Resources). The memory overhead of the SA-Layer is negligible,
since the information is stored in a compact way. For instance, if there are
two or more executions of an application with the same input parameters, the
information about execution and transfer times is summarized and only the
average time is saved. In this way, those files only hold summarized
information about the resources and the job execution times.
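This compact storage policy amounts to keeping a running average per
application and parameter set. A minimal sketch follows; the record layout and
names are illustrative, not the actual DB Executions schema.

```python
# Minimal sketch of the compact storage policy: repeated executions of the
# same application with the same input parameters are collapsed into a
# running average instead of one stored record per run.

db_executions = {}   # (app, params) -> {"mean_time": float, "count": int}

def record_execution(app, params, completion_time):
    key = (app, params)
    entry = db_executions.setdefault(key, {"mean_time": 0.0, "count": 0})
    n = entry["count"]
    # incremental mean: only the average survives, not every sample
    entry["mean_time"] = (entry["mean_time"] * n + completion_time) / (n + 1)
    entry["count"] = n + 1

record_execution("app", ("-n", "100"), 120.0)
record_execution("app", ("-n", "100"), 100.0)
```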
A new parameter has been added to GridWay's job template, named
JOB_INFORMATION. In this parameter the user may indicate some information about
the job. First, the user may specify the size of the files to be transferred in
order to both start and finish the execution of the job; it must be highlighted
that this information is not compulsory. In addition, the user may set other
characteristics related to the job, such as job arguments, which enable a more
accurate prediction of the job execution time.
For that purpose, the execution time of jobs in a given resource is estimated
by the Predictor module, taking into account the characteristics of the jobs,
the CPU power of the resources and the future network status. In addition, the
Resource Trust component calculates the trust in resources in order to tune the
predictions, depending on the information about the accuracy of the latest job
execution estimations for those resources. By processing this information about
applications and resources, a more accurate estimation of the completion time
of the job in the different computational resources can be performed.
In this implementation, resource usage is divided into time slots, whose
duration is a customizable parameter. Then, we have to schedule the future
usage of resources by allocating jobs to resources at a specific time (taking
one or more time slots). For this reason, data structures to keep track of the
usage of slots are needed, along with allocation policies (carried out by the
Gap Management module in Figure 4.3) to find the best slots for each job.
Furthermore, the way the framework has been implemented avoids deadlocks: once
a job is scheduled to be executed at a specific resource, if its deadline
expires, the job will be dropped. Thus, if a job has time slots assigned but
cannot use them (e.g. the local user is using the machine), this will not
affect other jobs being submitted to the same resource.
This framework also presents an autonomic behavior which allows it to adapt
itself to changes in the Grid system. This autonomic behavior comprises two
different functionalities, namely Resource Trust and Job Rescheduling. The main
characteristics of the SA-Layer components are explained next.
4.3.1 Gap Management
The Gap Management module represents the information of the tree data
structure in a geometrical way. This module is in charge of using and keeping
up–to–date the information stored in the data structure. Each job is
represented by a single point in the plane, as Figure 4.4 depicts. Labeled
points represent the idle periods (gaps), with start and finish times. The
coordinates of a job are its starting and ending times. P represents the
earliest start and end times, whilst P' represents the latest ones, for the
current job. Thus, the line between P and P' represents the periods when this
new job can be scheduled. All the points above and to the right of this line
represent possible gaps to allocate the job. Notice that each job allocation
influences how many further jobs can be scheduled, due to the generated
fragmentation. In this work, fragmentation refers to the free time slots in
between two consecutive allocations. Different ways of searching for gaps and
allocating jobs to resources can be devised, considering both the already
scheduled jobs and the generated fragmentation.

Figure 4.4. Idle periods regions [3].
A First Fit policy has been considered, which selects the first free gap found
that fits the new job. It can create considerable fragmentation, as a result of
which many jobs may be rejected. Other techniques also exist, such as Best Fit.
This policy selects the free gap which leaves the fewest free time slots after
allocating the job. The fragments are smaller, but it is harder to use those
free slots to allocate new jobs.
Although Best Fit usually outperforms First Fit, it is more computationally
complex, since all the resources must be searched to find the most suitable
gap. In contrast, First Fit does not have to search all the resources, which
makes it more scalable. Furthermore, this Thesis has also worked on
re-scheduling techniques that can fix the fragmentation created by First Fit
and enhance the overall performance of the system.
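The two policies can be contrasted in a few lines. This is a hedged sketch with
our own names; a gap is represented as (start, length) and the job needs a
given number of slots.

```python
# Sketch contrasting the two gap-selection policies: First Fit returns the
# first gap large enough, Best Fit the gap leaving the fewest free slots.

def first_fit(gaps, need):
    for i, (start, length) in enumerate(gaps):
        if length >= need:
            return i
    return None

def best_fit(gaps, need):
    best, best_left = None, None
    for i, (start, length) in enumerate(gaps):
        left = length - need          # free slots remaining after the job
        if left >= 0 and (best_left is None or left < best_left):
            best, best_left = i, left
    return best

gaps = [(0, 8), (10, 3), (20, 5)]
# For a 3-slot job: First Fit grabs the 8-slot gap (leaving a 5-slot
# fragment), while Best Fit picks the exact 3-slot gap (leaving none),
# at the cost of scanning every gap.
```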
On the other hand, the idle periods are split into subsets (the strips in
Figure 4.4), enabling the natural implementation of a variety of strategies for
selecting one among multiple feasible idle periods, and reducing the complexity
of the gap selection. There is a red–black tree for each strip, so that only
the gaps within the limits of each strip are stored in the tree of that strip.
The size of these strips is a customizable parameter. If this parameter has a
high value, there will be a great number of gaps per tree and fewer trees; by
contrast, if it has a low value, there will be a greater number of trees, each
with fewer gaps.
Apart from that, Castillo explains in [38] that the trees can be divided into
two regions, named R1 and R2, as Figure 4.4 depicts. The R1 region represents
the gaps which start at or before the job's ready time. Therefore, any idle
period in this region can accommodate the new job without delaying its
execution. The R2 region represents the gaps which start later than the job's
ready time.
It is important to recall that a job scheduled in an idle period will create at
most two new idle periods: one between the beginning of the gap and the start
of the job (the leading idle period), and one between the end of the job and
the end of the original idle period (the trailing idle period). Consequently,
the leading idle period has zero length at any point in region R2, since in
that region the job starts executing as soon as the gap begins. Thus, the R2
region is searched first in order to reduce the generated fragmentation. Note
also that the later the starting time of a gap is, the longer the execution of
the new job will be delayed. Therefore, this region is searched from top to
bottom in order to minimize the job turnaround time. On the other hand, if
there is no available gap in the R2 region, a feasible gap is searched for in
the R1 region from bottom to top. Again, the reason is to generate less
fragmentation.
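This search order can be sketched as follows. It is our reading of the
description above, with illustrative names: R2 gaps (starting after the job's
ready time) are tried earliest-start first, and R1 gaps latest-start first.

```python
# Sketch of the region-based gap search: prefer R2 gaps (start > ready),
# earliest first, so the leading idle period is empty and delay is minimal;
# fall back to R1 gaps (start <= ready), latest first, to minimise the
# leading fragment. A gap is (start, length); the job needs `need` slots.

def pick_gap(gaps, ready, need):
    fits = [(s, l) for (s, l) in gaps if l >= need]
    r2 = sorted(g for g in fits if g[0] > ready)                 # after ready
    r1 = sorted((g for g in fits if g[0] <= ready), reverse=True)
    if r2:
        return r2[0]      # earliest R2 gap: no leading idle period
    if r1:
        return r1[0]      # latest R1 gap: smallest leading idle period
    return None

gaps = [(0, 10), (3, 6), (8, 4), (12, 5)]
```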
4.3.2 Data Structure
One of the most important aspects in the registration of the resource free time
slots is the data structure used. The scheduling technique presented here thus
needs a suitable data structure in order to manage all this information
efficiently. A suitable data structure yields better execution times and
reduces the complexity of the algorithms. Furthermore, the data structure also
influences the scalability of those algorithms.
There are several structures for managing this information needed by the
scheduler, and a survey can be found in [37]. For instance, Grid Advanced
Reservation Queue (GarQ) [37] is a combination of the Calendar Queue [61] and
the Segment Tree [62] for administering reservations efficiently. In this work,
red–black trees are used, as they provide efficient access to the information
about resource usage, as demonstrated in [38]. This data structure stores the
free time periods of each resource, in contrast to GarQ, which stores
information about the reservations made. As a result, when the number of jobs
submitted is high, the number of free time periods to be managed is lower.

Figure 4.5. Example of a red–black tree.
That is why red–black trees [135] are the data structure used in this work. The
objective of using this type of tree is to develop techniques that efficiently
identify feasible idle periods for each arriving job request, without having to
examine all idle periods [3]. A red–black tree is a special type of binary tree
where each node has a color attribute, which can be either red or black (see
Figure 4.5). This kind of tree has additional requirements over the ordinary
requirements imposed on binary search trees.
These constraints enforce a critical property of red–black trees: the longest
path from the root to any leaf is no more than twice as long as the shortest
path from the root to any other leaf in that tree. So, these trees are roughly
balanced and, as a result, inserting, deleting and finding values require
worst–case time proportional to the height of the tree, O(log n). This
theoretical upper bound on the height allows red–black trees to be efficient in
the worst case, unlike ordinary binary search trees. Besides, this data
structure is more scalable, since it is possible to have several red–black
trees, each one keeping information about the resource usage of a certain time
period (the strips in Figure 4.4).
The red–black trees used in this framework differ from [3] in the informa-
tion stored in the leaves, namely the mean instead of the median. So, a low
computational cost implementation is obtained for a real Grid system.
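The strip organisation can be illustrated with a stand-in structure. Python's
standard library has no red–black tree, so the sketch below keeps one ordered
list per strip, searched with `bisect` (any balanced tree gives the same
logarithmic behaviour); StripIndex and its methods are our own illustrative
names.

```python
import bisect

# Stand-in illustration of the strip organisation: one ordered container
# per strip instead of one red-black tree per strip. A gap is keyed by its
# start time; strip_size is the customisable parameter from the text.

class StripIndex:
    def __init__(self, strip_size):
        self.strip_size = strip_size
        self.strips = {}                      # strip number -> sorted starts

    def add_gap(self, start):
        strip = start // self.strip_size
        bisect.insort(self.strips.setdefault(strip, []), start)

    def gaps_in_strip(self, strip):
        return self.strips.get(strip, [])

idx = StripIndex(strip_size=10)
for start in (3, 17, 12, 25):
    idx.add_gap(start)
# only the strip covering times 10-19 needs examining for a gap near t = 15
```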
4.3.3 Job Migration
In order to make the system fault tolerant, the SA-Layer also needs a mechanism
to deal with resource failures (whatever the problem may be), so as to build a
reliable system. This feature is very important in Grids, as resources may join
and leave the Grid at any time, and failures of resources are the rule rather
than the exception. Hence, they should be taken into account in order to
provide a reliable service [136]. This feature improves the autonomic behavior
of the framework in the face of the variable availability of resources. Thus,
when a resource quits the system (e.g. the resource fails or is shut down), the
jobs scheduled on it (including currently running jobs) have to be reallocated
to other hosts. Jobs are rescheduled in the same way as when they were first
submitted.
This task is performed by the Job Migrator module (see Figure 4.3). This module
is in charge of monitoring the currently active resources, and this check is
performed every time slot. When this module detects that a resource is no
longer available, it performs the following steps:
1. Select the jobs scheduled for the unavailable resource.
2. Delete those scheduling decisions and release the reserved time slots.
3. Re-schedule the jobs to other resources whenever possible (by using the
Job Rescheduler module).
As can be seen, this module is closely related to the Job Rescheduler module,
which is explained in the next chapter (Section 5.3).
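The three migration steps can be sketched as a small function. The schedule
structure and the rescheduling callback are illustrative stubs, not the actual
Job Migrator interface.

```python
# Sketch of the three migration steps: select the jobs scheduled on the
# failed resource, release their slots, and hand them to a rescheduling
# callback. All structures and names are illustrative.

def migrate(schedule, failed_resource, reschedule):
    # 1. select the jobs scheduled for the unavailable resource
    victims = [j for j, (res, _) in schedule.items() if res == failed_resource]
    moved = []
    for job in victims:
        # 2. delete the scheduling decision, releasing its time slots
        del schedule[job]
        # 3. re-schedule elsewhere whenever possible
        new = reschedule(job)
        if new is not None:
            schedule[job] = new
            moved.append(job)
    return moved

schedule = {"j1": ("resA", (2, 4)), "j2": ("resB", (0, 3)),
            "j3": ("resA", (6, 8))}
moved = migrate(schedule, "resA",
                reschedule=lambda j: ("resB", (5, 7)) if j == "j1" else None)
```

Jobs that cannot be reallocated (here "j3") simply disappear from the schedule,
mirroring the deadline-based dropping described earlier.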
4.3.4 Predictor
This type of scheduling needs to know the duration of jobs on resources and
their waiting times in queues. To this end, there are works which aim at
estimating these queue waiting times so that the scheduling process can be
carried out properly. For instance, Queue Bounds Estimation from Time Series
(QBETS) [63] can estimate the probability that a job will wait no longer than
startDeadline minutes if it is submitted at time T. Based on this, Virtual
Advance Reservations for Queues (VARQ) [64] implements a reservation by
determining when (according to the predictions made by QBETS) a job should be
submitted to a batch queue so as to ensure that it will be running at a
particular point in future time.
On the other hand, scheduling in advance needs predictions about the future
network status and about the duration of jobs on resources. A survey of some
prediction techniques can be found in [65]. Techniques for such predictions
include applying statistical models to previous executions [66] and heuristics
based on job and resource characteristics [46] [67]. In [66] it is shown that,
although load exhibits complex properties, it is still consistently predictable
from past behavior. In [67], an evaluation of various linear time series models
for predicting future CPU load is presented.
However, predictions of job execution time are quite difficult to obtain, since
there are performance differences between Grid resources [68]. Furthermore,
their performance characteristics may vary for different applications (e.g.
resource A may execute an application P faster than resource B, but resource B
may execute application Q faster than A). In addition, as many resources have a
local load besides the Grid load, monitoring and prediction techniques are
needed.
In this Thesis, (a) application–oriented approaches are used to estimate the
execution time of the application. This means that the running time of Grid
tasks is directly predicted by using information about the applications, such
as the running time of previous similar tasks. Moreover, (b) resource–oriented
approaches are also used to calculate the time needed to perform the network
transfers. This means that the future performance of a resource is predicted
(by using the available information about the resource) and used for predicting
the running time of a task. Then, (c) a resource–oriented technique is used
again to tune the predictions. To do that, some information about the status of
resources and previous job executions needs to be stored. For this reason, the
SA-Layer includes two databases which store information about previous resource
status (DB Resources) and about previous job executions (DB Executions).
In this way, the algorithm proposed by Castillo [3] is extended to take into
account the heterogeneity of Grid resources by means of prediction techniques.
A straightforward implementation of this algorithm is used as a comparison
model. Then, different extensions to Castillo's algorithm have been implemented
within the proposed framework in an incremental way, with the aim of making the
job duration estimations as accurate as possible. First, we have developed an
implementation which calculates the total completion time of jobs in each
resource based on log data from previous executions (named Total Completion
Time (TCT)). In a second phase, that implementation has been modified to
estimate the total completion time of jobs by taking into account the time
needed for executing the jobs and the time needed for transferring the files
separately (named Execution and Transfer Time Separately (ETTS)). Third, based
on the second implementation, we have implemented some heuristics that take
into account the resource trust to tune the prediction of the execution time of
jobs (named Resource Trust (RT)). Finally, based on the previous one, we have
developed a technique to make predictions about the future status of the Grid
resources (network included) based on exponential smoothing functions (named
Exponential Smoothing (ExS)). This information is used for adjusting the job
duration estimations by considering the possible future status of resources and
interconnection networks. The main points of these prediction heuristics are
explained in the next section.
4.4 Prediction Techniques
Before explaining each technique in detail, it is important to highlight that,
for all the implementations, predictions are only calculated when a suitable
gap has been found in the host. In this way, there is no need to calculate the
prediction times for all the hosts in the system, which would be quite
inefficient. Also note that two applications are considered to belong to the
same application type when they have the same input and output parameters, in
terms of number, type and size.
One of the main advantages of all the techniques presented here is that they
take into account the heterogeneity of Grid resources and do not assume that
users have prior knowledge of the duration of jobs. The information needed to
estimate the times (completion, execution and transfer times) is stored in the
databases (DBs). The information related to previous executions of the
applications is stored in DB_Executions, and includes the completion time, the
execution time, and the input and output transfer times. If there is no
historical data about the execution of a certain application, a default time
value is assigned to its first execution, in the case that the user does not
provide any information about it. As a result, at least one execution of each
type of application needs to have been completed in order to obtain a somewhat
reliable prediction. On the other hand, DB_Resources stores the information
related to the previous status of the resources (network bandwidth included).
This database keeps a trace of the status of the CPU, the RAM memory, and the
bandwidth.
With the aim of clarifying the notation of the following algorithms, a summary
of the common notation used is outlined next:

• R = the set of known resources {ri | i in [1..n]}
• j = the job to be executed
• n = the number of samples of the completion times for the application j in
the resource ri
• t^ri_completion(j)_k = the k-th completion time for the application j in the
resource ri
• t^ri_execution(j)_k = the k-th execution time for the application j in the
resource ri
• t^ri_estimated(j) = the execution time estimation for the application j in
the resource ri
• t^ri_real(j) = the real execution time for the application j in the resource ri
• initT = the start time of the job
• d = the deadline for the job
• size = the number of bytes to be transferred
• sizeIN = the number of input bytes to be transferred
• sizeOUT = the number of output bytes to be transferred
• Bw(ri, t) = the bandwidth of the resource ri at minute t of the previous day
• RT(ri) = the trust value for the resource ri
• RT^ri_j = the total time to execute j in the resource ri
• CPU_free(ri, initT, d) = the mean percentage of free CPU in the resource ri
between initT and the deadline d, calculated by using the Exponential
Smoothing (ExS) function
• Overload_ri = the extra time needed due to the CPU usage at the chosen
resource ri

Algorithm 5 Total Completion Time (TCT)
1: for each ri having a gap do
2:   TCT_ri = ( Σ_{k=1}^{n} t^ri_completion(j)_k ) / n
3: end for
4.4.1 TCT Technique
The first implementation mentioned above takes into account the heterogeneity
of Grid resources by calculating the total completion time of jobs in each re-
source based on log data about past executions. Because of this, it is called
Total Completion Time (TCT). These estimations consider transfer and execution
times (including execution and queueing times) altogether.
This implementation is explained in Algorithm 5. It shows that, for each
resource having a gap in which the application j can be executed (line 1), an
estimation of the completion time of that application on that resource is
calculated (line 2). This estimation works as follows: the next execution of
the application j in the resource ri is predicted from the stored completion
times of the previous executions of j in ri (which are kept in the
DB_Executions module), by working out the average of all those stored
executions.
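Algorithm 5 can be expressed as a short sketch. The history mapping and the
function name are illustrative stand-ins for the DB_Executions database, not
the actual SA-Layer code.

```python
# Algorithm 5 (TCT) as code: the predicted completion time of application
# `job` on each candidate resource is the mean of its stored completion
# times on that resource.

def tct(history, job, resources_with_gap):
    """Mean past completion time of `job` on each resource having a gap."""
    return {ri: sum(history[(job, ri)]) / len(history[(job, ri)])
            for ri in resources_with_gap}

history = {("app", "rA"): [100.0, 120.0], ("app", "rB"): [90.0]}
estimates = tct(history, "app", ["rA", "rB"])
```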
4.4.2 ETTS Technique
The second implementation considers the times needed to execute the jobs and
the time needed to transfer the files separately [110]. Thus, it is called Execution
and Transfer Time Separately (ETTS), and is explained in Algorithm 6. As before,
Algorithm 6 Execution and Transfer Times Separately (ETTS)
1: for each ri having a gap do
2:   Prolog = TransT_Estimation(ri, initT, d, sizeIN)
3:   Epilog = TransT_Estimation(ri, initT, d, sizeOUT)
4:   ExecT = ExecT_Estimation(j, ri)
5:   ETTS_ri = Prolog + ExecT + Epilog
6: end for
Algorithm 7 Estimation of execution time (ExecT_Estimation)
1: ExecT_Estimation = ( Σ_{k=1}^{n} t^ri_execution(j)_k ) / n
2: return ExecT_Estimation
Algorithm 8 Estimation of transfer time (TransT_Estimation)
1: MeanBw = ( Σ_{t=initT}^{d} Bw(ri, t) ) / (d − initT)
2: TransT_Estimation = size / MeanBw
3: return TransT_Estimation
resources are explored in search of a suitable gap for the execution of this
job, and for each gap found (line 1), estimations of the transfer times of the
input files (line 2) and output files (line 3), and of the execution time (line
4) in the resource ri holding the gap, are calculated. After that, the
prediction of the total completion time of the job in the resource ri is
calculated by adding the aforementioned estimations (line 5).
For the execution time, an estimation is calculated using information of pre-
vious executions, as it is depicted in Algorithm 7. This algorithm uses all
the execution times records in the database (which are stored in the module
DB_Executions) for the application j in a resource ri to calculate the mean exe-
cution time for j in ri – this includes execution and queueing times.
The way of calculating the transfer times is outlined in Algorithm 8. In this case, the mean bandwidth of the previous day over the time interval where the job will be allocated is calculated (line 1). The measurements for this time interval are stored in DB_Resources. In this work, the previous day's log is used for estimating the future bandwidth because, as presented in [66], the future status is predictable from past behavior, and it is a simple technique that works well enough. Moreover, these estimations leave a margin (they are overestimated by 20 %) in order to ensure that the transfers can finish on time even if the predictions are not accurate enough. The time needed to complete the transfers is estimated using this information, along with the total number of bytes to transfer (line 2).
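The 20 % overestimation margin amounts to a single multiplicative factor on the raw estimate; a minimal sketch (the constant name and the bandwidth-log format are assumptions, not SA-Layer identifiers):

```python
MARGIN = 1.20  # 20 % overestimation, as described in the text

def trans_t_with_margin(bw_log_day_before, size):
    """Transfer-time estimate from the previous day's bandwidth log,
    padded by 20 % to absorb prediction inaccuracies."""
    mean_bw = sum(bw_log_day_before) / len(bw_log_day_before)
    return (size / mean_bw) * MARGIN
```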
4.4. Prediction Techniques 85
On the other hand, estimating those times separately gives us the possibility of overlapping network transfers and job execution times in the resources. This means that while a resource is executing a job, the files needed by the next job to be executed in that resource can be transferred at the same time.
4.4.3 RT Technique
Predictions of job execution times may be inaccurate, as resource performance can change quickly depending on use (owner load and/or other Grid users' loads). So, the system needs to react to those changes in performance to be able to predict job times as accurately as possible. That is the reason why our system takes into account the previous performance of the resource where the executions will take place, in order to retune the previous predictions of job execution times. Accordingly, the system has to be able to autonomously tune the predictions calculated for the execution time of jobs by using the information about the latest N errors made in the predictions for each resource.
This implementation is based on ETTS, but it tunes the predictions of the execution time of jobs by obtaining a measure of how reliable the resource performance is, named Resource Trust. This resource trust is calculated using Equation 4.1,
RT(ri) = ( ∑_{j=n−N}^{n} ( t^{ri}_{estimated}(j) − t^{ri}_{real}(j) ) ) / N    (4.1)

where RT(ri) is the trust in the resource ri; t^{ri}_{estimated}(j) is the execution time estimation made for the j-th job execution in resource ri; and t^{ri}_{real}(j) is the real execution time of the j-th job in resource ri.
As a result, the confidence in the estimations depends on how trustworthy the resource in which the job will run is at the moment when the scheduling process takes place. This resource trust may be different from the trust at the moment when the job is executed. This function takes into account the last N execution times and their predictions in a specific resource ri to calculate the trust in that specific resource. The output of this function is the mean of the errors made in those N predictions, and it is used to tune the prediction made for the job execution times in that resource. N is a customizable parameter which depends on how far into the past we want to go when estimating the recent resource behavior. It must not be a high value, as we want to measure only the latest performance. Nevertheless, it must not be too low either, since this may lead to a wrong measurement of the resource performance. Based on that, and after a tuning process, we have set this value to 20, meaning a sample of how each resource has been behaving in the last 20 minutes to 1 hour, depending on the total Grid load.

86 Chapter 4. Adding Support for Meta-Scheduling in Advance: The SA-Layer

Algorithm 9 ETTS extended with Resource Trust (RT)
1: for each ri having a gap do
2:   Prolog = TransT_Estimation(ri, initT, d, sizeIN)
3:   Epilog = TransT_Estimation(ri, initT, d, sizeOUT)
4:   ExecT = ExecT_Estimation(ri, j)
5:   if RT(ri) < 0 then
6:     ExecT = ExecT + |RT(ri)|
7:   end if
8:   RT_j^{ri} = Prolog + ExecT + Epilog
9: end for
10: return RT_j^{ri}
This implementation is called RT and is detailed in Algorithm 9. Estimations of execution and transfer times are calculated in the same way as explained for ETTS (using Algorithms 7 and 8). With this information and the information about the trust in resource ri, labeled as RT(ri), the execution time is tuned (line 6) and an estimation for the total completion time of the job, RT_j^{ri}, is calculated (line 8).
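Equation 4.1 and the tuning step of Algorithm 9 (lines 5–7) can be sketched as follows. This is a simplified illustration: the list-based error history is an assumption, and the default N = 20 follows the value chosen in the text:

```python
def resource_trust(estimated, real, n_last=20):
    """Eq. 4.1: mean signed error over the last N predictions for ri.
    A negative value means executions tend to run longer than predicted."""
    est, obs = estimated[-n_last:], real[-n_last:]
    return sum(e - r for e, r in zip(est, obs)) / len(est)

def tune_exec_time(exec_t, rt):
    """Algorithm 9, lines 5-7: enlarge the execution-time estimate when
    the resource has recently been underestimated (RT < 0)."""
    return exec_t + abs(rt) if rt < 0 else exec_t
```

For example, if the last two predictions of 10 s each ran for 12 s and 14 s, the trust is −3, and a 100 s estimate is enlarged to 103 s.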
4.4.4 ExS Technique
Finally, we have developed another improvement over the previous prediction techniques. Again, the predictions for the duration of jobs are calculated by estimating the execution time of the job and the time needed to complete the transfers separately, as it has been demonstrated that this yields better results than performing predictions for execution and transfer times altogether [110].
The new improvement is based on an exponential smoothing function which is used to tune the predictions of both the execution time of the job and the network transfer times. This exponential smoothing function is detailed in the next subsection.

Algorithm 10 Estimation of Execution Time (ExS Estimation)
1: ExecutionTime = (∑_{k=1}^{n} t^{ri}_{execution}(j)_k) / n
2: Overload_ri = ExecutionTime * (1 − CPU_free(ri, initT, d))
3: ExecutionTime = ExecutionTime + Overload_ri
4: return ExecutionTime
Regarding predictions of the execution times, they are performed as explained in Algorithm 10. An estimation of the execution time is calculated as the average of previous executions (line 1). After that, the prediction of the future status of the CPU of each resource is calculated by means of an exponential smoothing function, along with the overload generated due to the resource status (line 2). Finally, the mean execution time is tuned using the prediction of the future CPU status of each resource (line 3).
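Algorithm 10 amounts to scaling the historical mean by the predicted busy fraction of the CPU; a sketch (here the CPU_free prediction is passed in as a plain fraction rather than computed by the smoothing function):

```python
def exs_exec_estimation(exec_history, cpu_free):
    """Algorithm 10: mean of previous executions (line 1) plus the
    overload implied by the predicted non-free CPU fraction (lines 2-3)."""
    mean_t = sum(exec_history) / len(exec_history)
    overload = mean_t * (1.0 - cpu_free)
    return mean_t + overload
```

With a 100 s historical mean and a predicted 80 % free CPU, the tuned estimate becomes 120 s.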
With regard to transfer times, the mean bandwidth within the time period
between the start time of the job and its deadline is also calculated by using an
exponential smoothing function. Again, using this information along with the
total number of bytes to transfer, the time needed to complete the transfers is
estimated. The next section details the exponential smoothing functions used in
this work.
Finally, when both times have been estimated, the recent behavior of the chosen resource (resource trust) is calculated following Equation 4.1 and is used in the same way as in Algorithm 9 to tune the final execution time estimated for the job.
4.4.5 Exponential smoothing predictions
Even though the development of statistical techniques is outside the scope of this dissertation, we have used and adapted the ExS prediction method to calculate predictions of the status of resources. So, to better understand the ExS technique, we detail it next.
Exponential Smoothing (ExS) [105] is a statistical technique for detecting significant changes in data by ignoring the fluctuations irrelevant to the purpose at hand. It provides a simple prediction method based on both historical and current data [137]. In ExS (as opposed to moving-average smoothing), older data is given progressively less relative weight (importance), whereas newer data is given progressively greater weight. In this way, ExS assigns exponentially decreasing weights as the observations get older. Hence, recent observations are given relatively more weight in forecasting than older ones.
ExS is a procedure for continually revising a forecast in the light of more recent experience, and it is employed in making short-term forecasts. There are several types of ExS. In this work a triple exponential smoothing is used, which is also named Holt–Winters [105]. With this kind of ExS, the trend and seasonality of the data are taken into account for the predictions. Trend refers to the long-term patterns of the data, whilst seasonality is defined as the tendency of time-series data to exhibit behavior that repeats itself every L periods. Such trends are apparent when a user wants to execute a large, complex simulation on a Grid at periodic intervals (such as for analyzing sales data or running a scientific experiment with new observations). Conversely, during vacation periods, not all the staff take their vacation at the same time, so the load on the resources decreases progressively. We have chosen ExS because our data is likely to present both behaviors. ExS also provides a simple and efficient method which can be implemented without slowing down the performance of the system.
Regarding seasonality, it is likely that CPU availability increases at night, even more so in our scenario, in which resources are shared with other users. Similarly, resource usage may also depend on the day of the week being considered, with greater workload during the week and greater availability at weekends. Thus, data collection and analysis need to be run at different times of the day to make an accurate prediction about resource status. To this end, a weekly log is used as input to the ExS function for predicting the future status of the network and resources for the next whole day, as we are then able to account for both seasonal behaviors. In our approach, the predicted information is updated every 30 minutes to refine the results over time depending on the new knowledge observed from recent resource and network behavior. The forecasting method used is presented in Equation 4.2. At the end of time period t, with x_t being the observed value of the time series at time t (in our case, the CPU usage), f_{t+m} is the forecasted value for m periods ahead, T_t is the trend of the time series, L_t is the deseasonalized level, and S_t is the seasonal component.
ExS = ∑_{m=initT}^{deadline} f_{t+m} = ∑_{m=initT}^{deadline} (L_t + T_t * (m + 1)) * S_{t+m−L}    (4.2)

L_t = α * (x_t / S_{t−L}) + (1 − α) * (L_{t−1} + T_{t−1})    (4.3)

T_t = β * (L_t − L_{t−1}) + (1 − β) * T_{t−1}    (4.4)

S_t = γ * (x_t / L_t) + (1 − γ) * S_{t−L}    (4.5)
The deseasonalized level (Lt) is calculated as shown in Equation 4.3, taking into account the previous values obtained for trend and seasonality and the actual value observed. The new trend of the time series (Tt) is the smoothed difference between two successive estimations of the deseasonalized level, as described in Equation 4.4. Finally, the seasonal component (St) is calculated using Equation 4.5. This expression combines the most recently observed seasonal factor, given by the demand xt divided by the deseasonalized series level estimate (Lt), with the previous best seasonal factor estimate for this time period. Thus, seasonality indicates how much this period typically deviates from the period average (in our case, weekly). At least one full season of data is required for the computation of seasonality.
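Equations 4.3–4.5 translate directly into an update step, and Equation 4.2 into a forecast. The sketch below assumes pre-initialized level, trend and seasonal values, and indexes the seasonal factors cyclically (an approximation of S_{t+m−L}):

```python
def hw_update(x_t, level, trend, season_t_minus_L, alpha, beta, gamma):
    """One Holt-Winters update (Eqs. 4.3-4.5), multiplicative seasonality."""
    new_level = alpha * (x_t / season_t_minus_L) + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_season = gamma * (x_t / new_level) + (1 - gamma) * season_t_minus_L
    return new_level, new_trend, new_season

def hw_forecast(level, trend, seasonals, m):
    """One term of Eq. 4.2: forecast m periods ahead."""
    return (level + trend * (m + 1)) * seasonals[m % len(seasonals)]
```

Summing hw_forecast over m from initT to the deadline reproduces the full sum of Equation 4.2.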
In the equations, α, β and γ are constants that must be estimated in such a way that the mean square error is minimized. These weights are called the smoothing constants. For each component (level, trend, seasonal) there is a smoothing constant that falls between zero and one. It is important to set correct values for them to predict the behavior of resources and network as accurately as possible. In our work, the R program [138] is used for calculating these parameters. We need at least a two-week log, divided into weekly data sets. Using the first of these data sets (a one-week log), the R program estimates these values for that week. These results are then compared with the real status registered for the following week, and the α, β and γ values are adjusted to minimize the mean square error.
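The same fitting idea can be sketched as a coarse grid search. This is a simplification — R's HoltWinters() fit uses numerical optimization rather than a grid — and the one_week_mse callback is a hypothetical hook that runs the smoother over the training week and scores it against the validation week:

```python
import itertools

def fit_smoothing_constants(one_week_mse, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the (alpha, beta, gamma) triple minimizing the MSE reported
    by one_week_mse(alpha, beta, gamma) on the validation week."""
    best, best_err = None, float("inf")
    for a, b, g in itertools.product(grid, repeat=3):
        err = one_week_mse(a, b, g)
        if err < best_err:
            best, best_err = (a, b, g), err
    return best, best_err
```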
4.5 Evaluation
This section describes both the experiments conducted and the results ob-
tained in order to test the usefulness of the SA-Layer in a real Grid environment.
In this section, a comparison is presented between the SA-Layer using the four techniques for calculating the job completion time detailed before, and a straightforward implementation of the algorithm proposed by Castillo et al. in [3]. The techniques to calculate the job completion times are (1) estimations of the Total Completion Time of jobs (labeled as TCT); (2) estimations of Execution and Transfer Times Separately (labeled as ETTS); (3) ETTS extended with Resource Trust (labeled as RT); and (4) RT extended with exponential smoothing functions that predict the future status of resources and network (labeled as ExS).
As stated above, they are compared with the algorithm proposed by Castillo et al. [3] (labeled as Castillo). This straightforward implementation is based on a linear function which estimates the time needed to complete a job in a resource, taking into account its input parameters. It does not take into account the varying resource performance over time, only the input and output parameters of the job and the knowledge about its behavior. Hence, using this kind of estimation to predict the job execution time, all the predictions for the execution times of an application with the same parameters on a specific resource will yield the same value, without taking into consideration the current or future resource status. To evaluate the performance of these techniques for estimating the time needed to complete a job, several statistics are used.
• Scheduled job rate: fraction of accepted jobs, i.e., those whose deadline can
be met [38].
• QoS not fulfilled: the number of rejected jobs, plus the number of jobs that were initially accepted but whose executions were eventually delayed, so that their QoS agreements were not fulfilled (e.g., the deadline was not met).
• Overlap: records the number of minutes that a job execution is extended
over the calculated estimation.
• Waste: records the number of minutes that are not used to execute any job
because the predicted execution time of jobs was longer than their actual
execution time.
The first two statistics are measures of the QoS perceived by the user. Recall that in this work, QoS not fulfilled includes rejected jobs, since they are jobs which have not been executed with the QoS requested, and the QoS specified for each job is always reasonable in this experiment. The job QoS requirements could be more or less strict, but (if the system were empty) the job could always be scheduled and executed meeting the QoS requested. On the other hand, the last two statistics are measures of system performance (from the meta-scheduler's point of view), and they are related to the accuracy of the predictions.
4.5.1 Testbed
The evaluation of the implemented framework has been carried out in the same
real Grid environment described in Chapter 3 at Section 3.4.1.
4.5.2 Workload
As in the previous chapter, one of the GRASP [115] benchmarks, named 3node, has been run to evaluate the implementation. Again, the reason for selecting this test was its versatility in generating different kinds of jobs (computationally intensive and network demanding) in an easy way.
However, there are other important parameters that must be considered in
the workload when measuring the performance of the framework, as can be seen
in Figure 4.6. T_max reservation represents how far in advance we can schedule a job. T_Execi is the time needed to execute job i.

Figure 4.6. Workload characteristic.

Scheduling Window shows the time interval in which the job has to be scheduled. Taking the last two parameters together, we can obtain the Laxity, which represents how strict the user is when scheduling a job. It is the difference between the Scheduling Window and the T_Exec of a job. Finally, Arrival Ratio depicts the average time between two job submissions.
For this evaluation, both the compute_scale and the output_scale take values between 0 and 20, with an average of 10. The input file size is 100 MB, and these values of output_scale create output files whose sizes are between 100 MB and 2 GB. Both values are related to the T_Exec of Figure 4.6 (the greater they are, the longer the job executions will be). Thanks to this, both compute- and/or data-intensive applications are created, which are mixed in the submission. Therefore, the different jobs of the workload have different QoS requirements and different behaviors. Recall that in this work we focus on applications where users provide both the input files and the application itself.
On the other hand, the T_max reservation is up to 1 hour, with an average of 30 minutes. The Laxity is set between 0 and 10 minutes, with an average of 5 minutes. In this way, users always make a request to run a job with a reasonable QoS, as the laxity is never negative. The submission rate is from 1 to 6 jobs per minute (following a uniform random distribution) and the total submission time is 1 hour. Finally, time slots last 1 minute and each strip lasts 10 minutes (there is a tree for every 10 minutes, with the aim of not having big trees which may delay the search techniques).
As jobs executed in a Grid environment suffer delays (setting up the environment, communication between resources, transferring input and output files, the execution itself, . . . ), an execution lasting less than a minute is quite uncommon. Therefore, slots of 1 minute provide sufficient granularity for Grid environments. Moreover, this also leaves some margin for the predictions in case of inaccuracies. The results shown are the average of 5 executions for each case.

Figure 4.7. Comparison of the different estimation techniques from the users' viewpoint: (a) percentage of scheduled jobs and (b) percentage of QoS not fulfilled, versus the number of submitted jobs per minute (Castillo, TCT, ETTS, RT, ExS).
4.5.3 Experiments and results
Results from the users' point of view are depicted in Figures 4.7 and 4.8, whilst the results from the system's viewpoint are depicted in Figure 4.9. First, Figure 4.7 (a) represents the percentage of scheduled jobs – those for which the meta-scheduler has enough free slots to allocate them while meeting the QoS requirements. The average, maximum and minimum results obtained are also plotted. It shows that the more jobs there are in the system, the more jobs are lost. All the algorithms behave similarly at low loads. The differences appear when the system load is higher and more jobs are rejected by all the techniques.
When using different estimations for the transfers and for the execution itself (ETTS, RT and ExS), the number of accepted jobs is noticeably higher than when making just one estimation for the total job duration (Castillo and TCT). Among the techniques that present the better behavior, there are no large differences. As RT and ExS take into consideration the latest resource performance, and the latter also the predicted status of the network and of the computational resources, they make more accurate estimations of job durations, in spite of accepting a slightly lower number of jobs. This also leads to fewer jobs not meeting their QoS requirements after having been accepted for execution (as Figure 4.8 depicts).
Figure 4.7 (b) shows the percentage of jobs that were not executed with the QoS requested, including lost jobs and jobs completed beyond the agreed deadline. Again, the more jobs there are in the system, the more jobs are not executed with the requested QoS. For low submission rates (1 and 2 jobs per minute), it is not essential to make separate estimations for execution and transfer times, since there are enough free slots for allocating most of the jobs, although better results are again obtained for the techniques which take into account that the resources present different performance over time.
However, for higher submission rates (3 jobs per minute onwards) the prediction algorithm becomes very important. As Figure 4.7 (b) shows, a noticeable reduction in the number of lost jobs is achieved when making separate predictions for transfers and job execution (ETTS, RT and ExS), and it is even more pronounced when the varying resource performance over time is taken into account (RT and ExS).
From those plots it may be deduced that ExS presents the best results from the users' point of view. In spite of not accepting as many jobs as ETTS or RT, ExS performs more accurate estimations, so it is less likely that accepted jobs fail to fulfill the agreed QoS.

Figure 4.8. QoS not fulfilled per accepted job (ETTS, RT and ExS), versus the number of submitted jobs per minute.

For low loads, there are no large differences among them, since few jobs fail after being accepted owing to the fact that the system is not very overloaded. However, when the system load is higher, giving some extra time for the execution of the jobs (number of slots reserved) depending on the predicted status of resources, as ExS actually does, leads to fewer jobs failing their QoS requirements. Then, in spite of accepting a slightly lower number of jobs, the predictions are more accurate and it is less likely that jobs need more time to finish their executions. Hence, it is a more conservative technique, but it also presents a more reliable behavior regarding the QoS provided per accepted job, as Figure 4.8 depicts. That figure highlights the accuracy of the predictions made by ExS, which presents a more uniform behavior regarding the percentage of accepted jobs that finally do not fulfill the agreed QoS, regardless of the number of jobs submitted per minute. In fact, this percentage is quite small compared with the other two techniques (a 60 % reduction on average with respect to RT, and 80 % with respect to ETTS).
It must also be noted that ExS usually presents greater variability than the rest of the techniques regarding accepted jobs and QoS not fulfilled. This is because it takes heed of changing resource behavior and consequently, depending on the environment status, it adapts its predictions and accepts a different number of jobs.
From the point of view of the system, Figures 4.9 (a) and (b) depict the mean
overlap and waste times when calculating the job completion time estimations,
respectively.
Figure 4.9. Comparison of the different estimation techniques from the system's viewpoint: (a) mean overlap time and (b) mean waste time per job (in seconds), versus the number of submitted jobs per minute (Castillo, TCT, ETTS, RT, ExS).
Figure 4.9 (a) shows a greater overlap when using ETTS, as there is a greater number of running jobs. However, when using RT or ExS, the number of accepted jobs is close to that of ETTS, but they present much less overlap. What is more, ExS presents the most reliable and uniform behavior, since the generated overlap grows as the load does. As a result of having lower overlap time, there will be fewer jobs that do not meet the QoS requirements. This actually explains the results depicted in Figure 4.8.
On the other hand, the reduction in the ETTS overlap from 4 jobs per minute onwards is due to the fact that the system cannot accept more jobs since it is saturated. Thus, the generated fragmentation may help the system not to increase the overlap. If there were a job in those free gaps, and another job's execution were longer than expected, the total overlap would increase, since the overlap for the next jobs would consequently be bigger. It must also be noted that, for very high loads in the system (5 and 6 jobs per minute), the Castillo and TCT techniques produce less overlap, but they have accepted far fewer jobs than the other three techniques.
Figure 4.9 (b) highlights that, even with a higher number of running jobs, the waste is lower when using ETTS, RT or ExS than when using TCT or Castillo. That is, estimating execution and network times separately clearly has a good influence on the performance of the system. The basis for this is that the estimations are more accurate. This also explains the results shown in Figure 4.7 (a): having lower waste time, more jobs can be accepted, since each accepted job requires fewer reserved slots. Resource utilization is also better, with less wasted time in between job executions, so resources are idle for less time. Note that ETTS gets the best results regarding waste time. However, its overlap is the highest of the five techniques. This fact can make jobs miss their QoS requirements, as Figure 4.8 shows.
ExS also presents a logical tendency regarding waste time, namely having more waste time as there are more jobs in the system. It presents a slightly bigger waste than the ETTS and RT techniques, because ExS assigns some extra slots whenever it predicts that the resource could be busier than usual at the moment when the job is executed, or that the network will be overloaded when the files are transferred. Hence, more time may be wasted, but this results in a noticeable reduction in the overlap time, which leads to a remarkable improvement in the QoS provided, as shown in Figures 4.7 and 4.8.
4.6 Summary
Several research works aim at providing QoS in Grids by means of advance reservations. However, making reservations of resources is not always possible, for several reasons. Hence, we propose scheduling in advance (the first step of the reservation-in-advance process) as a possible solution to provide QoS to Grid users.
This type of scheduling requires estimating whether or not a given application can be executed before the deadline specified by the user. This entails tackling many challenges, such as developing efficient scheduling algorithms that scale well, or studying how to predict job completion times on the different resources at different times. For this reason, making predictions about the status of Grid resources is essential. It must be noted that the network must be considered as another Grid resource, as highlighted by previous studies [46] [112].
This chapter proposes an autonomic framework to perform meta-scheduling
in advance, which self-tunes the predictions made on the job execution time,
to improve the QoS offered to the users. This new system is concerned with
the dynamic behavior of the Grid resources, their usage, and the characteristics
of the jobs. Furthermore, this system takes into account the accuracy in the
recent predictions for each resource in order to calculate a resource trust. By
using this information the system retunes its predictions to better fit the usage of
resources in the future. Along with this, the variable availability of resources is
also tackled by means of rescheduling failed jobs, which improves the autonomic
behavior of the framework.
Apart from presenting the general framework, a comparison between several strategies that perform estimations of the completion time of jobs is included. These strategies are based on estimations of the Total Completion Time (TCT), Execution and Transfer Times Separately (ETTS), Resource Trust (RT), and the future status of resources through Exponential Smoothing functions (ExS). These techniques are compared with an implementation of the original scheduling-in-advance algorithm proposed by Castillo et al. [3]. This comparison highlights the importance of making network estimations independently, and of taking into account the varying resource performance, as this improves the resource usage, thus allowing more jobs to be scheduled.
CHAPTER 5
Optimizing Resource Utilization through Rescheduling Techniques
In highly heterogeneous and distributed systems like Grids, it is rather difficult to provide Quality of Service (QoS) to the users. As reservations of resources may not always be possible, another possible way of enhancing the perceived QoS is performing meta-scheduling of jobs in advance, where jobs are scheduled some time before they are actually executed. Thanks to this, it is more likely that the appropriate resources are available to execute the job when needed.
However, when using this type of scheduling, fragmentation appears and may become a cause of poor resource utilization. Because of that, techniques are needed to perform rescheduling of tasks so as to reduce the existing fragmentation. To this end, two techniques have been developed to tackle fragmentation problems, which consist of rescheduling already-scheduled tasks. In such a scenario, knowing the status of the system is a must. However, measuring and quantifying the existing fragmentation in a Grid system is a challenging task. Thus, different metrics aiming at measuring that fragmentation, not only at the resource level but also taking into account all the resources of the Grid environment as a whole, are presented.
5.1 Introduction
The way of scheduling proposed in the previous chapter (Chapter 4) may produce resource fragmentation as a result of the allocation process [69]. Fragmentation is a well-known effect in resource allocation, which decreases resource utilization [56]. This means that a job execution request may be rejected even if the overall remaining capacity of the resources is sufficient to handle it. Thus, it is easy to identify fragmentation as an individual reason for the rejection of a single allocation.
In our case, as jobs may have deadline as well as start time constraints, it is possible to have great fragmentation in the system even if the load is not high. This means that a job execution request could be rejected due to fragmentation. Therefore, the main aim of this chapter is to present different techniques that measure the generated fragmentation and react depending on the values obtained. However, quantifying the fragmentation in a system requiring continuous allocations, such as time schedulers or memory, is really complicated.
To tackle these problems, a job rescheduling module has been developed on top of the SA-Layer framework for meta-scheduling in advance presented in [60], which alleviates the fragmentation problem. This module performs two types of rescheduling: a reactive and a preventive technique. When a job fails its allocation, the reactive technique reallocates an already-scheduled job (keeping its QoS agreements) and uses its released slots to allocate the incoming job. With this aim, heuristics have been implemented to decide which job has a higher probability of being reallocated and which job has been assigned time slots which could be useful for allocating the new incoming job. The preventive technique [139] [140] performs rescheduling of tasks from time to time, by sorting the jobs already scheduled in a certain time interval by their start times (instead of by their arrival times), in the same way as a Bag of Tasks (BoT). Therefore, the allocation process has more information about the jobs to be scheduled, and free slots are put together. By rescheduling those tasks in this new order, the resource fragmentation is reduced, improving both the scheduled job rate and the resource usage. Hence, by using these two rescheduling techniques, the fragmentation problem is alleviated, resource utilization is improved, and the QoS perceived by users is also increased, as more of their jobs can be executed.
Whilst for the reactive technique it is quite easy to know when the replanning must be done (every time a job fails its allocation), for the preventive rescheduling technique this is not so easy. Owing to this fact, metrics to measure the existing fragmentation are implemented with the objective of knowing (1) whether the replanning process needs to be performed, (2) which resources must be involved in it, and (3) which time intervals and jobs must be replanned.
The structure of the chapter is as follows. Section 5.2 describes problems often found in resource allocation processes, one of them being fragmentation. Section 5.3 details the main contribution of this chapter, which is the implementation of a job rescheduling module with the preventive and reactive rescheduling techniques described above, in order to tackle fragmentation problems in Grids. In Section 5.4, several ways of measuring the existing fragmentation in a Grid system are detailed. A performance evaluation of the approaches is presented in Section 5.5. Finally, Section 5.6 draws the summary of the chapter.
5.2 Scheduling Problems
A well-known effect in every job allocation process is fragmentation, which
decreases resource utilization, as studied in [56]. Thus, when a job allocation
fails even though enough free capacity is available, fragmentation is easily
spotted as a cause. Among the reasons for rejecting job allocation requests in
Grid environments, the following can be found [69]:
• High utilization: If there is no vacancy on the resources, the meta-scheduler
will reject the request. Resource owners are interested in high utilization
since it maximizes their revenue.
• Fragmentation: If the free parts of the resources are scattered in space and
time, rejections due to fragmentation will appear. However, if optimization
could compact the reservations to form a large free block, then new incoming
jobs might be admitted. Hence, fragmentation can be analyzed as a way to
describe the status of the Grid, as detailed in Section 5.4.
• Unfavorable previous decisions: Even if utilization and fragmentation are
both very low, a job may still be rejected. Such a rejection comes neither
from fragmentation nor from utilization. It is possible that a previously
scheduled job uses the only blocks which fit the new incoming job, so this
new job has to be rejected. The problem stems from the inability to foresee
the future: usually, the meta-scheduler considers only the requested job
when making decisions.
The last two issues become even worse when jobs have both start time and
deadline restrictions. As far as fragmentation is concerned, if there are jobs
with both time restrictions, the system may be forced to allocate a job into
time slots which are not near any other allocated job. Thus, fragmentation
appears very early, and it may become a problem after just a few allocations.
Apart from that, it must be noted that even if there is no fragmentation, jobs
may be rejected due to unfavorable previous decisions. The main reason is that
the meta-scheduler does not know what is going to happen in the future
regarding job allocation requests. Because of that, it has to make its
scheduling decisions based only on its current knowledge about the system
usage. For instance, the system allocates a job which does not have any strong
requirements into some particular free slots. Later, another job request may
arrive which could only be allocated into the slots that were assigned to the
previous, non-urgent job. Thus, the system has to reject this last job even
though there was a chance of allocating both of them, simply by allocating the
two jobs in the opposite order. In the next section, two techniques devised for
addressing fragmentation will be presented.
5.3 Tackling fragmentation
Examining how fragmentation can be measured in the particular domain of Grid
resources is essential. The most obvious approach would be to reuse ideas on
how to measure fragmentation in other domains, e.g., in file systems and main
memory. For instance, in [71] the characteristics of dynamic memory allocators
were studied. However, the domain of memory management does not map onto the
domain of Grid resources, since main memory can be considered homogeneous
whilst this is not the case for Grid resources.
On the other hand, in [69] a new way to measure the fragmentation of a system,
as well as its correlation with job rejections, is presented. It shows that the
proposed fragmentation measure is a good indicator of the state of the system.
However, the authors measure fragmentation resource by resource, not over all
resources as a whole. Thus, further research is needed to address fragmentation
issues in the Grid meta-scheduling domain, as such information could help to
compare the effects of scheduling decisions.
The problem of tackling fragmentation can be formulated as follows:
• A set of jobs j_1, j_2, ..., j_n have to be executed and are allocated for their
future execution on a set of m unrelated resources denoted by r_1, r_2, ..., r_m.
• Every job will start its execution after its start time, t_s, and should finish
before its deadline, d.
• Each job consists of a single non-preemptable task that has to be executed
on one of the resources.
• The execution time of a job depends on the chosen resource and on the
time interval in which it is executed.
The objective is to allocate the n jobs to the m resources minimizing the
fragmentation, so that the next job, n + 1, can be successfully allocated into
the m resources. That is, the main objective function is to minimize the sum
∑_{i∈[1..m]} |G_i|, where G_i is the set of available gaps in resource r_i.
This problem is NP-hard [24], so this strategy has to take into account some
key aspects in order to be efficient for the Grid system and computationally
cheap. First, it has to be decided when the rescheduling technique must be
applied. For instance, it can be run periodically or be triggered depending on
the status of the system. Next, the algorithm to avoid fragmentation is
performed. At this point, it must be highlighted that only the jobs that have
already been scheduled in advance but have not yet started their execution are
considered. Hence, there is no cost associated with rescheduling those jobs in
terms of runtimes or data transfers. Moreover, they must still fulfill their
QoS agreements after the reallocation process. An aggressive technique can be
used to reallocate all n + 1 jobs; that is, the scheduling algorithm would
consider all the jobs as if they had been requested at the same time. On the
other hand, a more conservative scheme can be used in which the minimum number
of changes is made in order to allocate the n + 1 jobs.
Figure 5.1. The Scheduler in Advance Layer (SA-Layer).
Based on those two options, two rescheduling techniques, named Replanning
Capacity (RC) and Bag of Task Rescheduling (BoT-R) [140], have been developed
in this dissertation. These techniques have been implemented as the modules
shown in Figure 5.1, and are explained in the next sections.
In the literature there are examples of moving jobs from one resource to
another to try to avoid fragmentation and to improve resource usage. One
related work dealing with the reallocation of jobs is [72], which highlights
the importance of having accurate information available when provisioning
resources in multiple domains; it uses backfilling to perform that
provisioning. On the other hand, [74] presents an algorithm to perform resource
selection based on performance predictions, and also provides an algorithm for
moving already made reservations through co-allocation of jobs. However, no
mechanism is provided to do that among different users, as we actually do.
Finally, [75] analyzes task reallocation in Grids, presenting different
reallocation algorithms and studying their behavior in the context of a
multi-cluster Grid environment. However, unlike this Thesis, that work is
centered on a dedicated Grid environment and is evaluated through simulation.
5.3.1 Reactive techniques: Replanning Capacity (RC)
The Replanning Capacity (RC) technique is categorized as a reactive technique,
since it is only applied when a job allocation fails. Its functionality is
implemented in the module called Replanning Capacity (shown in Figure 5.1),
and it is triggered by the Gap Management module.
Essentially, this technique works as follows. Every time a job request is
rejected, this technique is triggered. When this happens, it first selects the
already scheduled jobs which could be suitable for rescheduling without
affecting their QoS requirements – that is, without affecting their expected
completion times – and whose reserved slots may be suitable for the new
incoming job. Once this set of jobs is decided, the system tries to make room
for the new incoming job (the one which would otherwise be rejected) by
rescheduling one of the suitable jobs chosen before. If this can be done
without affecting the expected completion time of the already scheduled job,
then the new incoming job is allocated into the resource and time slots that
were just released.
The way the RC technique works is detailed in Algorithm 11 and explained next.
When a job request is rejected, the RC technique is triggered (line 8). First,
the set of target scheduled jobs must be selected (line 9). To do this, the
system filters the jobs in two steps. First, jobs which do not have any
reserved slot in the time interval between the start time and the deadline of
the new incoming job (the one whose allocation failed) are filtered out.
Second, jobs less likely to be successfully rescheduled are also filtered out.
To this end, the system calculates the laxity of each target job [141], using
Equation 5.1.

Laxity = (SchedulingWindow − ExecutionTime) / ExecutionTime        (5.1)
In Equation 5.1, SchedulingWindow is the time interval in which the job has to
be executed (deadline − startTime), and ExecutionTime is the time the system
estimated that the job would need to complete its execution on the resource and
in the time interval where it was allocated. The laxity represents how likely a
successful replanning of a job is, and this value is used for filtering out the
target jobs. Therefore, jobs whose associated laxity is lower than a specific
value will not be taken into account as candidates for reallocation, since
their scheduling window is too tight. In the case of best–effort requests
without time requirements, the deadline to finish the job is assumed to be
infinite, so their laxity will also be infinite. Hence, these best–effort jobs
are prone to be rescheduled.

Algorithm 11 Replanning Capacity Algorithm
 1: Let j = the new incoming job
 2: Let R = set of known resources {r_i / i in [1..n]}
 3: Let laxity_threshold = the threshold used to filter jobs
 4: Let SelectJ = set of already scheduled jobs {J_1, J_2, ..., J_m} which have
    reserved slots between the start time and the deadline of j, and whose
    laxity is above the threshold
 5: Let SortJobsByLaxity(J) = the function which sorts the jobs of list J by
    their probability of being reallocated
 6: Let GapR = set of available gaps in R
 7: Let TS_{Jl,ri} = time slots reserved for job J_l in resource r_i
 8: if j allocation fails then
 9:   SelectJ = filterJobs(laxity_threshold)
10:   SelectJ = SortJobsByLaxity(SelectJ)
11:   for each SelectJ_l ∈ SelectJ do
12:     if TS_{SelectJl,ri} is feasible for j then
13:       for each GapR_k ∈ GapR do
14:         if GapR_k is feasible for SelectJ_l then
15:           Allocate SelectJ_l at GapR_k
16:           Allocate j at TS_{SelectJl,ri}
17:           Exit
18:         end if
19:       end for
20:     end if
21:   end for
22: end if
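Equation 5.1 and the laxity-based filter can be sketched as follows (a minimal sketch; the function names and job dictionary keys are ours, and the real filter also checks the reserved slots):

```python
# Sketch of the laxity filter of Equation 5.1. Names (laxity,
# filter_candidates, the job dict keys) are ours, not the SA-Layer's.
import math

def laxity(start_time, deadline, execution_time):
    """Laxity = (SchedulingWindow - ExecutionTime) / ExecutionTime."""
    window = deadline - start_time  # best-effort jobs: deadline = inf
    return (window - execution_time) / execution_time

def filter_candidates(jobs, laxity_threshold):
    """Keep only the jobs loose enough to be worth reallocating."""
    return [j for j in jobs
            if laxity(j["start"], j["deadline"], j["exec"]) >= laxity_threshold]

jobs = [
    {"id": 1, "start": 0, "deadline": 100, "exec": 90},       # tight window
    {"id": 2, "start": 0, "deadline": 100, "exec": 20},       # loose window
    {"id": 3, "start": 0, "deadline": math.inf, "exec": 50},  # best-effort
]
print([j["id"] for j in filter_candidates(jobs, 0.5)])  # -> [2, 3]
```

Note how the best-effort job (infinite deadline, hence infinite laxity) always survives the filter, as stated above.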
Finally, the resulting list of jobs which have slots that could be suitable for
allocating the new incoming job is sorted by their probability of reallocation,
i.e., by decreasing laxity (line 10). With the aim of keeping the time needed
to allocate the incoming job as short as possible, the resulting list is
truncated to the first x jobs. If the system fails when trying to reallocate
the first x target jobs, the process is canceled and the incoming job is not
accepted. The value of x is set so as not to increase the time needed to
perform the rescheduling action excessively; to this end, a maximum of 10
checked jobs is established in this Thesis.
Once the list of target jobs is obtained and sorted (SelectJ in Algorithm 11),
the system acts as follows for each job SelectJ_l in SelectJ:
• Estimate the number of slots that the new incoming job (j in Algorithm 11)
would need for its execution on the resource. With this information, it is
decided whether the slots reserved for SelectJ_l (TS_{SelectJl,ri}) are enough
for allocating j (line 12). If not, the slots of the next job in the list
(SelectJ_{l+1}) are checked. Otherwise, the next step is performed.
• If the slots of SelectJ_l (TS_{SelectJl,ri}) are enough for allocating j, the
system tries to allocate the already scheduled job (SelectJ_l) in another
resource and/or time interval (line 14). If this fails, the algorithm continues
to the next loop iteration and the next job in the list (SelectJ_{l+1}) is
checked.
Ultimately, when the previous conditions are fulfilled for a job in the list,
the system reallocates the already scheduled job (SelectJ_l) to its new
resource and/or time interval (line 15). Its previously reserved slots are used
for allocating the new incoming job (line 16). In this way, both jobs are
allocated, leading to a better utilization of the resources. Consequently, the
load on the computing resources increases, since more jobs can be executed. On
the other hand, as the rescheduled job is reallocated taking into account the
jobs scheduled after its first allocation, the system has more information when
making the new scheduling decision, and fragmentation is therefore decreased.
It must be noted that the job to be reallocated is rescheduled in the same way
as it was scheduled the first time – following the same algorithm.
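The overall RC flow of Algorithm 11 can be sketched over a simplified data model (a hedged sketch: `slots`, `placement`, and `find_gap` are our own names; `find_gap` stands in for the Gap Management lookup of lines 13–14):

```python
# Hedged sketch of the RC flow in Algorithm 11. `slots` is a job's number
# of reserved slots, `placement` an opaque label for its reservation, and
# `find_gap` stands in for the Gap Management lookup.

def replanning_capacity(new_job, candidates, find_gap, max_tries=10):
    """Try to free the reserved slots of one scheduled job so that
    `new_job` fits. `candidates` must already be filtered by laxity and
    sorted by decreasing laxity (lines 9-10 of Algorithm 11)."""
    for old in candidates[:max_tries]:           # truncated list, as in the text
        if old["slots"] >= new_job["slots"]:     # line 12: enough slots?
            gap = find_gap(old)                  # lines 13-14: alternative spot?
            if gap is not None:
                released = old["placement"]      # slots the old job gives up
                old["placement"] = gap           # line 15: move the old job
                new_job["placement"] = released  # line 16: reuse its slots
                return True
    return False                                 # the allocation finally fails

cands = [{"id": 1, "slots": 1, "placement": "r1[0-1]"},
         {"id": 2, "slots": 4, "placement": "r2[5-9]"}]
new = {"id": 9, "slots": 3}
ok = replanning_capacity(new, cands,
                         lambda j: "r3[0-4]" if j["id"] == 2 else None)
print(ok, new["placement"])  # -> True r2[5-9]
```

The first candidate is skipped (too few slots); the second is moved to the gap found for it, and the incoming job takes over its released reservation.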
5.3.2 Preventive techniques: Bag of Task Rescheduling (BoT-R)
Apart from the reactive technique, we have also developed a preventive
technique named Bag of Task Rescheduling (BoT-R). The BoT-R technique is
applied regularly in order to avoid job allocation failures, or at least to
reduce them to a minimum. However, this process is carried out only at the
intra-domain level, with the aim of not generating delays in network transfers
and of making it more scalable. In this way, there is no substantial increase
in network usage.
BoT-R is carried out at intervals, meaning that only the jobs which will be
executed in the studied interval take part in the replanning process. Jobs
whose executions start before the beginning of the interval are also excluded
from the replanning process. No matter what interval is studied, jobs currently
being executed will not take part in the replanning process; thus, jobs are
never preempted.
Furthermore, it must be noted that the BoT-R technique could also be used
together with the previously explained RC technique, with the aim of avoiding
job allocation failures between two contiguous executions of the BoT-R
algorithm. With all these assumptions, the BoT-R algorithm can be separated
into two steps:
1. Trigger phase: Estimate whether there is any need to perform rescheduling
of tasks. In that case, the time intervals which will be involved in the
rescheduling process are selected for the next step (this is presented in
Algorithm 12). It is explained in the next subsection.
2. Filtering phase: Perform the BoT rescheduling process for the selected time
intervals, but only over the resources and tasks that need to be involved
in the process (this is presented in Algorithm 13). It is explained in detail
later.
Although the rescheduling is considered every period of time, there is no need
to perform it in every period, nor to involve all the resources. Therefore, in
order to perform the BoT rescheduling just when needed and only over the
resources that really need it, resource fragmentation must be measured, since
it can be used as a forecast of how likely future allocations are to fail [69].

To this end, different metrics have been implemented to make good estimations
of the existing fragmentation and to take this information into account when
performing the two phases mentioned above. Those metrics are explained in
Section 5.4.
Trigger Phase
Regarding the triggering phase, Algorithm 12 has been implemented. It checks
several state variables to verify whether rescheduling is needed. Moreover, the
algorithm has to calculate in which periods of time the system needs to apply
Algorithm 13. Thanks to this, the rescheduling is performed only over a subset
of the resources, time periods and jobs; performing the rescheduling over all
the resources, time periods and jobs would harm scalability.

Algorithm 12 BoT-R Trigger executed every L period
 1: Let StfJ = the earliest start time of the scheduled jobs
 2: Let EtlJ = the latest end time of the scheduled jobs
 3: Let P = the period [StfJ, EtlJ]
 4: Let P_i = the ith subinterval of [StfJ, EtlJ], in strips of strip slots
 5: Let Ocu_{Pi} = the percentage of resource occupation in period P_i
 6: Let ru_high = the maximum resource usage threshold
 7: Let ru_low = the minimum resource usage threshold
 8: Let Frag(P_i) = the percentage of fragmentation in interval P_i
 9: Let FragThreshold = the minimum percentage of fragmentation needed to
    trigger rescheduling of tasks
10: Let #Gaps_{Pi} = the number of gaps in period P_i
11: Let BoT-R_Algorithm(P_i) = the BoT-R function over period P_i
12: if (#Jobs > #Resources) then
13:   for each P_i ∈ P do
14:     if (#Jobs > 2 * #Resources) and (Ocu_{Pi} in (ru_low, ru_high)) and
        (#Gaps_{Pi} > #Resources) and (Frag(P_i) > FragThreshold) then
15:       BoT-R_Algorithm(P_i)
16:     end if
17:   end for
18: end if
When explaining Algorithm 12, it must be noted that, in order to reduce the
computational cost, some conditions are checked first. It is checked whether
there are more jobs than resources (line 12). If not, there is no need for
replanning, since the system load is insignificant. Otherwise, extra conditions
need to be evaluated. In that case, the period of time between the first and
the last scheduled job is split into intervals of strip slots. Subsequently,
these subintervals are checked separately, so that replanning is only applied
to the time periods which present fragmentation. For each subinterval, the
information related to the load of the system and to the status of the Gap
Management subsystem has to be checked.

Regarding the system load, the heuristic applied consists of checking whether
the number of jobs is more than twice the number of resources for the selected
time interval (line 14). If not, there is no need for replanning, as there are
not enough jobs to cause fragmentation problems. It is also checked whether
resource usage is between two thresholds (ru_low and ru_high), because unless
this happens,
there is no need for replanning. The rationale is that if there are many free
time slots (no need to reschedule jobs to allow more allocations), or if there
are too few free time slots (rescheduling would not create suitable gaps to
allow new allocations), rescheduling the jobs would not show any improvement.
With this objective, those thresholds are set to 40 % and 95 %, respectively,
in this Thesis.
With regard to the Gap Management status, and provided that all the aforesaid
conditions are fulfilled, information related to the fragmentation is
calculated and evaluated, such as the number of gaps (#Gaps_{Pi} > #Resources
in line 14). To this end, different metrics measuring the fragmentation
generated by the scheduling decisions are used (represented by Frag(P_i) >
FragThreshold in line 14), which are explained in Section 5.4.
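The trigger test of line 14 in Algorithm 12 can be sketched as follows (the 40 % and 95 % usage thresholds are the ones reported in this chapter; the flat-argument interface is our own simplification):

```python
# Hedged sketch of the trigger test (line 14 of Algorithm 12). The 40 %
# and 95 % usage thresholds are the ones reported in this chapter; the
# flat-argument interface is our own simplification.

RU_LOW, RU_HIGH = 0.40, 0.95
FRAG_THRESHOLD = 10  # average-fragmentation threshold from Section 5.4

def needs_replanning(n_jobs, n_resources, occupancy, n_gaps, frag):
    """True when the inspected period should be handed to BoT-R."""
    return (n_jobs > 2 * n_resources            # enough jobs to matter
            and RU_LOW < occupancy < RU_HIGH    # neither empty nor full
            and n_gaps > n_resources            # gaps actually scattered
            and frag > FRAG_THRESHOLD)          # measurable fragmentation

print(needs_replanning(50, 10, 0.70, 25, 18))  # -> True
print(needs_replanning(50, 10, 0.98, 25, 18))  # -> False (too loaded)
```

All four conditions must hold at once; failing any one of them skips the costly rescheduling for that subinterval.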
Filter Phase
Once the system has estimated which subintervals need task replanning to reduce
fragmentation, the rescheduling algorithm (Algorithm 13) is performed for each
of them. The rescheduling problem involves a set of n jobs that have to be
processed on a set of m unrelated machines. Each job has to be executed on one
of the machines, taking into consideration that the processing time of a job
depends on the machine where the processing is performed. In addition, a
release time is given for each job, meaning the time at which the job becomes
available for processing. Moreover, each job has its own deadline. Since this
problem is already strongly NP-hard on a single machine, the problem for
multiple unrelated machines is strongly NP-hard too [24]. Even if the jobs are
already distributed to the machines, there is no pseudo-polynomial algorithm
for optimally sequencing the jobs assigned to a machine. For this reason, the
simple way of reallocating the jobs presented in Algorithm 13 has been
implemented. This algorithm takes as input the interval in which to perform
replanning – defined by the start time and end time of the period. Then, the
following process is applied, following Algorithm 13:
Algorithm 13 BoT-R Algorithm
 1: Input: start time and end time of the interval to replan (P_i)
 2: Let R = set of known resources {r_i / i in [1..n]}
 3: Let J = set of already scheduled jobs {J_1, J_2, ..., J_m}
 4: Let P_i = the period to replan
 5: Let ResourcesFilter() = function which obtains the resources to defragment,
    considering their workload
 6: Let JobFilter(R) = function which obtains the jobs scheduled on resources R
 7: Let JobIntervalFilter(P_i, J) = function which obtains the jobs of J whose
    full execution is within period P_i
 8: Let SortJobsByStartTime(J) = function which returns the jobs of J sorted by
    their start times
 9: R′ = ResourcesFilter()
10: J′ = JobFilter(R′)
11: JtoDefrag = JobIntervalFilter(P_i, J′)
12: Jsorted = SortJobsByStartTime(JtoDefrag)
13: for each r_i ∈ R′ do
14:   for each j ∈ Jsorted do
15:     if j can be allocated in r_i then
16:       Schedule j to r_i
17:       Jsorted = Jsorted − j
18:     end if
19:   end for
20: end for

1. The resources involved in the replanning process must be decided (line 9).
To this end, a filtering process is applied over the available resources so
that the resources which present a very high load are not taken into account,
as they do not have usable fragmentation. Only the resources that present
fragmentation in their allocations are studied.
2. The jobs scheduled on the resources obtained in step 1 are selected (line 10).
3. The jobs obtained in step 2 are filtered so that only those whose full
execution is within the defined period are taken into account (line 11).
4. Once the list of jobs to replan is obtained, it is sorted by job start time
(line 12).
5. Finally, for each resource selected to be defragmented (line 13), the sorted
list of jobs is scanned in order (line 14), with the aim of allocating as many
jobs as possible in each resource (line 16), reducing the number of free slots
between contiguous allocations. When it is not possible to allocate any more
jobs, the next resource is used for allocating the jobs which are not yet
allocated, and so on.
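The five steps above can be sketched under a deliberately simplified capacity model (all names are ours, not the SA-Layer's; real allocations involve time slots rather than unit capacities):

```python
# Sketch of the BoT-R filter phase (Algorithm 13) under a deliberately
# simplified model: each resource holds `capacity` unit-length jobs.
# All names are ours, not the SA-Layer's actual identifiers.

def bot_rescheduling(period, resources, jobs):
    # Steps 1-3: keep fragmented resources and the jobs scheduled on them
    # whose full execution lies within the period.
    rs = [r for r in resources if r["fragmented"]]
    ids = {r["id"] for r in rs}
    js = [j for j in jobs
          if j["resource"] in ids
          and period[0] <= j["start"] and j["end"] <= period[1]]
    # Step 4: sort by start time, as in a Bag of Tasks.
    js.sort(key=lambda j: j["start"])
    # Step 5: greedily repack the jobs resource by resource.
    plan = {}
    for r in rs:
        free = r["capacity"]
        for j in list(js):
            if free > 0:
                plan[j["id"]] = r["id"]
                js.remove(j)
                free -= 1
    return plan if not js else None  # cancel if some job does not fit

resources = [{"id": "r1", "capacity": 2, "fragmented": True},
             {"id": "r2", "capacity": 1, "fragmented": False}]
jobs = [{"id": "j1", "resource": "r1", "start": 3, "end": 4},
        {"id": "j2", "resource": "r1", "start": 1, "end": 2}]
print(bot_rescheduling((0, 5), resources, jobs))
```

Note that j2 is now placed before j1 because the jobs are ordered by start time rather than arrival order, and that returning None mirrors the cancellation behavior described later in this section.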
It must be noted that the network transfer times are recalculated with the new
information, which is more up-to-date than when they were previously estimated
– thus, the transfer estimations are more accurate. These times are overlapped
with the execution times of the previous job (in the case of the prolog phase)
and with the next job (in the case of the epilog phase). This reduces the
resource usage fragmentation to a minimum.
As it is possible that this rescheduling technique does not find a suitable
solution for reallocating all the jobs, in that case the whole process is
canceled and the jobs keep their initially scheduled slots.
In addition, this rescheduling process could be performed on another machine,
so that the computational cost is reduced – it could even be executed as
another task submitted to the Grid.
However, measuring the status of the system (the existing fragmentation) is
required to perform this technique. To this end, different metrics aimed at
this issue are detailed in the next section.
5.4 Fragmentation metrics
As stated before, a key point of the rescheduling algorithms is knowing when
there is a need to perform rescheduling of tasks and over which resources it
has to be applied. The most suitable information to trigger the rescheduling
process is a metric capable of measuring the existing fragmentation. However,
how to measure fragmentation in a real Grid system is a challenging task which
still needs to be studied. There are studies in other domains, such as memory,
but they do not map onto the Grid domain, since Grids have more constraints
which must be taken into consideration. For this reason, different ways of
measuring the status of the Grid resources are presented and evaluated in this
chapter.
To overcome the fragmentation problems in the scheduling process, a technique
based on rescheduling already scheduled tasks as a Bag of Tasks has been
developed [140]. Deciding when and over which resources it is going to be
applied is carried out in two phases, named the trigger phase and the filtering
phase. To improve the functionality of those steps, different metrics have been
implemented that measure the fragmentation present in every single resource of
the system, also taking all of them into account as a whole. Those metrics try
to make good estimations of the existing fragmentation. In this way, the
performance of the BoT-R technique will be improved thanks to a better
knowledge of the status of the Grid system.
It must be noted that, for all the metrics detailed next, the first condition
checked is the average occupancy of all the resources within the target time
interval. This occupancy has to be between two thresholds, meaning that there
is enough occupancy to take advantage of performing a rescheduling process, but
it is not so high that the improvement after doing so would be negligible.
5.4.1 Trigger Phase
With the aim of properly estimating whether a subinterval presents high
fragmentation (the Frag(P_i) value) and consequently needs to be replanned
(Frag(P_i) higher than FragThreshold), three different metrics have been
implemented. They are the following:
• Gap: One possible way of estimating how properly the resources are being
used is by counting the number of free time intervals, named gaps, and their
average size (Equation 5.2). Whenever the number of gaps is higher than a
specific threshold and their average size is lower than another threshold
(when gap sizes are too large, this is not considered real fragmentation but
free intervals that may allocate new jobs), the rescheduling process is
started – the execution of Algorithm 13 over the selected interval.
AverageGapsSize = (∑_{j=1}^{n} g_j) / n        (5.2)
• Fragmentation: Another way of weighting the proper use of resources through
the free time intervals generated in the allocation process is the metric
proposed in [69]. In that work, a metric is presented to measure the
fragmentation generated in the scheduling process, using the following
equation:
Frag(r_i) = 1 − (∑_{j=1}^{n} g_j^p) / (∑_{j=1}^{n} g_j)^p        (5.3)
where g_j is the jth gap size in resource r_i, and p is a parameter that makes
the equation resistant to small negligible fragments as long as one large
fragment exists; it is a value that boosts the influence (exponentially) of the
large gaps over the small ones. Nonetheless, in [69] the authors measure
fragmentation resource by resource, not over all resources as a whole. For this
reason, we use the average of the fragmentation measured in each resource as
the fragmentation of the whole Grid system. Then, if the estimated total
fragmentation is greater than FragThreshold (line 14), the algorithm to
reschedule the tasks (Algorithm 13) will be executed for the selected interval
(line 15).
• Max_Fragmentation: Finally, the fragmentation of the whole system is measured
as the highest fragmentation found in any resource (measured following
Equation 5.3), instead of the average. In the same way as in the previous
point, when the estimated fragmentation is greater than the established
threshold (FragThreshold in Algorithm 12), the algorithm to reschedule the
tasks (Algorithm 13) will be executed for the selected interval.
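The three metrics can be sketched as follows (Equations 5.2 and 5.3; the boosting exponent p = 2 and the function names are illustrative choices of ours):

```python
# Sketch of the three trigger metrics. frag() follows Equation 5.3; the
# boosting exponent p = 2 is an illustrative choice, and the function
# names are ours.

def average_gap_size(gaps):
    """Equation 5.2: mean size of the free intervals."""
    return sum(gaps) / len(gaps)

def frag(gaps, p=2):
    """Equation 5.3: Frag = 1 - (sum g_j^p) / (sum g_j)^p."""
    return 1 - sum(g ** p for g in gaps) / sum(gaps) ** p

def max_fragmentation(per_resource_gaps, p=2):
    """Highest per-resource fragmentation instead of the average."""
    return max(frag(g, p) for g in per_resource_gaps)

one_big   = [10]             # one large free block: no real fragmentation
scattered = [2, 2, 2, 2, 2]  # same free capacity, split into small gaps
print(frag(one_big))    # -> 0.0
print(frag(scattered))  # -> 0.8
```

The example shows the intended behavior of Equation 5.3: the same total free capacity yields zero fragmentation when concentrated in one large gap, and a high value when scattered into many small ones.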
All the thresholds established for the metrics explained above aim to correctly
determine whether there is fragmentation in the system, or just low or very
high usage. Moreover, they try to trigger the rescheduling process only when
the chances of successfully completing the process are reasonably high. To this
end, after testing their behavior with different thresholds, we chose the ones
that provided the best behavior most of the time. In this case, for the Gap
metric, when the average gap size is greater than 15 slots, it is observed that
this does not usually indicate fragmentation, but free slots due to low usage.
Also, the number of gaps needs to be greater than the number of resources. As
far as the Fragmentation metric is concerned, it is observed that, following
Equation 5.3, real fragmentation (which poses a problem for the scheduling
process) only appears when the average value is above 10. For values lower than
that, the rescheduling process would be triggered without a real need for it.
Regarding Max_Fragmentation, since the maximum value obtained over all the
resources is chosen, this threshold must be higher in order to avoid performing
useless rescheduling processes. After trying different values, the threshold
was set to 30, since it resulted in a good ratio between triggered rescheduling
actions and successfully committed ones.
5.4.2 Filter Phase
With regard to the second step of the BoT replanning process, whenever it is
estimated that a certain subinterval presents enough fragmentation to trigger
the rescheduling techniques, the system has to figure out which resources
should be involved in the process. Therefore, bearing in mind the
aforementioned ways of measuring the fragmentation in the system, two different
techniques are implemented to discern which resources will take part in the
process:
• Gap: This technique selects the resources by taking into account the time
slots in use in the selected interval. When the number of used slots is above a
specific threshold, the resource is filtered out and not taken into
consideration in the rescheduling process. The reason for this is that when a
resource is very loaded, close to 100 %, the fragmentation present in that
resource is negligible (or at least no advantage can be taken of it), and the
rescheduling process is not going to improve its usage noticeably. Hence,
taking this resource into account in the rescheduling process would not provide
any improvement worth the time needed to carry out the process.
• Fragmentation: This technique uses Equation 5.3 to obtain the fragmentation
present in each resource. Then, the resources which do not have enough
fragmentation are dropped from the list of resources included in the BoT
rescheduling process. Consequently, the jobs already allocated on those
resources are also excluded from the process.
Those metrics are in charge of filtering out the resources (and their jobs) that
present a high usage with a good enough schedule (almost no fragmentation).

116 Chapter 5. Optimizing Resource Utilization through Rescheduling Techniques

To this end, for the Gap metric, the threshold is set to 95 %. Hence, resources
with a usage greater than that are not taken into account in the rescheduling
process. With regard to the Fragmentation metric, another threshold is set for
the value obtained using Equation 5.3 for each resource, and the resources
with a value below that threshold are filtered out. After different experiments,
this threshold was set to 30.
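The two filtering rules above can be sketched as follows. The 95 % usage and 30 fragmentation thresholds are the ones quoted in the text; the data model and function names are illustrative, not the SA-Layer's actual API, and the `fragmentation` field stands in for the score Equation 5.3 would produce:

```python
# Sketch of the filter phase: decide which resources take part in the BoT
# rescheduling process. Per the text, resources above 95 % slot usage (Gap)
# or below a fragmentation score of 30 (Fragmentation) are filtered out.

USAGE_THRESHOLD = 0.95   # Gap technique: skip almost-full resources
FRAG_THRESHOLD = 30      # Fragmentation technique: skip well-packed resources

def filter_by_gap(resources):
    """Keep resources whose fraction of used slots is below the threshold."""
    return [r for r in resources if r["usage"] < USAGE_THRESHOLD]

def filter_by_fragmentation(resources):
    """Keep resources whose fragmentation score (cf. Eq. 5.3) is high enough."""
    return [r for r in resources if r["fragmentation"] >= FRAG_THRESHOLD]

resources = [
    {"name": "R1", "usage": 0.97, "fragmentation": 10},  # too loaded to bother
    {"name": "R2", "usage": 0.60, "fragmentation": 45},  # worth rescheduling
    {"name": "R3", "usage": 0.80, "fragmentation": 20},  # little fragmentation
]

gap_selected = [r["name"] for r in filter_by_gap(resources)]
frag_selected = [r["name"] for r in filter_by_fragmentation(resources)]
```

Note how the two rules can disagree (R3 survives the Gap filter but not the Fragmentation one), which is precisely why their combinations are compared later in Table 5.1.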
All these different ways of measuring the fragmentation are combined in order
to find a good balance between the need to reduce fragmentation and the
computational cost that the rescheduling technique may involve.
5.5 Evaluation
This section describes the experiments conducted to test the usefulness of
the rescheduling techniques, along with the results obtained.
In this section, several implementations of the SA-Layer are compared with a
straightforward implementation of the algorithm presented by Castillo [38]. The
SA-Layer implementations first compared here are:
1. The new implementation with the rescheduling capabilities presented in
this chapter. The Bag of Tasks Rescheduling is labeled as BoT-R, the Replan-
ning Capacity as RC and, finally, the combination of both techniques working
together is labeled as BoT-RC;
2. The previous implementation of the SA-Layer presented in Chapter 4 (la-
beled as ExS in figures), which does not provide rescheduling of jobs. Thus,
ExS does not include any mechanism to deal with fragmentation and poor
resource utilization resulting from the allocation process, apart from the way
the Gap Management subsystem searches for target gaps.
These techniques use Exponential Smoothing functions to better estimate the
time needed to complete the execution of a job on a specific resource – including
network transfers.
After that, the performance of the SA-Layer using the different fragmentation
metrics presented in Section 5.4 and detailed in Table 5.1 is evaluated (they
are labeled in figures as they are named in that table). Their comparison with
Castillo and ExS is also outlined.
To evaluate the performance of those scheduling techniques several statistics
are used. Among the ones related to the user viewpoint:
• Scheduled job rate: the fraction of accepted jobs, i.e., those whose deadline
can be met.
• Rejected job rate: the fraction of jobs that are not accepted for execution,
because the system estimates that it is not possible to execute them while
fulfilling their QoS requirements under the current Grid conditions.
• QoS not fulfilled: the rejected jobs plus the jobs that were initially accepted
but whose executions were eventually delayed, so that their QoS agreements
were not fulfilled (e.g., the deadline was not met).
Recall that in this work, QoS not fulfilled includes rejected jobs since they
are jobs which have not been executed with the requested QoS, and the QoS
specified for each job is always reasonable. The job QoS requirements could be
more or less strict but in all cases the job could be scheduled meeting the QoS
requested if the system is empty.
On the other hand, and from the system point of view, there are statistics
related to the frequency of replanning (how often a rescheduling process is
performed) and to how often those rescheduling actions cannot end success-
fully. They are named Submitted replanned and Aborted replanned, respec-
tively. Finally, Resources Usage is another metric that conveys various pieces
of information, such as the number of resources used, the way they are used,
and for how long they are used.
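The statistics just defined can be sketched as simple ratios over per-job and per-replanning records; the field names and the sample figures below are illustrative, not measured values:

```python
# Sketch of the evaluation statistics, computed from per-job records.
# 'accepted' marks jobs the system committed to, 'met_deadline' whether that
# commitment held; the counters mirror Submitted/Aborted replanned.

jobs = [
    {"accepted": True,  "met_deadline": True},
    {"accepted": True,  "met_deadline": False},  # accepted but delayed
    {"accepted": False, "met_deadline": False},  # rejected at submission
    {"accepted": True,  "met_deadline": True},
]

n = len(jobs)
scheduled_rate = sum(j["accepted"] for j in jobs) / n
rejected_rate = sum(not j["accepted"] for j in jobs) / n
# QoS not fulfilled = rejected jobs + accepted jobs that missed their deadline
qos_not_fulfilled = sum(
    (not j["accepted"]) or (not j["met_deadline"]) for j in jobs) / n

# System-side counters: how often a replanning was actually submitted out of
# those evaluated, and how often a submitted replanning could not be committed.
replans_checked, replans_submitted, replans_aborted = 20, 9, 1
submitted_replanned = replans_submitted / replans_checked
aborted_replanned = replans_aborted / replans_submitted
```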
5.5.1 Testbed
The evaluation of the rescheduling implementation has been carried out in the
same real Grid environment detailed in Chapter 4, but with the addition of more
resources belonging to a new University. The testbed is made of resources
located in two different Spanish Universities and in the University of Umeå
(UmU), Sweden, as depicted in Figure 5.2.

Figure 5.2. Grid testbed topology.
Note that these machines are administered and operated by their respective
owners. Each non-cluster machine is a personal computer (hence not a dedi-
cated machine) belonging to a member of the staff of UCLM, UNED or UmU.
Individual workloads (including network load) of these machines vary greatly
and are not defined in the testbed or the experiment setup. Moreover, they
may fail or leave the Grid at any moment, making our testbed even more
realistic.
5.5.2 Workload
Once more, the selected test for evaluating our proposals is the 3node appli-
cation of the GRASP [115] benchmarks, due to its versatility in making jobs
network- and/or computation-demanding.

For this evaluation, compute_scale takes random values between 0 and 20,
whilst output_scale takes values between 0 and 2, both following a uniform
distribution. So, there are up to 63 different kinds of jobs (21 ∗ 3). The input
file size is 100 MB, and these values of output_scale create output files of up
to 200 MB. The rationale for not setting output_scale between 0 and 20, as
was done in Chapter 4, is that this performance evaluation focuses on mea-
suring the fragmentation present in the resources, which relates only to com-
putational time. Hence, sending bigger or smaller files is not really relevant
to the study carried out in this section.
In this evaluation three different kinds of workload are used. First, a work-
load is generated by submitting the 3node jobs one after another (named
Workload 1): when one job has been submitted, and the system has accepted
or rejected it, the next job is submitted. A more realistic kind of workload
has also been studied, named Workload 2, where the submission rate is var-
ied among 2, 4 and 6 jobs per minute, following a uniform distribution to
generate the random values for each case. In both cases, the total number of
jobs submitted was 500, and each result presented here is the average of 5
executions. Finally, to evaluate the fragmentation metrics, another workload
made of the jobs defined above is used, named Workload 3. This time, 1000
jobs have been submitted for each test, following a uniform distribution with
an average of 8 jobs per minute.
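A minimal generator for jobs and submission times along the lines described above can be sketched as follows; the parameter ranges come from the text, while the generator itself is illustrative and not the actual GRASP tooling:

```python
# Sketch of the workload generation described above: 3node job parameters
# (compute_scale in {0..20}, output_scale in {0..2}, i.e. 21 * 3 = 63 job
# kinds; input fixed at 100 MB) and Workload 2's varying submission rate.
import random

def make_job(rng):
    return {
        "compute_scale": rng.randint(0, 20),
        "output_scale": rng.randint(0, 2),
        "input_mb": 100,
    }

def workload2(rng, n_jobs=500):
    """Workload 2: submission rate drawn uniformly from 2, 4 or 6 jobs/min."""
    t, jobs = 0.0, []
    for _ in range(n_jobs):
        rate = rng.choice([2, 4, 6])   # jobs per minute for this submission
        t += 1.0 / rate                # minutes until the next submission
        jobs.append((t, make_job(rng)))
    return jobs

rng = random.Random(42)
jobs = workload2(rng, n_jobs=500)
kinds = {(j["compute_scale"], j["output_scale"]) for _, j in jobs}
```

Workload 1 would instead submit each job only after the previous one has been accepted or rejected, and Workload 3 would use a fixed average of 8 jobs per minute over 1000 jobs.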
5.5.3 Rescheduling techniques
The aim of this test is to show that the rescheduling techniques improve re-
source usage: RC by modifying unfavorable previous decisions, and BoT-R by
also reducing fragmentation. As a result, more jobs may be accepted. More-
over, when these techniques are used in conjunction (BoT-RC), even better
results are obtained.
The first statistic presented here is the scheduled job rate, which is presented
in Figure 5.3 (a). The x–axis represents the number of submitted jobs and the
y–axis the number of jobs that were actually accepted. So, the progression of
accepted jobs over the submitted jobs is depicted.
This figure highlights the improvement obtained by using the rescheduling
techniques, as they clearly outperform the other two techniques when the load
becomes higher. In fact, with any rescheduling technique, more jobs are ac-
cepted after only 200 submitted jobs than Castillo accepts after 500 submis-
sions. With slightly more than 350 jobs submitted, the RC technique exhibits
more accepted jobs than ExS does after 500 submissions. Moreover, when us-
ing BoT-R or the combination of both rescheduling techniques, with just 300
jobs submitted the number of accepted jobs is already greater than when 500
jobs were submitted using ExS.

Figure 5.3. Comparison between the scheduling techniques for Workload 1:
(a) scheduled job rate (accepted vs. submitted jobs); (b) percentage of QoS not
fulfilled.
Another trend may be seen as load is increased. In the case of medium or low
load (fewer than 250 jobs), both rescheduling techniques exhibit a similar be-
havior; there are not even large differences between using rescheduling tech-
niques or not. At these loads fragmentation is not a big issue, as there are
enough slots to allocate most of the jobs. When the number of jobs is higher
(up to 400 jobs), the BoT-R rescheduling technique presents better results.
The rationale is that there is enough fragmentation to take advantage of.
However, when the load is even higher, resource usage is so high that the
fragmentation is not usable. Therefore, the algorithm in charge of triggering
the Bag of Tasks rescheduling decides that this replanning is not worthwhile,
and it is only triggered over the few resources and time intervals in which
the existing fragmentation may be exploited. When the load is very high
(from 400 to 500 jobs), the RC technique behaves better than BoT-R, as it
tries to improve resource usage by modifying unfavorable previous decisions.
The BoT-RC technique presents the best behavior due to the fact that it takes
advantage of the fragmentation and it is able to modify unfavorable previous
decisions. In this way, BoT-RC presents the best results of all the resched-
uling techniques as load is increased, obtaining an improvement in accepted
jobs of 114 % over Castillo and of around 40.2 % over ExS.

Figure 5.4. Comparison between the scheduling techniques for Workload 2:
(a) scheduled job rate and (b) percentage of QoS not fulfilled, for submission
rates of 2, 4 and 6 jobs/min.

The performance difference between using only one rescheduling technique
and using both together is smaller but still remarkable: the BoT-RC tech-
nique outperforms BoT-R by 19 % and RC by 6.5 %. In spite of there being
no noticeable difference with RC when 500 jobs were submitted, if we con-
sider the whole progression, the difference in accepted jobs between BoT-RC
and RC reaches 19.6 %.
Figure 5.3 (b) depicts the percentage of jobs which did not fulfill the requested
QoS. The improvement obtained by the rescheduling techniques is maintained
and even increased when they are used in conjunction. BoT-RC obtains a
reduction in QoS not fulfilled of 70.2 % over Castillo and of 56 % over ExS.
Moreover, the conjunction of both rescheduling techniques obtains an im-
provement over the cases where only one of them is used: 21.4 % over RC
and 41.7 % over BoT-R.
The results for the Workload 2, which represents a more realistic experiment,
are depicted in Figure 5.4, showing the scheduled job rate and the percentage
of jobs which finally do not meet their QoS requirements. Both graphics are
presented, in spite of being almost the inverse of each other, to highlight
that even a more stressful use of the resources does not lead to a noticeable
increase in the number of jobs that ultimately fail to fulfill their QoS require-
ments due to mispredictions or inaccuracies in the estimates of job durations.
As the plots depict, the Castillo technique is again clearly outperformed by
the other techniques. Comparing ExS and RC, there are small differences at
low loads, since fragmentation and resource usage are not as high as when
the submission frequency is increased, so rejections due to fragmentation or
unfavorable previous decisions are less likely. However, when the system load
increases, the differences are more remarkable and the RC technique outper-
forms ExS.
For the BoT-R and BoT-RC cases, these differences are noticeable for all the
submission frequencies. At low loads (2 jobs/min.), and thanks to the Bag of
Tasks replanning, both BoT-R and BoT-RC can allocate all the jobs, because
the fragmentation between allocations is reduced from time to time and free
slots remain for all the allocations.
When the submission rate rises to 4 jobs/min., jobs start to be rejected in
spite of applying this technique. It must also be noted that a submission rate
of 4 jobs/min. onwards represents a very high load, and there are not enough
computational resources to allocate all the jobs. Note that submitting 4
jobs/min. does not mean executing 4 jobs/min., as jobs may have start time
and deadline constraints. Because of that, there may be periods of time in
which just a few jobs have to be executed and, by contrast, other periods in
which the number of jobs to be executed is much greater than the submission
rate. This can be seen in the Castillo statistics, which show QoS fulfilled for
fewer than 50 % of the submitted jobs. At this submission frequency, BoT-RC
improves the accepted job rate by 101.3 % over Castillo, by 37 % over ExS,
by 10.6 % over RC and by 2.3 % over BoT-R. With regard to QoS not ful-
filled, BoT-RC reduces this statistic by 88.5 % compared with Castillo, by
80 % compared with ExS, by 58 % compared with RC and by 33 % compared
with BoT-R.
From this point, the difference between using rescheduling techniques and
not using them keeps increasing. However, the differences between using only
the Replanning Capacity and using both rescheduling techniques become
smaller. The reason is that resource usage is quite close to full and there is
no usable fragmentation left. Thus, only a few jobs may be moved to try to
avoid rejections due to unfavorable previous decisions for both techniques, and
the Bag of Tasks replanning is hardly committed. Consequently, the differences
between BoT-R and BoT-RC are a bit more remarkable, as BoT-RC has the
chance of moving several jobs to try to avoid some unfavorable previous deci-
sions, whilst BoT-R does not.

Table 5.1. Combination of Fragmentation Metrics.

                              G-G    F-G             F-F             MaxF-F
Intervals to be replanned     Gaps   Fragmentation   Fragmentation   Max_Fragmentation
Resources within the process  Gaps   Gaps            Fragmentation   Fragmentation
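The combinations of Table 5.1 can be read as pairs (metric used to pick the subintervals worth replanning, metric used to pick the resources taking part); a sketch with illustrative names follows:

```python
# The four fragmentation-metric combinations of Table 5.1 as
# (interval metric, resource metric) pairs.

COMBINATIONS = {
    "G-G":    ("gaps",              "gaps"),
    "F-G":    ("fragmentation",     "gaps"),
    "F-F":    ("fragmentation",     "fragmentation"),
    "MaxF-F": ("max_fragmentation", "fragmentation"),
}

def plan(combo, pick_intervals, pick_resources):
    """Apply a combination: first choose the subintervals worth replanning,
    then filter the resources that will take part in the BoT process."""
    interval_metric, resource_metric = COMBINATIONS[combo]
    return pick_intervals(interval_metric), pick_resources(resource_metric)

intervals, resources = plan(
    "MaxF-F",
    pick_intervals=lambda m: f"intervals ranked by {m}",
    pick_resources=lambda m: f"resources ranked by {m}",
)
```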
These results emphasize the value of the rescheduling techniques in increas-
ing resource usage and the QoS received by users. By applying them, more
jobs are accepted, as it is possible to reallocate already scheduled jobs. In
this way, jobs with less restrictive QoS requirements may be reallocated to
make room for a new incoming job with more restrictive requirements, so
that both jobs (the rescheduled one and the new one) can be executed.

Moreover, several jobs may be reallocated in a BoT fashion, with the system
having more information about the jobs than it had when their first allocation
took place.
5.5.4 Fragmentation metrics
A number of experiments have been undertaken in the previously detailed
Grid testbed to evaluate the efficiency and accuracy of the fragmentation
metrics presented above at measuring the real fragmentation of the whole
Grid system. Table 5.1 shows how those metrics are combined for evaluation.
To evaluate those metrics, the Workload 3 (see Section 5.5.2) is used. For
the sake of clarity, this evaluation is based on the performance of the BoT-R
technique when using the different fragmentation metrics. Thus, the results
for RC and BoT-RC are not depicted as they have been studied in the previous
section.
The aim of the next tests is to highlight the importance of properly quanti-
fying the status of a Grid system and of striking a good balance between the
computation time needed to perform the rescheduling actions and the advan-
tages obtained by using them.

Figure 5.5. Percentage of Rejected Jobs for Workload 3 (Castillo, ES, G-G,
F-G, F-F, MaxF-F).
Figure 5.5 depicts the evaluation in terms of the job rejection rate. This is
a metric from the user point of view, as it shapes the user's perception of
system performance. The figure highlights the importance of making good
predictions (all the techniques clearly outperform Castillo's implementation)
and the benefits of using techniques that reduce the fragmentation generated
by the scheduling process. All the BoT techniques outperform ExS, reducing
the percentage of rejected jobs by between 50 % and 70 %. Moreover, it can
be seen that the different ways of measuring fragmentation also result in dif-
ferent performance regarding the number of rejected jobs: the techniques
using Fragmentation as the metric to select the resources included in the
rescheduling process behave better than the ones using Gaps.
On the other hand, regarding performance from the system viewpoint, Fig-
ure 5.6 shows how often the rescheduling techniques are executed and how
successful they are. Thus, this figure represents how much overhead the re-
scheduling techniques imply when using each fragmentation metric.

From these plots, it can be said that MaxF-F presents the best behavior.
When using MaxF-F, the fragmentation is better quantified; as a consequence,
it is much less likely that the rescheduling actions fail to end successfully.
Figure 5.6. Relationship among checked, submitted and canceled reschedul-
ing actions for Workload 3 (% of Submitted Replanned and % of Aborted
Replanned for each fragmentation metric).
In fact, a rescheduling action almost always succeeds when it is submitted
using this combination of fragmentation metrics.
F-G is the technique that entails the least overhead (fewer than 50 % of the
evaluated rescheduling actions are actually executed), but at the expense of
more rejected jobs. Moreover, the chances of a rescheduling process being
aborted are remarkably higher than with MaxF-F; the same holds for the
other two techniques. For instance, F-F presents the best behavior regarding
rejected jobs, though it is the technique that requires the most computational
time to perform the rescheduling actions. The rescheduling process is sub-
mitted more times than with MaxF-F, at a similar cost each time, as both use
Fragmentation to select the resources and jobs involved in the process. This
also leads to greater chances of having the process aborted, which clearly
means a useless waste of time. Hence, MaxF-F is the technique that presents
the best balance between the overhead produced and the number of rejected
jobs.
Finally, Figure 5.7 depicts resource usage without using any rescheduling
technique, and Figure 5.8 presents the results obtained when the reschedul-
ing technique is performed using the different fragmentation metrics ex-
plained in Section 5.4. Those figures show resource usage along time slots,
distinguishing each resource by a different color tone. In these experiments,
for clarity reasons, the number of available resources is set to 12. The aim is
thus to make efficient usage of those resources, achieving nearly 100 % re-
source usage whenever there are enough jobs to do so.

Figure 5.7. Resources Usage without fragmentation metrics for Workload 3
(number of resources used across time slots).
Those figures highlight the advantages of performing rescheduling and the
importance of how the existing fragmentation is measured. Without resched-
uling techniques, the resulting resource usage presents more fluctuations and,
even more importantly, these affect a greater number of resources. This last
fact can be seen in the more frequent and bigger (deeper) alterations of the
colored stripes in Figure 5.7.
Regarding the differences among the fragmentation metrics (plots of Fig-
ure 5.8), the best behavior is again obtained when using MaxF-F. All of them
present a more uniform behavior (fewer fluctuations) than the ExS technique.
However, there are remarkable differences among them. First of all, when us-
ing MaxF-F the fluctuations due to existing fragmentation (discounting the
increase or decrease owing to the total number of jobs in the system rising or
falling) are fewer than in the other cases. Furthermore, when fragmentation
appears, it affects just one resource, whilst in the other cases fragmentation
may affect more than one resource. That is, the peaks are smaller (just one
level) using MaxF-F than using the other fragmentation metrics (hops of up
to 3 levels).
On the other hand, both G-G and F-F have certain periods of time where the
rescheduling process seems not to have been submitted or to have been aborted.
This is shown at time slots 140 to 180 for G-G and between 100 and 120 for F-F.
Figure 5.8. Resources Usage when using fragmentation metrics (BoT) for
Workload 3: (a) G-G, (b) F-G, (c) F-F, (d) MaxF-F (number of resources
used across time slots).
Hence, as far as resource usage is concerned, the metrics that present the best
performance are F-G and MaxF-F. However, taking into account the other
statistics outlined above, the best overall behavior is provided by MaxF-F: it
presents a low rate of rejected jobs with a low computational overhead (not
as low as F-G, but, in contrast, MaxF-F is almost always able to finish the
rescheduling process) and the best performance from the resource usage
viewpoint.
5.6 Summary
Providing QoS in distributed and heterogeneous systems such as Grid envi-
ronments is a challenging task. Advance reservations are usually proposed to
this end, but they are not always possible in real Grids. This Thesis proposes
meta-scheduling in advance as a possible solution to provide QoS to Grid
users. However, this kind of scheduling requires tackling many challenges,
such as developing efficient scheduling algorithms that scale well or studying
how to predict job durations on resources. Another important point that has
to be addressed is the poor utilization of resources due to the fragmentation
generated by the scheduling process.
This chapter proposes an extension to the framework presented in Chapter 4
to overcome the fragmentation and the unfavorable previous decisions which
lead to poor resource utilization. These new features allow the system to
reschedule already scheduled jobs, with the aim of allocating a greater num-
ber of jobs by using resources more efficiently. Hence, the main contributions
are the two rescheduling techniques mentioned above, aimed at improving
resource utilization and QoS provision in Grids by reallocating already sched-
uled jobs (keeping their previous QoS agreements). The reactive approach is
called Replanning Capacity (RC), and it is executed every time a job fails its
allocation. In this case the system tries to reschedule an already allocated
job, maintaining its QoS requirements; the reallocation of that job releases
the time slots needed to accept the incoming job, so that both jobs are exe-
cuted meeting their respective QoS requirements.
The preventive approach, called Bag of Tasks (BoT), reschedules jobs (in a
Bag of Tasks fashion) ordered by their start time instead of by their arrival
time. Hence, the reallocation of those tasks creates less fragmentation in the
resources. Moreover, different ways of measuring the fragmentation gener-
ated in the allocation process are presented, and used in the implementation
to trigger the BoT rescheduling.
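The effect of ordering by start time rather than arrival time can be illustrated with a toy single-resource, first-fit model (illustrative names and parameters; not the actual BoT algorithm):

```python
# Toy illustration of the BoT reordering idea: re-allocating a bag of
# already-accepted jobs ordered by start time (instead of arrival time)
# packs them more tightly, leaving fewer unusable holes between allocations.

def first_fit(jobs, horizon=10):
    """Place each job at the earliest free run of slots of its length,
    starting no earlier than its start time (single-resource model)."""
    free = [True] * horizon
    placement = {}
    for job in jobs:
        for s in range(job["start"], horizon - job["len"] + 1):
            if all(free[s:s + job["len"]]):
                placement[job["name"]] = s
                for t in range(s, s + job["len"]):
                    free[t] = False
                break
    return placement

def makespan(placement, jobs):
    return max(placement[j["name"]] + j["len"] for j in jobs)

bag = [
    {"name": "J1", "start": 2, "len": 2, "arrival": 0},
    {"name": "J2", "start": 0, "len": 3, "arrival": 1},
]
by_arrival = first_fit(sorted(bag, key=lambda j: j["arrival"]))
by_start = first_fit(sorted(bag, key=lambda j: j["start"]))
span_arrival = makespan(by_arrival, bag)
span_start = makespan(by_start, bag)
```

In arrival order, J1 grabs slots 2–3 and forces J2 to slots 4–6, leaving slots 0–1 as an unusable hole; in start-time order the same two jobs pack contiguously into slots 0–4.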
This chapter also presents different metrics to measure the fragmentation
present in a Grid system, which are needed by the BoT rescheduling algo-
rithm. These metrics are used to trigger the rescheduling of tasks, but only
when needed. Apart from that, they are useful for deciding over which re-
sources the rescheduling has to be applied. Consequently, the computational
time needed to complete the rescheduling process may be shorter.
Along with the improved framework, comparisons between different fragmen-
tation metrics, and between using and not using the rescheduling strategies,
are included. This comparison highlights the importance of performing these
rescheduling processes so that a higher number of jobs can be allocated to
resources. Therefore, not only the use of resources but also the QoS per-
ceived by users is improved. The importance of accurately measuring the
status of a Grid system, by using different techniques that measure the frag-
mentation, is also highlighted.
CHAPTER 6

Improving Grid QoS by means of
Adaptable Fair Share Scheduling
Federated Grid resources typically span multiple administrative domains and
utilize multiple heterogeneous schedulers, which complicates not only provi-
sioning of quality of service but also management of end–user resource utiliza-
tion quotas. The system developed and detailed in previous chapters (SA-Layer)
does not have any mechanism to deal with different resource usage policies. To
overcome these problems, this chapter proposes a solution based on the combi-
nation of the predictive SA-Layer meta-scheduling framework and a distributed
fairshare job prioritization system.
The SA-Layer is designed to provide scheduling of jobs in advance by use of
heuristics and prediction methods. The aim of SA-Layer is to ensure resource
availability for future job executions, and as such, the system provides quality
of service to end–users in terms of fulfillment of job deadlines. The fairshare job
prioritization system, FSGrid [4], provides a distributed system for decentral-
ized management of resource allocation policies and an efficient mechanism for
fairshare-based job prioritization.
The integrated architecture presented combines the strengths of both sys-
tems, providing a scheduling solution that improves end–user quality of service
by managing reliable resource allocations adhering to usage allocation policies
whilst also improving the performance of both systems.
6.1 Introduction
Grids are distributed systems that enable coordinated use of dispersed hetero-
geneous resources. The federation of computational resources in Grids enables
large-scale parallel applications in science, engineering and commerce [1]. A
core feature of Grids is that the systems are comprised of resources
shared among several organizations that maintain site independence and auton-
omy [2]. As such, Grids are highly variable systems in which resources may join
or leave the systems at any time. This variability makes QoS regarding job dead-
lines highly desirable but very difficult to achieve in practice. One reason for
this limitation is the lack of central coordination of the system. This is especially
true in the case of the networks that connect the various components of a Grid
system. Thus, achieving good end–to–end QoS is difficult, as without resource
reservations guarantees of QoS are hard to satisfy. In real Grid environments,
reservations are not always feasible as not all Local Resource Management Sys-
tem (LRMS) permit them. In addition, there are types of resources, e.g., network,
which may lack global management entities making the reservation of resource
capacity infeasible.
A key idea to solve the scheduling problem is to ensure that a specific re-
source is available when a job requires it; to this end, the SA-Layer [60] [111]
presented in Chapters 4 and 5 was developed. The SA-Layer is a scheduling
system that performs meta-scheduling of jobs in advance through efficient
allocation techniques and prediction heuristics for job durations and resource
status (including network). This system improves resource utilization and the
QoS provided to Grid users by ensuring that jobs finish on time. However, the
system does not take user priority into account when scheduling jobs. Hence, in
this chapter, improving Grid resource utilization QoS is addressed by combining
our approach to meta-scheduling jobs in advance with a distributed mechanism
for fairshare job prioritization. The resulting system distributes resource ac-
cess according to pre-specified resource allocation policies and thus improves
resource utilization QoS from the end–user point of view.
A number of scheduling systems exist that support fairshare prioritization
of jobs, e.g., Maui [52] and the Simple Linux Utility for Resource Management
(SLURM) [98]. These are, however, typically not designed to support Grid
environments that span multiple administrative domains, utilize heteroge-
neous schedulers, and require support for site autonomy in allocation policies;
they are typically limited to enforcing usage quotas and operating on usage
data from within their ownership domains. Regarding Grid environments,
FSGrid [4] is a system for decentralized fairshare job prioritization that op-
erates on global (Grid-wide) usage data and provides fairshare support to
resource site schedulers operating across ownership domains. In essence,
FSGrid calculates job execution priorities for users, projects, and virtual or-
ganizations, and can thus be used to complement the SA-Layer as a job
scheduling order mechanism.
The integration of these two mechanisms provides improved end-user QoS
not only in terms of jobs finishing in time to meet deadlines, but also in terms
of distributed resource utilization quotas influencing job scheduling order to
improve scheduling fairness. Resource usage is thus improved and balanced
by taking into account pre-defined usage allocation policies.
The rest of the chapter is structured as follows. First, the motivation and a
sample scenario are presented in Section 6.2. In the next section, the FSGrid
(Section 6.3) system is introduced. Then, in Section 6.4 a new architecture with
both systems working together is presented. After that, Section 6.5 details a
performance evaluation investigating the merits of the presented approach.
Finally, Section 6.6 provides a brief summary of the whole chapter.
6.2 Improving end-user QoS: Sample Scenario
As mentioned before, SA-Layer does not have any kind of user prioritization,
so jobs are scheduled in the order they arrive. It must be noted that this fact
does not mean that first scheduled jobs are going to be executed first. This
order depends on the time constraints of jobs, the previous allocated jobs, the
status of the resources, and so forth. This is depicted in Figure 6.1. As
illustrated, the first submitted job is not going to be executed first (due to
start time restrictions). The fact that User 1 has already submitted jobs
before does not have any influence on the order in which jobs are going to be
submitted next: Users 2 and 3 have to wait, as they submitted their job exe-
cution requests after User 1. This results in User 1 having 100 % of jobs allo-
cated whilst the other two have just 50 % success. So, even if User 2 or 3 have
higher priority than User 1, all jobs are scheduled in the order they arrive.
Hence, mechanisms that deal with fairness in scheduling, leading to fair re-
source usage, are desirable.

Figure 6.1. Scheduling Process using SA-Layer.
This is the motivation for including a system in charge of fairshare
scheduling. In this Thesis, the FSGrid system is selected, as it is a scalable
distributed fairshare system capable of prioritizing users at multiple levels
(e.g., among users, projects or virtual organizations). Other approaches exist,
such as Fair Execution Time Estimation (FETE) scheduling [101], a version of
Grid fairshare scheduling where jobs are scheduled based on completion time
predictions. This is similar to scheduling in time-sharing systems, and the
focus of that work lies in minimizing the risk of missed job deadlines.
However, it is evaluated in a simulated environment under the assumption that
tasks get a fair share of the resource's computational power. Additional
algorithms for fair scheduling in Grids are presented in [102].
The allocation process described in Chapter 4, Section 4.2 has been slightly
modified in order to take into account the information provided by the FSGrid
system. The new usage scenario of the allocation process is depicted in
Figure 6.2.
Figure 6.2. Meta-Scheduling in Advance Process.
In this figure, as before, several administrative domains are represented (the three
bubbles), each with several users submitting jobs to resources through several
meta-schedulers. In each administrative domain there is an entity, named
Gap Management, in charge of managing the current and future usage of the
resources of that domain (taking into account Grid users' usage). When there
is more than one meta-scheduler per domain, all of them must communicate with
the same Gap Management entity for that domain. There may be one or more FSGrid
servers holding the information about user, project and virtual organization
prioritizations for the whole system. It must be noted that there may be one
FSGrid server per resource site, covering several administrative domains and
virtual organizations, but there may also be several FSGrid servers (one per
resource site) managing the information about the same virtual organizations.
In this way, resource sites mount global usage policies onto local policies,
transparently propagating allocation updates to the other resource sites.
Hence, the new steps for the SA-Layer meta-scheduling in advance process
(Figure 6.2) are:
1. A user sends a request to the local meta-scheduler providing a tuple with
information on the application and the input QoS parameters. Again, as
outlined in previous chapters, in this approach the input QoS parameters
are just the start time and the deadline. This request waits in the job
pool until it is chosen to be scheduled.
2. FSGrid sorts the job pool by taking into account job ownership (which users
submitted the job requests) and user usage histories and priorities.
3. The meta-scheduler selects the first job of the pool to allocate it in the same
way as in Chapter 4. That is, it communicates with the Gap Management
entity that executes a gap search algorithm to obtain both the resource and
the time interval to be assigned for the execution of the job.
4. If it is not possible to fulfill the user's QoS requirements using the
resources of its own domain, communication with meta-schedulers from other
domains starts. Techniques based on P2P systems (as proposed in [128], [131],
among others) can be used to perform the inter-domain communication
efficiently. For scalability reasons, each meta-scheduler does not have
complete knowledge of all the meta-schedulers in the system.
5. If it is still not possible to fulfill the QoS requirements, a renegotiation
process is started between the user and the meta-scheduler to redefine the QoS
requirements. This renegotiation, as well as the overall interaction with
users, may be conducted by means of Service Level Agreements (SLAs) [35],
[134].
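Sketched in code, the loop formed by the steps above might look as follows. This is a minimal illustration only: all class names, function names, and interfaces below are hypothetical, not the actual SA-Layer or FSGrid APIs.

```python
from dataclasses import dataclass

@dataclass
class Job:
    user: str
    start_time: float   # earliest start time (input QoS parameter)
    deadline: float     # latest completion time (input QoS parameter)

class FairSharePrioritizer:
    """Toy stand-in for FSGrid: a higher value means schedule sooner."""
    def __init__(self, priorities):
        self.priorities = priorities            # user -> current priority

    def sort_pool(self, pool):
        # Step 2: reorder the job pool by owner priority, not arrival order.
        pool.sort(key=lambda j: self.priorities.get(j.user, 0.0), reverse=True)

def schedule_next(pool, prioritizer, local_gap_search, remote_gap_searches):
    """One iteration of the meta-scheduling-in-advance loop (steps 2-5)."""
    prioritizer.sort_pool(pool)
    job = pool.pop(0)                  # Step 3: take the highest-priority job
    slot = local_gap_search(job)       # gap search via the Gap Management entity
    if slot is None:                   # Step 4: ask meta-schedulers elsewhere
        for remote in remote_gap_searches:
            slot = remote(job)
            if slot is not None:
                break
    # Step 5 (QoS renegotiation with the user) would start here if slot is None.
    return job, slot
```

Note that the gap search itself is treated as an opaque callable here, since its algorithm is the one described in Chapter 4.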
A sample scenario illustrating the way SA-Layer submits jobs when supported
by FSGrid (as opposed to Figure 6.1, which shows the original SA-Layer
behavior) is depicted in Figure 6.3. As Figure 6.3 illustrates, the FSGrid user
prioritization system updates user priorities and (re)sorts the job pool
accordingly after each allocation decision.
Hence, when using both systems together, jobs are not executed taking into
account just their arrival time, but also considering the user who sent them.
Thus, resource usage is improved from the user point of view, as users can
execute a different number of jobs depending on previously executed (or
submitted) jobs as well as on user priority (defined in the system usage
allocation policies). These diagrams illustrate end–user fairness for a
scenario where all users have the same priority. With both systems combined,
all users get to schedule the same amount of jobs. However, if the policy were
different (e.g.,
Figure 6.3. Scheduling Process with SA-Layer and FSGrid integrated. Jobs are scheduled (but not executed) in order of user priority.
if User 3 had higher priority than the other two users), then users with
higher priority would execute more jobs than those with lower priority (jobs
of User 3 would be executed before the others). Hence, the integration of the
two systems offers the possibility not only of ensuring job executions within
deadlines, but also of providing different and specific QoS per user, project
and virtual organization. It must be noted that not all jobs need to have time
constraints. If users do not need or want time restrictions for their jobs,
they may submit jobs without them, which will then be scheduled in a
best-effort way. This feature highlights the necessity of a job prioritization
system that decides which job is to be submitted next. A brief description of
FSGrid is given next.
6.3 FSGrid
One of the key mechanisms used to affect how resource capacity is distributed
in scheduling environments is job prioritization. Whilst schedulers like the
SA-Layer system determine when jobs are run, prioritizers determine in what
order jobs should be run (scheduled) to meet a specific objective function. In
fairshare scheduling environments, resource capacity allocations are specified
as quota allocations, and the objective function (fairness) is defined in terms
of resource capacity utilization meeting those quota allocations. In these
environments, schedulers use a fairshare job prioritizer mechanism to ensure
that jobs are scheduled in an order that ensures users receive their allocated
resource capacity.
Figure 6.4. FSGrid Architecture [4].
FSGrid [4] is a decentralized distributed system that extends the concept of
fairshare scheduling to Grid environments, defining mechanisms for fair user
prioritization and scheduling of jobs in federated resource environments (see
[4] for in-depth details).
The FSGrid architecture (depicted in Figure 6.4) is realized as a web service–
based Service Oriented Architecture (SOA) and can be integrated with existing
scheduling environments such as Maui or SLURM with a minimum of intrusion.
Integration points are exposed as services for policy specification, usage data
storage, and fairshare calculation. For integration with local scheduling systems
that lack global (Grid-level) mappings between job owners and jobs, FSGrid also
provides an optional interface for job ownership resolution. The FSGrid archi-
tecture is designed to facilitate distributed load balancing and pre-computation
and caching of all computational states within the system.
The policy model used in FSGrid is based on the organization of end–users in
Virtual Organizations (VOs) [5] that autonomously specify recursive policy
trees defining hierarchies and quotas for users, projects, and organizations.
Thus, the definition of fairness used here requires system–wide resource
utilization to converge to predefined resource capacity allocations over time.
The policy model of FSGrid is illustrated in Figure 6.5, and defines resource
capacity allocations in a tree format that expresses usage allocations as
fractional shares of resource capacity. This construction virtualizes the
current capacity allocation and decouples policy allocations from the actual
resource capacity metrics used in scheduling. FSGrid can use any metric for
resource capacity, e.g., arbitrary combinations of CPU time, wall clock time,
storage requirements, etc., but requires that the metrics used are homogeneous
or comparable between the resource sites contributing resource usage data. To
this end, in this Thesis, computational time has been selected as the
comparable metric on which fairshare usage of resources is based.
Figure 6.5. An FSGrid policy tree.
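As an illustration of this tree format, the first policy level used later in the evaluation (50% for VO1, 20% for LQ, 30% for VO2, see Section 6.5) could be expressed as follows. The class is a hypothetical sketch, and the splits inside VO1 are illustrative values, not taken from the actual policy of Figure 6.5.

```python
class PolicyNode:
    """A node in a toy FSGrid-style policy tree (hypothetical implementation).

    Each node holds the fraction of its parent's capacity it is entitled to,
    so absolute entitlements are products of the shares along a path.
    """
    def __init__(self, share, children=None):
        self.share = share                  # fraction of the parent's capacity
        self.children = children or {}

    def target_share(self, path):
        """Absolute capacity fraction allocated to a path such as '/VO1/P1'."""
        fraction, node = 1.0, self
        for name in path.strip("/").split("/"):
            node = node.children[name]
            fraction *= node.share
        return fraction

# First level as described in Section 6.5: VO1 50%, LQ 20%, VO2 30%.
# The 0.4/0.6 split inside VO1 is purely illustrative.
root = PolicyNode(1.0, {
    "VO1": PolicyNode(0.5, {"U1": PolicyNode(0.4), "P1": PolicyNode(0.6)}),
    "LQ":  PolicyNode(0.2, {"UX": PolicyNode(1.0)}),
    "VO2": PolicyNode(0.3),
})
```

The multiplicative structure is what decouples policy from capacity: changing the total capacity of a site never requires editing the tree, only the shares matter.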
The tree-based format of FSGrid resource allocations allows policy definition
to be recursively delegated to Resource Sites (RS), VOs, and projects within
VOs (PX). Note that projects may contain users (UX) as well as sub-projects,
for instance /VO2/P2/P3. Moreover, it is possible to define local queues and
groups of users that are not defined in VOs, e.g., user /LQ/UX. In FSGrid,
resource site administrators define local policy trees for resource sites and
mount global (distributed) policy component trees for VOs onto branches of the
local policy trees. FSGrid provides mechanisms for distributed access to policy
components, which can further be subdivided and administered by VOs and VO
entities (e.g., project administrators). Policy component updates are
transparently propagated to resource sites and automatically incorporated into
fairshare calculations.
The fairshare calculation algorithm of FSGrid is based on comparing policy
trees to usage trees, which are identical in structure to policy trees but
contain actual usage data rather than usage allocation quotas. A set of tree
comparison operators is combined to produce a customizable mechanism for
fairshare-based job prioritization. To limit and modulate the influence of
historical usage information on fairshare calculations, FSGrid organizes usage
data in time-resolved user–level histograms (e.g., storing all known usage data
for a specific user in a histogram where each bin contains a summary of that
user's resource capacity usage for a specific day). To give site administrators
greater control over the influence of usage data, FSGrid defines individually
configurable usage decay functions that can be used to modulate the impact of
usage data on fairshare. FSGrid supports and automatically adapts to dynamic
updates in usage policies and usage data.
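The histogram-plus-decay structure can be sketched as follows. For illustration an exponential decay with a configurable half-life is assumed; the actual FSGrid decay functions are site-configurable and not necessarily of this form.

```python
import math

class UsageHistogram:
    """Time-resolved usage history for one user, binned per day (a sketch)."""
    def __init__(self, half_life_days=7.0):
        self.bins = {}                          # day index -> summed usage cost
        self.decay = math.log(2.0) / half_life_days

    def record(self, day, cost):
        # Each bin summarizes the user's resource capacity usage for one day.
        self.bins[day] = self.bins.get(day, 0.0) + cost

    def effective_usage(self, today):
        # Older bins contribute exponentially less to fairshare calculations,
        # so long-past consumption does not dominate current prioritization.
        return sum(cost * math.exp(-self.decay * (today - day))
                   for day, cost in self.bins.items())
```

With a 7-day half-life, usage recorded a week ago counts half as much as usage recorded today, which is the kind of modulation the configurable decay functions provide.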
Comparisons between policy allocations and usage data are expressed in
fairshare trees, which also inherit their structure from policy trees and
contain all fairshare information for a resource site in a single structure.
The tree-based FSGrid fairshare load balancing algorithm is very efficient, and
can pre-compute and cache fairshare state data for entire virtual
organizations. For comparison of jobs during scheduling prioritization, FSGrid
uses an algorithm that extracts fairshare vectors from paths in fairshare
trees. Fairshare vectors contain all information pertinent to comparing the
usage status of job owners, and facilitate ranking of jobs on multiple
fairshare levels simultaneously [4]. With this computational structure,
prioritization of jobs reduces to (lexicographic or arithmetic) ordering of the
fairshare vectors associated with jobs.
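A sketch of this idea follows, with a deliberately simplified vector: one component per tree level, each being the owner's allocated share minus its consumed share, so under-served owners sort first. The actual FSGrid comparison operators are configurable and described in [4]; everything below is illustrative.

```python
def fairshare_vector(owner_path, target, usage):
    """Hypothetical fairshare vector: per-level entitlement minus consumption.

    target and usage map tree paths ('/VO1', '/VO1/P1', ...) to fractions of
    total capacity; one vector component is produced per level of the path.
    """
    parts = owner_path.strip("/").split("/")
    prefixes = ["/" + "/".join(parts[:i + 1]) for i in range(len(parts))]
    return tuple(target[p] - usage.get(p, 0.0) for p in prefixes)

def prioritize(jobs, target, usage):
    # Lexicographic ordering of tuples: the largest top-level deficit wins,
    # and deeper levels only break ties, which is how one vector can rank
    # jobs on multiple fairshare levels simultaneously.
    return sorted(jobs,
                  key=lambda job: fairshare_vector(job["owner"], target, usage),
                  reverse=True)
```

Note that Python's built-in tuple comparison is already lexicographic, so the whole prioritization step really does reduce to an ordinary sort once the vectors are extracted.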
However, two major factors impact the speed of FSGrid convergence (the
convergence of usage consumption to policy usage allocations): usage cost
variance (differences in job lengths) and usage update latencies. Usage cost
variance is in general unavoidable: jobs will have different run lengths due to
differences in computations, resource availability, resource capacity, and even
variations in resource capacity (in shared systems, or resource elasticity in
paravirtualized resources such as those in certain Cloud systems). Usage update
latencies stem from the total cost (e.g., run length) of a job being unknown
until the job has been successfully executed and processed. Until the usage
cost of a job is known, and reported to the Usage Statistics Service (USS),
FSGrid is unaware that the job exists and does not factor it into fairshare
usage allocation enforcement calculations. Factors such as scheduling costs,
data transmission overhead, and storage requirements may be factored into usage
costs as well, further complicating the calculations.
FSGrid is a fully decentralized system where resource site administrators
individually determine what VOs to contribute resource capacity to, as well as
the relative amount of resource capacity to contribute to each VO. The system
fully preserves resource site autonomy and is devoid of central coordination.
6.4 Integrated Architecture
As both systems work at different levels and manage QoS from different points
of view, they can work together to enhance the overall QoS perceived by Grid
users. In this way, QoS can be addressed in terms of jobs finishing on time,
taking into account previous and scheduled usage of the system. What is more,
usage policies may be defined (not all users having the same priority) to
account for the different users existing in the system, as well as the
different projects and virtual organizations.
In addition, FSGrid may take advantage of the SA-Layer usage cost predictions
to improve the convergence of resource utilization to allocation quotas. This
means that less time is needed to achieve correct fairshare values. If the
usage policy is changed dynamically, the system adapts itself to the new
fairshare values quickly, even before any job executions have completed.
When scheduling is done in environments with resource queues (where mul-
tiple jobs are queued on resources prior to execution), the time until the cost
of a job is known may vary substantially, leading to significant perturbations of
FSGrid convergence. To provide better QoS in scheduling and resource pool load
balancing, SA-Layer computes predictions of job execution times as part of the
meta-scheduling process. To increase the efficiency of the FSGrid fairshare job
prioritization (i.e., increase FSGrid convergence rates), SA-Layer job execution
time predictions are used as estimates of job usage costs and reported to FSGrid
when individual jobs are scheduled. Predictions are later replaced with actual
job usage costs when they become known (after job completion).
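The reporting protocol between SA-Layer and the USS can be sketched as follows. This is a toy stand-in, and all method names are hypothetical, not the real USS interface; the retraction case covers jobs that are cancelled or fail to meet the agreed QoS, as described later in this section.

```python
class UsageStatisticsService:
    """Toy USS: per-job usage costs in which predictions can be replaced."""
    def __init__(self):
        self.records = {}          # job id -> (user, cost, is_prediction)

    def report_prediction(self, job_id, user, predicted_cost):
        # Sent by SA-Layer as soon as the job is scheduled, so FSGrid can
        # factor the job into fairshare calculations immediately.
        self.records[job_id] = (user, predicted_cost, True)

    def report_actual(self, job_id, actual_cost):
        # Replaces the prediction once the real run length is known.
        user, _, _ = self.records[job_id]
        self.records[job_id] = (user, actual_cost, False)

    def retract(self, job_id):
        # Job cancelled or agreed QoS not met: drop the predicted usage.
        self.records.pop(job_id, None)

    def usage(self, user):
        # Aggregate cost per user, mixing predictions and actual costs.
        return sum(cost for u, cost, _ in self.records.values() if u == user)
```

The key property is that a prediction influences fairshare only until the actual cost (or a retraction) overwrites it, so the system converges on real usage data.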
A snapshot of this new architecture with both systems working together is
depicted in Figure 6.6.
Figure 6.6. SA-Layer and FSGrid systems integrated.
When a user sends a job execution request, it is stored in a job pool. This
pool is sorted by taking into account the current user priorities obtained
through the Fairshare Calculation Service of the FSGrid system. Then, SA-Layer
schedules the first job of the highest prioritized user. During the scheduling
process carried out by SA-Layer, the predicted information for the scheduled
job is sent to the USS of the FSGrid system. This propagates scheduling
information as quickly as possible, facilitating fast convergence of the FSGrid
fairshare values. Finally, when the scheduled job finishes its execution, the
actual information about the usage cost of that execution is sent to the FSGrid
system to replace the previously predicted information. In this way, more
accurate fairshare values may be obtained as, in the end, actual execution
times are used. The predictions are only used whilst the system is waiting for
the actual usage cost. In cases where the user cancels a job, or the job is not
executed fulfilling the agreed QoS, the predicted information sent to the USS
of FSGrid is also updated in order to remove the erroneous prediction.
6.5 Performance Evaluation
To evaluate the performance and characterize the behavior of the proposed sys-
tem, a number of experiments are undertaken in a Grid environment testbed.
6.5.1 Testbed
In this case, the evaluation testbed consists of resources located at three dif-
ferent Universities across Europe, in the same way as in previous chapter (Fig-
ure 6.7). The only difference is that at the University of Umeå (UmU), Sweden,
there is also a FSGrid server in charge of the users prioritization.
Figure 6.7. Grid testbed topology.
6.5.2 Workload
As in previous chapters, the 3node test of the GRASP [115] benchmarks is used
for testing our implementation.
To evaluate the performance of the two systems working together, 3node jobs
are submitted with different input parameters for 7 different users, who also
have different usage policies (see Figure 6.5). To this end, the compute_scale
parameter of the 3node test is set by following a uniform distribution taking
values between 0 and 20. The total amount of bytes to be transferred in each
execution is set to 100 MB. The rationale is that this is an intermediate size
whose transfer is quite fast when using local resources (UCLM resources) and
rather slow when using external resources (UNED or UmU resources).
6.5.3 FSGrid Convergence Rate
The impact of using job usage cost predictions in meta-scheduling environments
with resource queues is illustrated in Figure 6.8, which depicts experiments
using a workload built by submitting 3node jobs in such a way that there are
Figure 6.8. FSGrid convergence rates for an isolated policy tree subgroup:
(a) using actual job costs; (b) combining predicted and actual job costs. Both
panels plot relative usage (%) of /VO1, /LQ and /VO2 against the number of jobs
scheduled.
always jobs belonging to each user in the job pool. Jobs are submitted
continuously until the system converges, since the aim of this test is to
measure the convergence speed. The rationale is that if there are only jobs
from one user in the job pool, the system will schedule those jobs even if that
user's fairshare vector is low (as there are no other users competing for
resource usage).
Figure 6.8 shows the summarized resource usage for the first level depicted
in Figure 6.5. Hence, VO1 should reach a relative resource usage of 50%, whilst
LQ and VO2 should be around 20% and 30%, respectively. In the illustration, the
relative distribution of the number of jobs scheduled is plotted as a function
of the total number of jobs scheduled.
As illustrated in Figure 6.8 (a), usage cost update latencies delay FSGrid
convergence and require the system to process many jobs before it can reach a
balanced usage state. As illustrated in Figure 6.8 (b), the incorporation of
SA-Layer job cost predictions (which are reported to FSGrid during the
scheduling phase) significantly improves the FSGrid convergence rate. It must
be noted that, in this latter figure, the x–axis only extends to 2500 jobs, as
the system converged long before that point. While SA-Layer provides
high-quality estimates of usage costs, it is worth noting that even poor
predictions substantially improve FSGrid convergence rates. This is because
predicted values effectively shift the main source of FSGrid convergence noise
from usage update noise to usage cost variance noise, which has a substantially
lower impact on overall FSGrid convergence. Further treatment of FSGrid
convergence and noise models is available in [4].
6.5.4 Quality of Service
To highlight the impact of using FSGrid to fairly distribute resource usage
(from the end–user point of view), another workload is used. In this case, all
users submit identical jobs simultaneously. This is repeated up to 150 jobs per
user (in this example, a total of 1050 jobs).
In these experiments, the functionality of SA-Layer with and without the
FSGrid job prioritization system is compared. To this end, the metric used is
the percentage of jobs whose allocations fail, i.e., jobs that cannot be
allocated because the system does not have enough free time slots to execute
them within their time constraints. Therefore, it is a metric related to
resource utilization QoS from the end–user point of view.
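For reference, the metric could be computed per user from a log of allocation outcomes, as in the following illustrative helper (not part of the evaluated software):

```python
from collections import defaultdict

def failure_rates(outcomes):
    """Per-user allocation failure rate (%) from (user, allocated) records."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for user, allocated in outcomes:
        totals[user] += 1
        if not allocated:
            failures[user] += 1
    return {user: 100.0 * failures[user] / totals[user] for user in totals}
```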
As Figure 6.9 (a) depicts, when the FSGrid system is not used, all users
experience a roughly equal failure rate, as differences in user priority are
not taken into account. However, when user prioritization is used, the failure
rate is related to the usage policy allocations. Therefore, this figure shows
that users with higher priority (see Figure 6.5) have a lower failure rate than
those with lower priority. For instance, user /LQ/UX has a failure rate below
5%, whilst for user /VO1/P1/U4 it is above 25%. Figure 6.9 (b) highlights the
relationship between usage policy and failure rate: it shows how the failure
rate decreases as the resource usage percentage in the policy increases.
Figure 6.9. Failure rates: (a) failure rate (%) per user, with and without
FSGrid, for users /LQ/UX, /VO1/P1/U2, /VO1/P1/U3, /VO1/P1/U4, /VO1/U1,
/VO2/P2/P3 and /VO2/P2/U10; (b) failure rate (%) as a function of usage
percentage.
To sum up, the graphs in Figure 6.9 show that better QoS is provided to users
when using SA-Layer in conjunction with the FSGrid system, as users with higher
priority have fewer jobs that cannot be executed fulfilling the requested QoS.
Moreover, this fact, along with the fast convergence rate, makes it possible to
provide special QoS to certain users, projects or VOs during a specific time
period. Hence, it is possible to deliver adaptable QoS that fits current
requirements, which represents a great improvement in the QoS provided to
users.
6.6 Summary
In this chapter an infrastructure to manage QoS by improving resource
utilization has been presented. This improvement is based on using the FSGrid
system together with the SA-Layer meta-scheduler presented in previous
chapters. FSGrid improves fairness in scheduling by taking into account usage
policies specifying multi–level resource capacity allocations (i.e.,
allocations for end–users, projects and virtual organizations).
The combination of the SA-Layer and FSGrid systems is shown to improve
end–user resource utilization QoS, as resource usage is balanced taking into
account pre-defined resource allocation policies. In addition, the use of
SA-Layer job duration predictions is shown to improve the convergence rate of
the FSGrid fairshare prioritization system, resulting in fewer jobs being
required to achieve fairshare resource utilization and end–user usage quota
enforcement. In this way, usage policies can be changed dynamically and the
system will reach fair resource usage quickly.
Finally, a performance evaluation of the combined architecture has been
presented. This study highlights the benefits SA-Layer obtains from the
information provided by FSGrid in order to improve the QoS provided to users.
Moreover, the benefits from the FSGrid point of view are also presented: FSGrid
takes advantage of the predictions made by SA-Layer and, by using this
information, achieves a substantial reduction in the time needed to reach fair
resource usage.
Chapter 7
Conclusions, Contributions and Future Work
This chapter presents the conclusions drawn from this Thesis, reviews the con-
tributions obtained from the work developed, and suggests guidelines for future
research.
7.1 Conclusions
A Grid is a highly distributed system where providing QoS is very difficult due to
several reasons, such as the heterogeneity of Grid resources or the different se-
curity policies of the administrative domains that build the whole environment.
There are several works whose objective is to overcome those difficulties. A com-
pilation of those approaches has been presented along with the identification of
their weak points. Apart from that, other approaches similar to our proposals or
relate to the techniques used to get over the found challenges are studied.
With the aim of developing an open–source middleware for the Grid community
that is capable of addressing QoS, a real Grid environment (based on Globus and
GridWay) has been set up and maintained. In this way, the evaluation results
reflect the natural behavior, heterogeneity and dynamism of Grid resources.
Over this infrastructure several modifications have been implemented to make it
network–aware and to provide an autonomic behavior which improves the
scheduling decisions.
In any case, in a Grid environment it is quite difficult to provide any kind of
QoS without reserving resources in advance. However, as organizations sharing
their resources in such a context still keep their independence and autonomy
[2], and due to the fact that not every resource in a Grid environment provides
this functionality, those reservations are not always feasible. In fact, there
are some kinds of resources, such as bandwidth, which may be scattered across
several administrative domains, making their reservation really difficult.
Owing to these facts, our implementation is based on performing scheduling in
advance rather than advance reservations. This means that the resources and
time periods to execute the jobs are selected and taken into account when
making subsequent allocation decisions, but without making any physical
reservation of the resources.
Under that scheduling-in-advance scenario, and based on the idea proposed by
Castillo et al. in [3], [38], efficient algorithms to select a suitable
resource and time period to execute the jobs have been developed, and efficient
data structures have been implemented to store the information needed to
perform this scheduling process. This required developing prediction techniques
to estimate the future status of the resources, network included, as well as
implementing several heuristics to estimate the time needed to complete a job
execution in a resource at a specific time in the future.
However, even making this process as accurate as possible, it has some problems
regarding resource utilization, mainly job rejections due to fragmentation
and/or unfavorable previous decisions. To overcome these issues, two
rescheduling techniques have been developed. The first one is a reactive
technique that moves an already scheduled task in order to also accept a new
incoming job. The second one is a preventive technique that, from time to time,
tries to reduce the fragmentation of the system. To this end, several
heuristics have been implemented to better measure the existing fragmentation
in a Grid system at a specific point, and to decide whether it is useful to
apply this technique or not.
Finally, as the developed software does not provide any mechanism to deal with
different user priorities, and as a result of a research stay at the University
of Umeå, work has also been carried out on the integration of SA-Layer with a
system capable of providing the information needed to manage different levels
of QoS depending on the target usage policy (FSGrid [4]).
In addition to these implementations, a performance evaluation of each of them
has been carried out and presented. The evaluations highlight (1) the
advantages of having a system that autonomously adapts itself to the current
system behavior; (2) the need to schedule resources in advance; (3) the
importance of making predictions about resource status and job durations, and
the improvement obtained by using them; (4) the necessity of rescheduling
techniques that improve resource usage by better allocating the jobs as the
system gains more information about all the jobs to be executed; and (5) the
benefits of being able to deal with different user priorities when the system
is overloaded.
7.2 Contributions
The work carried out during this Thesis has produced the following contribu-
tions:
• Development of an autonomic network–aware meta-scheduler over GridWay.
This is presented in Chapter 3, where two different modifications to GridWay
are presented. First, the methodology followed to include network information
in GridWay by using the Iperf tool is detailed. Subsequently, another proposal
is presented which focuses on the implementation of an autonomic network–aware
meta-scheduling architecture that is capable of adapting its behavior to the
current status of the environment, so that jobs can be efficiently mapped to
computing resources. This proposal uses concepts from autonomic computing to
react to changes in the status of the system in order to perform
meta-scheduling more efficiently. In this way, by using this information, the
GridWay meta-scheduler has knowledge about how trustworthy a resource's
behavior is. Consequently, it can choose a more reliable resource to submit the
job to.
• Development of a meta-scheduler in advance system (SA-Layer) over
GridWay. This is presented in Chapter 4, where the predictive framework built
on top of Globus and the GridWay meta-scheduler, named SA-Layer, is detailed.
It manages QoS by performing meta-scheduling of jobs in advance. SA-Layer
manages idle/busy periods of resources in order to choose the most suitable one
for each job, using red–black trees as the underlying data structure. It uses
heuristics that consider the network as a first-level resource, and makes
different estimations of the time needed to complete the data transfers and the
job executions.
• Development of predictive techniques over SA-Layer. This is also presented
in Chapter 4, where several complementary prediction techniques are detailed. A
predictive module to estimate the future status of resources and the
interconnection network has been implemented. This prediction technique is
based on an exponential smoothing function which takes into account the
previous status of the system resources to estimate their future performance.
An autonomic behavior is added by computing a trust value for each resource,
also taking into account its previous behavior.
• Development of rescheduling techniques over SA-Layer to improve resource
utilization. This is presented in Chapter 5, where the problems related to the
scheduling techniques detailed in Chapter 4 are outlined. Fragmentation appears
as a well-known effect of every allocation process and may become the cause of
poor resource utilization. Apart from that, there may also be job rejections
caused by unfavorable previous decisions due to the inability to foresee the
future. For these reasons, two techniques have been developed to tackle
fragmentation problems, which consist of rescheduling already scheduled tasks
with the aim of reducing fragmentation by having more information in the
allocation process. Under such a scenario, knowing the status of the system is
a must. Accordingly, different metrics aimed at measuring the fragmentation of
the system are developed.
• Improving SA-Layer functionality by using an adaptable fairshare job
prioritization system (FSGrid). This is presented in Chapter 6, where
the integration of the SA-Layer with a fairshare job prioritization system,
named FSGrid, is detailed and evaluated. FSGrid provides a distributed
system for decentralized management of resource allocation policies and
an efficient mechanism for fairshare-based job prioritization. Hence, the
integrated architecture focuses on enhancing resource utilization QoS by
combining both systems: it joins their strengths and improves the QoS
perceived by end users by providing reliable resource allocations which
adhere to usage allocation policies.
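As an illustration of how idle/busy periods can drive the selection of a start time for a job, the following Python sketch (with hypothetical names, not the actual SA-Layer code) keeps each resource's busy periods ordered and searches for the first idle gap large enough to hold a job. The real implementation stores these periods in red–black trees for efficient ordered lookup; a plain sorted list is used here as a stand-in.

```python
import bisect

class ResourceSchedule:
    """Busy periods of a single resource, kept sorted by start time.

    A sorted list stands in for the red-black tree used in the real
    system; both support ordered traversal, and the tree additionally
    keeps insertion at O(log n).
    """

    def __init__(self):
        self.busy = []  # non-overlapping (start, end) tuples, sorted

    def find_start(self, earliest, duration):
        """First start time >= earliest with an idle gap of `duration`."""
        t = earliest
        for start, end in self.busy:
            if end <= t:
                continue              # busy period already in the past
            if start >= t + duration:
                break                 # the gap before this period fits
            t = end                   # gap too small: retry after it
        return t

    def reserve(self, earliest, duration):
        """Book the earliest feasible slot and return its start time."""
        t = self.find_start(earliest, duration)
        bisect.insort(self.busy, (t, t + duration))
        return t
```

For example, with busy periods (0, 5) and (5, 8), a 4-unit job requested from time 2 can only start at time 8, since no earlier gap is wide enough.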
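The prediction and trust ideas summarized above can be sketched as follows. This is a minimal illustration with hypothetical parameter values and function names, not the actual SA-Layer formulas: recent observations of a resource's performance are weighted more heavily than old ones, and a per-resource trust value is updated from the accuracy of past predictions.

```python
def exponential_smoothing(history, alpha=0.5):
    """Predict the next value; higher alpha weights recent samples more."""
    estimate = history[0]
    for observation in history[1:]:
        estimate = alpha * observation + (1 - alpha) * estimate
    return estimate

def update_trust(trust, predicted, observed, beta=0.3):
    """Raise or lower a resource's trust based on prediction accuracy."""
    relative_error = abs(predicted - observed) / max(observed, 1e-9)
    accuracy = max(0.0, 1.0 - relative_error)
    return (1 - beta) * trust + beta * accuracy

# Past execution times on a resource (seconds); predict the next one.
times = [10.0, 12.0, 11.0, 20.0]
prediction = exponential_smoothing(times)   # 15.5 with alpha = 0.5
```

A resource whose observed times repeatedly diverge from the predictions sees its trust value decay, so the scheduler can steer jobs away from it.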
7.3 Publications
The work carried out during this Thesis has produced the following publications:
three international journal papers, several international conference papers, and
several national conference papers.
In addition, two technical reports have been published. These are first drafts
of papers submitted for publication in conferences and journals, released in
order to gain visibility and obtain faster feedback. Moreover, works submitted
for publication are also mentioned, as well as several publications which are
related to or based on the contributions made throughout this Thesis. Finally,
additional contributions regarding the developed software, together with the
supervision of some final degree projects and a Master Thesis, are outlined.
7.3.1 Journal papers
• Luis Tomás, Agustín Caminero, Blanca Caminero and Carmen Carrión.
Network–aware meta-scheduling in advance with autonomous self-tuning
system. Future Generation Computer Systems – The International Journal
of Grid Computing Theory Methods and Applications, Elsevier, Holland,
ISSN: 0167–739X, Volume 27, pages 486–497, 2011.
Impact factor 2010: 2.365. This journal is ranked 9/97 (Q1) in JCR 2010,
in the field of Computer Science, Theory and Methods.
This publication presents SA-Layer, a framework built on top of Globus
and the GridWay meta-scheduler that improves QoS by performing
meta-scheduling of jobs in advance. This framework manages
idle/busy periods of resources in order to choose the most suitable resource
for each job. Moreover, no prior knowledge of job durations is required,
as opposed to other works using similar techniques. SA-Layer uses
heuristics that consider the network as a first level resource and presents
an autonomous behavior so that it adapts to the dynamic changes of the
Grid resources. The autonomous behavior is obtained by means of com-
puting a trust value for each resource and performing job rescheduling.
• Luis Tomás, Agustín Caminero, Omer Rana, Carmen Carrión and Blanca
Caminero. A GridWay-based Autonomic Network–Aware Metascheduler. Fu-
ture Generation Computer Systems – The International Journal of Grid
Computing Theory Methods and Applications, Special Issue on Quality of
Service in Grid and Cloud Computing, In Press. Elsevier, Holland, ISSN:
0167–739X, 2012.
Impact factor 2010: 2.365. This journal is ranked 9/97 (Q1) in JCR 2010,
in the field of Computer Science, Theory and Methods.
This publication presents the implementation of an autonomic network–
aware meta-scheduling architecture which is capable of adapting its behav-
ior to the current status of the environment, so that jobs can be efficiently
mapped to computing resources. The implementation extends the widely
used GridWay meta-scheduler and relies on Exponential Smoothing to pre-
dict the execution and transfer times of jobs. An autonomic control loop
(which takes account of CPU use and network capability) is used to alter
job admission and resource selection criteria in order to improve overall job
completion times and throughput.
7.3.2 International conference papers
• Luis Tomás, Agustín Caminero, Blanca Caminero and Carmen Carrión.
Studying the influence of network-aware Grid scheduling on the performance
received by users. In Proceedings of the International Conference on Grid
computing, high-performAnce and Distributed Applications (GADA). Mon-
terrey, México. November, 2008. ISBN: 978-3-540-88870-3.
Quality indicator: CORE: C; paper referenced in DBLP.
In this paper, an extension to the GridWay metascheduler to perform
scheduling considering the network status is presented.
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. Im-
proving GridWay with Network Information: Tuning the Monitoring Tool. In
Proceedings of the 6th High-Performance Grid Computing (HPGC) Work-
shop, in conjunction with International Parallel and Distributed Processing
Symposium (IPDPS). Rome, Italy. May, 2009. ISBN: 978-1-4244-3750-4.
Quality indicator: CORE: C; paper referenced in DBLP. Acceptance rate of
23% (IPDPS rate).
This paper presents an evaluation and tuning of the overhead produced by
the network monitoring tool used in [112].
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. Ad-
vanced Meta-Scheduling using Red–Black Trees in Heterogeneous Grids En-
vironments. In Proceedings of the 7th High-Performance Grid Comput-
ing (HPGC) Workshop, in conjunction with International Parallel and Dis-
tributed Processing Symposium (IPDPS). Atlanta, USA. April, 2010. ISBN:
978-1-4244-6441-8.
Quality indicator: CORE: C; paper referenced in DBLP. Acceptance rate of
24.1% (IPDPS rate).
This publication presents the first version of the SA-Layer: a
meta-scheduling in advance system based on red–black tree data structures
to manage the idle/busy periods of resources, together with a simple
method to obtain predictions about job durations on each resource.
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. Us-
ing Network Information to Perform Meta-scheduling in Advance in Grids.
In Proceedings of the 16th International Euro–Par Conference (Euro–Par
2010). Lecture Notes in Computer Science, Part I, Volume 6271/2010, pp.
431–443. Ischia, Italy. September, 2010. ISBN: 978-3-642-15276-4.
Quality indicator: CORE: A; paper referenced in DBLP and Scopus. Overall
acceptance rate of 35%. Specific track acceptance rate of 28%.
This article improves the SA-Layer functionality by considering the
network as a first-level resource. A new technique is included in the
SA-Layer to better estimate the time needed to complete job executions,
by making separate estimations for the transfer and execution times.
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. Expo-
nential Smoothing for network–aware meta-scheduler in advance in Grids.
In Proceedings of the International Workshop on Scheduling and Resource
Management on Parallel and Distributed Systems (SRMPDS), in conjunc-
tion with the Intl. Conference on Parallel Processing (ICPP). San Diego,
USA. September, 2010. ISBN:978-0-7695-4157-0.
Quality indicator: CORE: C; paper referenced in DBLP and Scopus. Accep-
tance rate of 32% (ICPP rate).
This paper presents a new version of the SA-Layer that uses exponential
smoothing functions to predict the future status of the resources and in-
terconnection networks when estimating job durations. In this way, the
time needed to complete the jobs is better estimated.
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. Ad-
dressing Resource Fragmentation in Grids Through Network–Aware Meta-
Scheduling in Advance. In Proceedings of the 11th International Sym-
posium on Cluster, Cloud and Grid Computing (CCGrid 2011). Newport
Beach, USA. May, 2011. ISBN: 978-0-7695-4395-6. BEST POSTER AWARD.
Quality indicator: CORE: A; paper referenced in DBLP and Scopus. Accep-
tance rate of 29.1%.
This publication presents a new technique within SA-Layer to tackle the
fragmentation caused by the allocation process, which may lead to poor
resource utilization. This technique consists of rescheduling already
scheduled tasks. To this end, heuristics are implemented to calculate the
intervals to be replanned and to select the jobs involved in the process.
Moreover, another heuristic is implemented to put rescheduled jobs as
close together as possible to minimize the fragmentation.
• Luis Tomás, Agustín Caminero, Blanca Caminero, Carmen Carrión. A
Strategy to Improve Resource Utilization in Grids Based on Network–Aware
Meta-Scheduling in Advance. In Proceedings of the 12th IEEE/ACM
International Conference on Grid Computing (Grid 2011). Lyon, France.
September, 2011. ISBN: 978-0-7695-4572-6.
Quality indicator: CORE: A; paper referenced in DBLP.
This paper presents a more in-depth study and evaluation of the Bag of
Tasks rescheduling techniques presented in [139].
• Luis Tomás, Per-Olov Östberg, Blanca Caminero, Carmen Carrión, Erik
Elmroth. An Adaptable In–Advance and Fairshare Meta-Scheduling Archi-
tecture to Improve Grid QoS. In Proceedings of the 12th IEEE/ACM Interna-
tional Conference on Grid Computing (Grid 2011). Lyon, France. Septem-
ber, 2011. ISBN: 978-0-7695-4572-6.
Quality indicator: CORE: A; paper referenced in DBLP.
This work focuses on enhancing resource utilization QoS through the
combination of two systems: our predictive meta-scheduling framework
(SA-Layer) and a distributed fairshare job prioritization system, named
FSGrid. The integrated architecture presented in this work combines the
strengths of both systems and improves perceived end–user quality of ser-
vice by providing reliable resource allocations adhering to usage allocation
policies.
7.3.3 National conference papers
• Luis Tomás Bolívar, Agustín Caminero Herráez, Blanca Caminero Herráez,
Carmen Carrión Espinosa. Incorporando información de red en el meta-
planificador GridWay. In Proceedings of the XIX Jornadas de Paralelismo.
Castellón, Spain. September, 2008. ISBN: 978-84-8021-676-0.
This paper presents the modifications made over GridWay to make it net-
work–aware.
• Luis Tomás Bolívar, Blanca Caminero Herráez, Carmen Carrión Espinosa.
Planificación Avanzada en GridWay. In Proceedings of the XX Jornadas de
Paralelismo. A Coruña, Spain. September, 2009. ISBN: 84-9749-346-8.
This publication introduces the first steps to provide scheduling in advance
within the GridWay meta-scheduler.
• Luis Tomás Bolívar, Blanca Caminero Herráez, Carmen Carrión Espinosa.
Meta-Planificación por Adelantado en Grids Heterogéneos. In Proceedings of
the XXI Jornadas de Paralelismo. Valencia, Spain. September, 2010. ISBN:
978-84-92812-49-3.
This article details the basic implementation of the SA-Layer. It focuses
on the data structures used for storing information about the future usage
of resources and on the way that information is accessed.
7.3.4 Technical reports
• Luis Tomás, Agustín Caminero, Blanca Caminero, and Carmen Carrión,
Grid Metascheduling Using Network Information: A Proof–of–Concept
Implementation. Technical Report DIAB–08–04–2, Computing Systems
Department, University of Castilla–La Mancha, Spain, April 30, 2008.
This report presents an in-depth study of the performance impact of
including network monitoring tools in the GridWay meta-scheduler.
• Luis Tomás, Agustín Caminero, Blanca Caminero, and Carmen Carrión,
Using Network Information to Perform Meta-scheduling in Advance in Grids.
Technical Report DIAB–10–03–2, Computing Systems Department, Uni-
versity of Castilla–La Mancha, Spain, March 25, 2010.
This report presents the basic functionality and performance of one of the
first versions of SA-Layer, which lacked prediction techniques but made
separate estimations for execution and transfer times.
7.3.5 Submitted works
Journal papers
• Luis Tomás, Agustín Caminero, Blanca Caminero and Carmen Carrión.
On the Improvement of Grid Resource Utilization: Preventive and Reactive
Rescheduling Approaches. Submitted to the Journal of Grid Computing,
Special Issue on High Performance Grid and Cloud Computing, ISSN:
0167–739X. Impact factor 2010: 1.556. This journal is ranked 29/97 (Q2)
in JCR 2010, in the field of Computer Science, Theory and Methods.
This article presents two techniques that have been developed to tackle
poor resource utilization. Their main idea consists of rescheduling
already scheduled jobs so that new incoming jobs can be allocated. They
deal with rejections due to fragmentation and to unfavorable previous
decisions. Several heuristics are presented to choose the best job or
jobs to be reallocated and to decide when the reallocation has to be done.
International conference papers
• Luis Tomás, Per-Olov Östberg, Blanca Caminero, Carmen Carrión, Erik
Elmroth. Addressing QoS in Grids through a Fairshare Meta-Scheduling
In-Advance Architecture. Submitted to 12th International Symposium on
Cluster, Cloud, and Grid Computing (CCGrid), Ottawa, Canada, 2012.
This paper extends the study and results of the integrated architecture
(FSGrid plus SA-Layer) presented in [142].
• Luis Tomás, Blanca Caminero, Carmen Carrión. Improving Grid Resource
Usage: Metrics for Measuring Fragmentation. Submitted to 12th Interna-
tional Symposium on Cluster, Cloud, and Grid Computing (CCGrid), Ot-
tawa, Canada, 2012.
This paper studies different techniques for measuring the existing fragmen-
tation with the aim of improving the BoT rescheduling technique presented
in [140].
7.3.6 Related contributions
Journal papers
• Javier Conejero, Luis Tomás, Carmen Carrión, Blanca Caminero. QoS
Provisioning with Meta-Scheduling in Advance within SLA-based Grid Envi-
ronments. Computing and Informatics, In Press.
Impact factor 2010: 0.356. This journal is ranked 100/108 (Q4) in JCR 2010,
in the field of Computer Science, Artificial Intelligence.
This publication presents the mechanisms needed to manage the
communication between the users and the SA-Layer system presented
in [60]. Those mechanisms are implemented through SLA contracts based
on the WS-Agreement specification.
International conference papers
• Javier Conejero, Luis Tomás, Carmen Carrión, Blanca Caminero. An SLA-
based Meta-Scheduling in Advance System to Provide QoS in Grid Environ-
ments. In Proceedings of the 5th Iberian Grid Infrastructure Conference
(Ibergrid 2011). Santander, Spain. Jun, 2011. ISBN: 978-84-9745-884-9.
This article introduces the need for an entity in charge of establishing
the agreements between users and the entities which manage the Grid
resources. The mechanisms presented to this end are implemented through
SLA contracts based on the WS-Agreement specification.
• Javier Conejero, Luis Tomás, Carmen Carrión, Blanca Caminero. Differen-
tiated QoS in Grids supported by SLAs. In Proceedings of the 9th Intl. Work-
shop on Middleware for Grids, Clouds and e–Science (MGC), in conjunction
with the 12th Intl. Middleware Conference. Lisbon, Portugal. December,
2011. ISBN: n/a.
Quality indicator: CORE: C.
This paper presents a framework to negotiate SLAs between users and Grid
service providers, where the QoS expected by users is clearly defined in
three levels. These levels are used to classify the importance of each SLA
and to deal with the confidence that Grid resources can provide.
National conference papers
• Javier Conejero, Luis Tomás, Carmen Carrión, Blanca Caminero. QoS en
Entornos Grid mediante un Sistema de Meta-planificación por Adelantado
basado en SLAs. In Proceedings of the XXII Jornadas de Paralelismo. La
Laguna, Spain. September, 2011. ISBN: 978-84-694-1791-1.
This paper presents the infrastructure proposed to address QoS
provisioning to users through Service Level Agreements following the
WS-Agreement specification.
7.3.7 Additional contributions
• The software developed along this Thesis is available at
http://www.i3a.uclm.es/raap/gridcloud/SA-Layer.
On that web page, the main characteristics of the software are outlined,
with the aim of gaining visibility amongst the Grid community.
• Supervision of the Final Degree Project: “Estudio y Evaluación de la Herramienta
de Monitorización NWS en un Sistema Grid”. Student: D. Francisco Javier
Conejero Bañón. July 2009. Escuela Superior de Ingeniería Informática,
University of Castilla–La Mancha (UCLM).
• Supervision of the Final Degree Project: “Planificación avanzada en Grids: Uso del
árbol rojo–negro para los algoritmos de alojamiento de trabajos”. Student:
D. Angel Codón Ramos. September 2009. Escuela Superior de Ingeniería
Informática, University of Castilla–La Mancha (UCLM).
• Supervision of the Master Thesis: “Desarrollo de técnicas escalables para el des-
cubrimiento de información en sistemas paralelos distribuidos”. Student: D.
Ismael García Pérez. October 2011. National University of Distance Educa-
tion (UNED).
7.4 Funds
The present Thesis has been carried out thanks to the funds received from a
number of projects and grants. They are classified into national and regional
projects.
7.4.1 National projects
Project title: High-performance, Reliable Architectures for Data Centers and Internet Servers
Funding entity: Consolider-Ingenio 2010 Program
Code: CSD2006-46
Participants: University of Castilla–La Mancha, Polytechnic University of Valencia, University of Murcia, University of Valencia
Length: from October 2006 to December 2011
Main researcher: Dr. Francisco J. Quiles Flor (UCLM subproject)
Number of researchers: 80
Total budget of the project: 3,500,000 euros (1,038,000 euros for UCLM)
Project title: Server architectures, applications and services
Funding entity: Ministerio de Ciencia e Innovación (MICINN)
Code: TIN2009-14475-C04-03 (TIN subprogram)
Participants: University of Castilla–La Mancha, Polytechnic University of Valencia, University of Murcia, University of Valencia
Length: from October 2009 to October 2012
Main researcher: Dr. Francisco J. Quiles Flor (UCLM subproject)
Number of researchers: 29
Total budget of the project: 407,800 euros (UCLM only)
7.4.2 Regional projects
Project title: Improvement of the Quality of Service of Grid Applications
Funding entity: UCLM
Code: PBI08-0055-2800
Participants: University of Castilla–La Mancha, MAAT G-Knowledge
Length: from March 2008 to December 2010
Main researcher: Dra. María Blanca Caminero Herráez
Number of researchers: 5
Total budget of the project: 72,000 euros + 70,149 euros (infrastructure FEDER funds)
Project title: MoteGrid: Grid Architecture for Distributed Processing of Information Collected by Wireless Sensor Networks
Funding entity: Junta de Comunidades de Castilla–La Mancha
Code: PII1C09–0101–9476
Participants: University of Castilla–La Mancha, University of Murcia, Complutense University of Madrid
Length: from April 2009 to April 2012
Main researcher: Dra. Carmen Carrión Espinosa
Number of researchers: 12
Total budget of the project: 150,000 euros
7.5 Collaborations with other research groups
To complete this Thesis, the author realized a research stay, which was hosted
by Professor Erik Elmroth, within the research group on Distributed Systems
(Grid & Cloud, www.cloudresearch.se), which belongs to the Department of
Computing Science at the University of Umeå (Sweden). This stay lasted 3
months from 29th March to 1st July of 2011. During this stay, fairness poli-
cies when performing meta-scheduling were studied and applied. To this end,
interaction between the FSGrid and the SA-Layer was implemented and tested.
As a result, a poster was presented at the 12th IEEE/ACM International Con-
ference on Grid Computing (Grid 2011). Moreover, an extended performance
evaluation of the integrated architecture have been submitted to the 12th Inter-
national Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012).
7.6 Future work
The work presented in this Thesis has led to different ideas for further work.
Some of them are presented next.
• Study of more sophisticated methods (apart from exponential smoothing
functions) to predict job execution times.
• Network reservations: in this Thesis, focus is placed on the scheduling
process. In the future, addressing the issues related to network
reservations is planned, i.e., developing a Bandwidth Broker which
collaborates with the meta-scheduler in order to perform network
reservations. Using this approach, the effective bandwidth between
measurements could be better predicted and the estimations would be more
accurate. Hence, the TOLERANCE value would be improved and there would be
fewer chances of choosing a wrong resource due to a misprediction. For
this reason, it is worthwhile to try to reserve network bandwidth whenever
and wherever possible.
• Development of algorithms to schedule data as another resource, with the
aim of improving the time needed for transfers when executing a job, with
the consequent reduction in the execution times. More precisely, jobs may
require multiple pieces of data, which in turn may be replicated on different
storage resources. So, finding an instance of the required pieces of data for
each job, and performing the execution of the job meeting its QoS require-
ments (e.g., the execution deadline), is an interesting research issue. Al-
though some techniques have already been presented (for instance, [143]),
keeping track of the bandwidth between all the storage and computing
resources raises scalability issues which must be addressed.
• Improvement of the rescheduling techniques to make them more so-
phisticated and intelligent. To this end, different information (apart from
the start time constraint) may be used when performing the rescheduling
techniques, such as deadline or laxity.
• Comparison of the SA-Layer approach with an algorithm which applies real
reservations of resources, whenever possible. For example, clusters man-
aged by Maui [52] supporting reservation of CPUs.
• Dealing with workflows by taking into account where each job is located.
In this way, jobs with file dependencies may be placed on the same
resource (or at least, close to it) with the aim of decreasing the time
needed to send those files, or even avoiding transfers altogether in some
cases.
• Implementation of mechanisms for Fault Tolerance Management. This
is another guideline for current and future work, based on trying to
foresee future problems regarding resource availability and performance.
Using this information, some resources could be avoided during specific
time periods to prevent job failures which would lead to unfulfilled QoS
agreements.
• Improvement of FSGrid with network information. To this end, it
may also be useful to develop some kind of fair network usage metric. The
aim would be fairness not only in computational resource usage but also
in network usage, as the latter may influence the performance of many
other job executions.
• Deployment of the implementation on EGEE resources [10], providing
us with a larger Grid testbed, where resources are more distributed and
more users are involved in submitting jobs to the system.
• SA-Layer adaptation to the Cloud: the main idea is to adapt the func-
tionality provided by SA-Layer to cloud infrastructures, with the aim of
taking advantage of the features they provide. On the one hand, the
prediction techniques should be modified to also estimate the time needed
to deploy, resume or stop a virtual machine. On the other hand, the
rescheduling techniques must be adapted to reduce the fragmentation
generated by the scheduling process. To this end, live migration of
virtual machines makes it possible to take advantage of small, otherwise
useless slots: best-effort applications (or even whole virtual machines)
can be moved from one resource to another and from one time period to
another, so as to use those slots scattered in time and/or across
resources.
Appendix A
Acronyms
AD Administrative Domain
API Application Programming Interface
ANM-ExS Autonomic Network-aware Meta-scheduler
BB Bandwidth Broker
BoT Bag of Tasks
BoT-R Bag of Task Rescheduling
CAC Connection Admission Control
CSF Community Scheduler Framework
DB Database
DSRT Dynamic Soft Real Time Scheduler
EDF Earliest Deadline First
EGEE Enabling Grids for E-sciencE
ESII Escuela Superior de Ingeniería Informática
ETTS Execution and Transfer Time Separately
ExS Exponential Smoothing
FCFS First Come First Serve
FETE Fair Execution Time Estimation
GARA Globus Architecture for Reservation and Allocation
GarQ Grid Advanced Reservation Queue
GIS Grid Information System
GNB Grid Network Broker
GNRB Grid Network-aware Resource Broker
GRAM Grid Resource Allocation Management
GridFTP Grid File Transfer Protocol
GRIP Grid Resource Information Protocol
GRMS GridLab Resource Management System
GSI Grid Security Infrastructure
GT4 Globus Toolkit 4
G-QoSM Grid Quality of Service Management
I3A Instituto de Investigación en Informática de Albacete
I/O Input/Output
LAN Local Area Network
LHC Large Hadron Collider
LRMS Local Resource Management System
LSF Load Sharing Facility
MAPE Monitor Analyze Plan Execute
MDS Monitoring and Discovery System
MI Millions of Instructions
MIPS Millions of Instructions Per Second
NGB NAS Grid Benchmarks
NPB NAS Parallel Benchmarks
NRSE Network Resource Scheduling Entity
NWS Network Weather Service
OGSA Open Grid Service Architecture
OSI Open System Interconnection
P2P Peer-to-Peer
PBS Portable Batch System
QBETS Queue Bounds Estimation from Time Series
QoS Quality of Service
RC Replanning Capacity
RS Resource Sites
RT Resource Trust
SA-Layer Scheduling in Advance Layer
SDK Software Development Kit
SJF Shortest Job First
SLA Service Level Agreement
SLURM Simple Linux Utility for Resource Management
SOA Service Oriented Architecture
SOI Service Oriented Infrastructure
TCP Transport Control Protocol
TCT Total Completion Time
UCLM University of Castilla–La Mancha
UmU University of Umeå
UNED National University of Distance Education
USS Usage Statistics Service
VARQ Virtual Advance Reservations Queues
VIOLA Vertically Integrated Optical Testbed for Large Applications
VO Virtual Organization
VP Visualization Pipeline
WSAG4J WS-AGreement for Java
WSLA Web Service Level Agreement
WS-GRAM Web Service Grid Resource Allocation Management
XML Extensible Markup Language
Bibliography
[1] Schwiegelshohn, U., Badia, R.M., Bubak, M., Danelutto, M., Dustdar, S.,
Gagliardi, F., Geiger, A., Hluchy, L., Kranzlmüller, D., Laure, E., Priol,
T., Reinefeld, A., Resch, M., Reuter, A., Rienhoff, O., Rüter, T., Sloot, P.,
Talia, D., Ullmann, K., Yahyapour, R., von Voigt, G.: Perspectives on Grid
Computing. Future Generation Computer Systems 26(8) (2010) 1104 –
1115
[2] Foster, I., Kesselman, C.: The Grid 2: Blueprint for a New Computing
Infrastructure. 2 edn. Morgan Kaufmann (2003)
[3] Castillo, C., Rouskas, G.N., Harfoush, K.: Efficient resource manage-
ment using advance reservations for heterogeneous Grids. In: Proc. of the
Intl. Parallel and Distributed Processing Symposium (IPDPS), Miami, USA
(2008)
[4] Östberg, P.O., Henriksson, D., Elmroth, E.: Decentralized, Scalable, Grid
Fairshare Scheduling (FSGrid). Future Generation Computer Systems
(submitted, 2011)
[5] Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling
Scalable Virtual Organizations. International Journal of High Performance
Computing Application 15 (August 2001) 200–222
[6] CERN. LHC Computing. Web page at http://www.interactions.org/
LHC/computing/index.html (Date of last access: 16th December, 2011)
[7] Krauter, K., Buyya, R., Maheswaran, M.: A taxonomy and survey of Grid
resource management systems for distributed computing. Software – Prac-
tice & Experience 32 (2002) 135–164
[8] Al-Ali, R., Sohail, S., Rana, O., Hafid, A., von Laszewski, G., Amin, K., Jha,
S., Walker, D.: Network QoS provision for distributed Grid applications.
Intl. Journal of Simulations Systems, Science and Technology, Special
Issue on Grid Performance and Dependability 5(5) (2004) 13–28
[9] Foster, I.T.: Globus Toolkit Version 4: Software for Service-Oriented Sys-
tems. In: Proc. of the Intl. Conference on Network and Parallel Computing
(NPC), Beijing, China (2005)
[10] Vázquez, C., Huedo, E., Montero, R.S., Llorente, I.M.: Federation of Ter-
aGrid, EGEE and OSG infrastructures through a metascheduler. Future
Generation Computer Systems 26(7) (2010) 979 – 985
[11] Yeo, C.S., Buyya, R.: A taxonomy of market-based resource management
systems for utility-driven cluster computing. Software – Practice & Expe-
rience 36 (2006) 1381–1419
[12] Huedo, E., Montero, R.S., Llorente, I.M.: A modular meta-scheduling ar-
chitecture for interfacing with pre-WS and WS Grid resource management
services. Future Generation Computing Systems 23(2) (2007) 252–261
[13] Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid informa-
tion services for distributed resource sharing. In: Proceedings of 10th
IEEE International Symposium on High Performance Distributed Com-
puting (HPDC), San Francisco, USA (2001)
[14] Portable Batch System. Web page at http://www.openpbs.org (Date of
last access: 16th December, 2011)
[15] Litzkow, M.J., Livny, M., Mutka, M.W.: Condor - A Hunter of idle work-
stations. In: Proc. of the 8th Intl. Conference on Distributed Computer
Systems (ICDCS), San Jose, USA (1988)
[16] Gentzsch, W.: Sun Grid Engine: Towards creating a compute power Grid.
In: Proc. of the First Intl. Symposium on Cluster Computing and the Grid
(CCGrid), Brisbane, Australia (2001)
[17] Zhou, S.: LSF: Load Sharing in Large-Scale Heterogeneous Distributed
Systems. In: Proc. of the Workshop on Cluster Computing. (1992)
[18] Wei, X., Ding, Z., Yuan, S., Hou, C., Li, H.: CSF4: A WSRF compliant
meta-scheduler. In: Proc. of the Intl. Conference on Grid Computing &
Applications (GCA), Las Vegas, USA (2006)
[19] Legion Project. Web page at http://legion.virginia.edu/ (Date of last
access: 16th December, 2011)
[20] Marco, C., Fabio, C., Alvise, D., Antonia, G., Francesco, G., Alessandro,
M., Moreno, M., Salvatore, M., Fabrizio, P., Luca, P., Francesco, P.: The
gLite Workload Management System. In: Proc. of the 4th Intl. Conference
on Advances in Grid and Pervasive Computing (GPC), Geneva, Switzerland
(2009)
[21] EGEE project (Enabling Grids for E-science in Europe). Web page at
http://public.eu-egee.org/ (Date of last access: 16th December,
2011)
[22] Frey, J., Tannenbaum, T., Livny, M., Foster, I., Tuecke, S.: Condor-G:
A Computation Management Agent for Multi-Institutional Grids. Cluster
Computing 5 (2002) 237–246
[23] Romberg, M.: The UNICORE Grid infrastructure. Scientific Programming
Special Issue on Grid Computing 10 (April 2002) 149–157
[24] Bank, J., Werner, F.: Heuristic algorithms for unrelated parallel machine
scheduling with a common due date, release dates, and linear earliness
and tardiness penalties. Mathematical and Computer Modelling 33(4-5)
(2001) 363 – 383
[25] Foster, I., Kesselman, C., Tsudik, G., Tuecke, S.: A security architecture
for computational Grids. In: Proc. of the 5th ACM conference on Computer
and Communications Security. (1998)
[26] Allcock, W., Bresnahan, J.: GridFTP protocol specification. In: GGF
GridFTP Working Group Document (2002)
[27] GridWay Project. Web page at http://www.gridway.org/ (Date of last
access: 16th December, 2011)
[28] Distributed Systems Architecture (DSA) Research Group at Universidad
Complutense de Madrid (UCM). Web page at http://dsa-research.org/
(Date of last access: 16th December, 2011)
[29] Kurowski, K., Ludwiczak, B., Nabrzyski, J., Oleksiak, A., Pukacki, J.: Dy-
namic Grid scheduling with job migration and rescheduling in the GridLab
resource management system. Scientific Programming 12(4) (2004) 263–
273
[30] Venugopal, S., Buyya, R., Winton, L.J.: A Grid service broker for schedu-
ling e-Science applications on global data Grids. Concurrency and Com-
putation: Practice and Experience 18(6) (May 2006) 685–699
[31] Merlo, A., Clematis, A., Corana, A., Gianuzzi, V.: Quality of Service on
Grid: Architectural and methodological issues. Concurrency and Compu-
tation: Practice and Experience 23 (2011) 745–766
[32] Ali, R.A., Rana, O., von Laszewski, G., Hafid, A., Amin, K., Walker, D.: A
Model for Quality-of-Service Provision in Service Oriented Architectures.
Journal of Grid and Utility Computing (2005)
[33] Foster, I., Kesselman, C., Lee, C., Lindell, B., Nahrstedt, K., Roy, A.: A dis-
tributed resource management architecture that supports advance reser-
vations and co-allocation. In: Proc. of the Intl. Workshop on Quality of
Service, London, England (1999)
[34] Seidel, J., Wäldrich, O., Ziegler, W., Wieder, P., Yahyapour, R.: Using SLA
for resource management and scheduling - A survey. Technical Report
CoreGRID TR-0096, Institute on Resource Management and Scheduling
(2007)
[35] Conejero, J., Tomás, L., Carrión, C., Caminero, B.: Differentiated QoS in
Grids supported by SLAs. In: Proc. of the 9th Intl. Workshop on Middle-
ware for Grids, Clouds and e-Science (MGC), in conjunction with the 12th
Intl. Middleware Conference, Lisbon, Portugal (2011)
[36] Buyya, R., Abramson, D., Giddy, J.: An Economy Driven Resource Man-
agement Architecture for Global Computational Power Grids. In: Proc.
of the Intl. Conference on Parallel and Distributed Processing Techniques
and Applications, (PDPTA), Las Vegas, USA (2000)
[37] Sulistio, A., Čibej, U., Prasad, S.K., Buyya, R.: GarQ: An efficient schedu-
ling data structure for advance reservations of Grid resources. Int. Journal
of Parallel Emergent and Distributed Systems 24(1) (2009) 1–19
[38] Castillo, C., Rouskas, G.N., Harfoush, K.: On the Design of Online Sche-
duling Algorithms for Advance Reservations and QoS in Grids. In: Proc.
of the Intl. Parallel and Distributed Processing Symposium (IPDPS), Los
Alamitos, USA (2007)
[39] Battré, D., Hovestadt, M., Kao, O., Keller, A., Voss, K.: Planning-based
Scheduling for SLA-awareness and Grid Integration. In: Proc. of the 26th
Workshop of the UK Planning and Scheduling Special Interest Group (Plan-
SIG2007), Prague, Czech Republic (2007)
[40] Adami, D., Giordano, S., Repeti, M., Coppola, M., Laforenza, D., Tonel-
lotto, N.: Design and Implementation of a Grid Network-Aware Resource
Broker. In: Proc. of the Intl. Conference on Parallel and Distributed Com-
puting and Networks, Innsbruck, Austria (2006)
[41] Xhafa, F., Abraham, A.: Computational models and heuristic methods for
Grid scheduling problems. Future Generation Computer Systems 26(4)
(2010) 608–621
[42] Guan, D., Cai, Z., Kong, Z.: Provision and analysis of QoS for distributed
Grid applications. In: Proc. of the 5th Intl. Conference on Wireless commu-
nications, networking and mobile computing (WiCOM). (2009) 4191–4194
[43] Chu, H.H., Nahrstedt, K.: CPU service classes for multimedia applica-
tions. In: Proc. of Intl. Conference on Multimedia Computing and Systems
(ICMCS), Florence, Italy (1999)
[44] Mateescu, G.: Extending the Portable Batch System with preemptive job
scheduling. In: SC2000: High Performance Networking and Computing,
Dallas, USA (2000)
[45] Cárdenas, C., Gagnaire, M.: Evaluation of Flow-Aware Networking (FAN)
architectures under GridFTP traffic. Future Generation Computer Sys-
tems 25(8) (2009) 895–903
[46] Caminero, A., Rana, O., Caminero, B., Carrión, C.: Performance eval-
uation of an autonomic network-aware metascheduler for Grids. Con-
currency and Computation: Practice and Experience 21(13) (2009) 1692–
1708
[47] Palmieri, F.: Network-aware scheduling for real-time execution support in
data-intensive optical Grids. Future Generation Computer Systems 25(7)
(2009) 794–803
[48] Wolski, R., Spring, N.T., Hayes, J.: The Network Weather Service: A dis-
tributed resource performance forecasting service for metacomputing. Fu-
ture Generation Computer Systems 15(5–6) (1999) 757–768
[49] Sulistio, A.: Advance Reservation and Revenue-based Resource Manage-
ment for Grid Systems. PhD thesis, Department of Computer Science and
Software Engineering, The University of Melbourne, Australia (May 2008)
[50] Sulistio, A., Schiffmann, W., Buyya, R.: Advanced reservation-based sche-
duling of task graphs on clusters. In: Proc. of the 13th Intl. Conference on
High Performance Computing (HiPC), Bangalore, India (2006)
[51] MacLaren, J.: Advance reservations: State of the art. GWD-I, Global Grid
Forum (GGF) (2003). Web page at http://www.ggf.org (Date of last
access: 16th December, 2011)
[52] Jackson, D., Snell, Q., Clement, M.: Core Algorithms of the Maui Sche-
duler. In Feitelson, D., Rudolph, L., eds.: Job Scheduling Strategies for
Parallel Processing. Volume 2221 of Lecture Notes in Computer Science.
Springer Berlin / Heidelberg (2001) 87–102
[53] Roy, A., Sander, V.: GARA: A Uniform Quality of Service Architecture. In:
Grid Resource Management. Kluwer Academic Publishers (2003) 377–394
[54] Siddiqui, M., Villazón, A., Fahringer, T.: Grid capacity planning with
negotiation-based advance reservation for optimized QoS. In: Proc. of the
Conference on Supercomputing (SC), Tampa, USA (2006)
[55] Waldrich, O., Wieder, P., Ziegler, W.: A meta-scheduling service for co-
allocating arbitrary types of resources. In: Proc. of the 6th Intl. Conference
on Parallel Processing and Applied Mathematics (PPAM), Poznan, Poland
(2005)
[56] Smith, W., Foster, I., Taylor, V.: Scheduling with advanced reservations.
In: Proc. of the 14th Intl. Parallel and Distributed Processing Symposium
(IPDPS), Washington, USA (2000)
[57] Qu, C.: A Grid Advance Reservation Framework for Co-allocation and Co-
reservation Across Heterogeneous Local Resource Management Systems.
In: Proc. of 7th Intl. Conference on Parallel Processing and Applied Math-
ematics (PPAM), Gdansk, Poland (2007)
[58] Elmroth, E., Tordsson, J.: Grid resource brokering algorithms enabling
advance reservations and resource selection based on performance pre-
dictions. Future Generation Computer Systems 24(6) (2008) 585–593
[59] Singh, G., Kesselman, C., Deelman, E.: A provisioning model and its
comparison with best-effort for performance-cost optimization in Grids.
In: Proc. of the 16th Intl. symposium on High Performance Distributed
Computing (HPDC), Monterey, USA (2007)
[60] Tomás, L., Caminero, A.C., Carrión, C., Caminero, B.: Network-aware
meta-scheduling in advance with autonomous self-tuning system. Future
Generation Computer Systems 27(5) (2011) 486–497
[61] Brown, R.: Calendar queues: a fast O(1) priority queue implementation
for the simulation event set problem. Communications of the ACM 31(10)
(1988) 1220–1227
[62] Brodnik, A., Nilsson, A.: A Static Data Structure for Discrete Advance
Bandwidth Reservations on the Internet. In: Proc. of Swedish National
Computer Networking Workshop (SNCNW), Stockholm, Sweden (2003)
[63] Nurmi, D., Brevik, J., Wolski, R.: QBETS: Queue Bounds Estimation from
Time Series. In: Proc. of 13th Intl. Workshop on Job Scheduling Strategies
for Parallel Processing (JSSPP), Seattle, USA (2007)
[64] Nurmi, D., Wolski, R., Brevik, J.: VARQ: Virtual Advance Reservations
for Queues. In: Proc. of 17th Intl. Symposium on High-Performance Dis-
tributed Computing (HPDC), Boston, USA (2008)
[65] Dobber, M., van der Mei, R., Koole, G.: A prediction method for job run-
times on shared processors: Survey, statistical analysis and new avenues.
Performance Evaluation 64(7-8) (2007) 755–781
[66] Dinda, P.A.: The statistical properties of host load. Scientific Programming
7(3-4) (1999) 211–229
[67] Jin, H., Shi, X., Qiang, W., Zou, D.: An adaptive meta-scheduler for data-
intensive applications. Intl. Journal of Grid and Utility Computing 1(1)
(2005) 32–37
[68] Zhang, Y., Sun, W., Inoguchi, Y.: Predict task running time in Grid envi-
ronments based on CPU load predictions. Future Generation Computer
Systems 24(6) (2008) 489–497
[69] Gehr, J., Schneider, J.: Measuring Fragmentation of Two-Dimensional
Resources Applied to Advance Reservation Grid Scheduling. In: Proc. of
the 9th Intl. Symposium on Cluster Computing and the Grid (CCGRID),
Shanghai, China (2009)
[70] Johnstone, M.S., Wilson, P.R.: The memory fragmentation problem:
solved? ACM SIGPLAN Notices 34(3) (1999) 26–36
[71] Wilson, P.R., Johnstone, M.S., Neely, M., Boles, D.: Dynamic storage
allocation: A survey and critical review. In: Proc. of the Intl. Workshop on
Memory Management (IWMM), Kinross, UK (1995)
[72] De Assunção, M.D., Buyya, R.: Performance analysis of multiple site
resource provisioning: effects of the precision of availability information.
In: Proc. of the 15th Intl. Conference on High Performance Computing
(HiPC), Bangalore, India (2008)
[73] Figuerola, S., Ciulli, N., De Leenheer, M., Demchenko, Y., Ziegler, W.,
Binczewski, A.: Phosphorus: single-step on-demand services across
multi-domain networks for e-Science. In: Proc. of the European Confer-
ence and Exhibition on Optical Communication, Berlin, Germany (2007)
[74] Elmroth, E., Tordsson, J.: A standards-based Grid resource brokering
service supporting advance reservations, coallocation and cross-Grid in-
teroperability. Concurrency and Computation: Practice and Experience.
21(18) (2009) 2298–2335
[75] Caniou, Y., Charrier, G., Desprez, F.: Analysis of Tasks Reallocation in a
Dedicated Grid Environment. In: Proc. of the Intl. Conference on Cluster
Computing (CLUSTER), Heraklion, Greece (2010)
[76] Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Com-
puter 36(1) (2003) 41–50
[77] Parashar, M.: Autonomic Grid Computing. Autonomic Computing – Con-
cepts, Requirements, Infrastructures, Editors: M. Parashar and S. Hariri,
CRC Press (2006)
[78] Dobson, S., Denazis, S.G., Fernández, A., Gaïti, D., Gelenbe, E., Massacci,
F., Nixon, P., Saffre, F., Schmidt, N., Zambonelli, F.: A survey of autonomic
communications. ACM TAAS 1(2) (2006) 223–259
[79] Dong, X., Hariri, S., Xue, L., Chen, H., Zhang, M., Pavuluri, S., Rao, S.:
Autonomia: an autonomic computing environment. In: Proc. of the IEEE
Intl. Conference on Performance, Computing and Communications. (2003)
[80] Liu, H., Parashar, M., Hariri, S.: A component-based programming model
for autonomic applications. In: Proc. of the Intl. Conference on Autonomic
Computing (ICAC), New York, USA (2004)
[81] Abawajy, J.H.: Autonomic Job Scheduling Policy for Grid Computing. In:
Proc. of the 5th Intl. Conference on Computational Science (ICCS), Atlanta,
USA (2005)
[82] Nou, R., Julià, F., Hogan, K., Torres, J.: A path to achieving a self-managed
Grid middleware. Future Generation Computer Systems 27(1) (2011) 10–
19
[83] Theilmann, W., Baresi, L.: Towards the Future Internet. In: Multi-
level SLAs for Harmonized Management in the Future Internet. IOS Press
(2009) 193–202
[84] Brandic, I., Music, D., Dustdar, S., Venugopal, S., Buyya, R.: Advanced
QoS Methods for Grid Workflows Based on Meta-Negotiations and SLA-
Mappings. In: Proc. of the 3rd Workshop on Workflows in Support of
Large-Scale Science, Austin, USA (2008)
[85] Ejarque, J., de Palol, M., Goiri, Í., Julià, F., Guitart, J., Badia, R.M.,
Torres, J.: Exploiting semantics and virtualization for SLA-driven resource
allocation in service providers. Concurrency and Computation: Practice
and Experience 22(5) (2010) 541–572
[86] Andrieux, A., Czajkowski, K., Dan, A., Keahey, K., Ludwig, H., Nakata, T.,
Pruyne, J., Rofrano, J., Tuecke, S., Xu, M.: Web Services Agreement Spec-
ification (WS-Agreement). GFD-R-P.192. Technical report (October 2011)
[87] Waeldrich, O., Battré, D., Brazier, F., Clark, K., Oey, M., Papaspyrou, A.,
Wieder, P., Ziegler, W.: WS-Agreement Negotiation Version 1.0. GFD-R-
P.193. Technical report (October 2011)
[88] Lamanna, D.D., Skene, J., Emmerich, W.: SLAng: A Language for Defining
Service Level Agreements. In: Proc. of the Intl. Workshop of Future Trends
of Distributed Computing Systems, Los Alamitos, USA (2003)
[89] WSLA: Web Service Level Agreements. Web page at http://www.
research.ibm.com/wsla/ (Date of last access: 16th December, 2011)
[90] Parkin, M., Badia, R.M., Martrat, J.: A Comparison of SLA Use in Six
of the European Commission's FP6 Projects. Technical Report TR-0129,
Institute on Resource Management and Scheduling, CoreGRID - Network
of Excellence (2008)
[91] SLA at SOI. Web page at http://sla-at-soi.eu/ (Date of last access:
16th December, 2011)
[92] Battré, D., Djemame, K., Gourlay, I., Hovestadt, M., Kao, O., Padgett, J.,
Voß, K., Warneke, D.: AssessGrid strategies for provider ranking mech-
anisms in risk-aware grid systems. In: Proc. of the 5th Intl. Workshop
on Grid Economics and Business Models (GECON), Las Palmas de Gran
Canaria, Spain (2008)
[93] EU-Brein. Web page at http://www.eu-brein.com/ (Date of last access:
16th December, 2011)
[94] WSAG4J - WS-Agreement framework for Java. Web page at http://
packcs-e0.scai.fraunhofer.de/wsag4j/ (Date of last access: 16th De-
cember, 2011)
[95] Snelling, D.F., Anjomshoaa, A., Wray, F., Basermann, A., Fisher, M., Sur-
ridge, M., Wieder, P.: NextGRID Architectural Concepts. In: Proc. of the
CoreGRID Symposium, Rennes, France (2007)
[96] Dumitrescu, C., Foster, I.: GRUBER: A Grid Resource Usage SLA Broker.
In: Proc. of the 11th Intl. Conference on Parallel Computing (Euro-Par),
Lisbon, Portugal (2005)
[97] Kay, J., Lauder, P.: A fair share scheduler. Communications of the ACM 31(1) (1988)
44–55
[98] Yoo, A., Jette, M., Grondona, M.: SLURM: Simple Linux Utility for Re-
source Management. In Feitelson, D., Rudolph, L., Schwiegelshohn, U.,
eds.: Job Scheduling Strategies for Parallel Processing. Volume 2862 of
Lecture Notes in Computer Science. Springer Berlin / Heidelberg (2003)
44–60
[99] Krawczyk, S., Bubendorfer, K.: Grid resource allocation: allocation mech-
anisms and utilisation patterns. In: Proc. of the 6th Australasian work-
shop on Grid computing and e-Research (AusGrid), Darlinghurst, Aus-
tralia (2008)
[100] De Jongh, J.: Share scheduling in distributed systems. PhD thesis, Delft
Technical University (2002)
[101] Dafouli, E., Kokkinos, P., Varvarigos, E.A.: Fair Execution Time Estima-
tion Scheduling in Computational Grids. In Kacsuk, P., Lovas, R., Németh,
Z., eds.: Distributed and Parallel Systems. Springer US (2008) 93–104
[102] Doulamis, N., Varvarigos, E., Varvarigou, T.: Fair Scheduling Algorithms
in Grids. IEEE Transactions on Parallel and Distributed Systems 18
(2007) 1630–1648
[103] Austin, J., Jackson, T., Fletcher, M., Jessop, M., Cowley, P., Lobner, P.:
Predictive Maintenance: Distributed Aircraft Engine Diagnostics. In: The
Grid 2: Blueprint For A New Computing Infrastructure. Elsevier Science
(2004)
[104] Marchese, F.T., Brajkovska, N.: Fostering asynchronous collaborative vi-
sualization. In: Proc. of the 11th Intl. Conference on Information Visual-
ization, Zürich, Switzerland (2007)
[105] Kalekar, P.S.: Time series Forecasting using Holt-Winters Exponential
Smoothing. Technical report, Kanwal Rekhi School of Information Tech-
nology (2004)
[106] Fitzgerald, S., Foster, I., Kesselman, C., von Laszewski, G., Smith, W.,
Tuecke, S.: A directory service for configuring high-performance dis-
tributed computations. In: Proc. of 6th Symposium on High Performance
Distributed Computing (HPDC), Portland, USA (1997)
[107] Massie, M.L., Chun, B.N., Culler, D.E.: The Ganglia distributed monitor-
ing system: Design, implementation, and experience. Parallel Computing
30(5-6) (2004) 817–840
[108] NLANR/DAST : Iperf - The TCP/UDP Bandwidth Measurement Tool. Web
page at http://dast.nlanr.net/Projects/Iperf/ (Date of last access:
16th December, 2011)
[109] Sohail, S., Pham, K.B., Nguyen, R., Jha, S.: Bandwidth Broker Imple-
mentation: Circa-Complete and Integrable. Technical report, School of
Computer Science and Engineering, The University of New South Wales
(2003)
[110] Tomás, L., Caminero, A., Caminero, B., Carrión, C.: Using network in-
formation to perform meta-scheduling in advance in Grids. In: Proc. of
the 16th Intl. Conference on Parallel Computing (Euro-Par), Ischia, Italy
(2010)
[111] Tomás, L., Caminero, A., Carrión, C., Caminero, B.: Exponential Smooth-
ing for network-aware meta-scheduler in advance in Grids. In: Proc. of
the 6th Intl. Workshop on Scheduling and Resource Management on Par-
allel and Distributed Systems (SRMPDS), in conjunction with the 39th Intl.
Conference on Parallel Processing (ICPP), San Diego, USA (2010)
[112] Tomás, L., Caminero, A., Caminero, B., Carrión, C.: Studying the Influ-
ence of Network-Aware Grid Scheduling on the Performance Received by
Users. In: Proc. of the Grid computing, high-performAnce and Distributed
Applications (GADA), Monterrey, Mexico (2008)
[113] Tomás, L., Caminero, A., Caminero, B., Carrión, C.: Improving GridWay
with Network Information: Tuning the Monitoring Tool. In: Proc. of the
High Performance Grid Computing Workshop (HPGC), held jointly with
the Intl. Parallel and Distributed Processing Symposium (IPDPS), Rome, Italy
(2009)
[114] Stevens, W.R.: TCP/IP Illustrated: The Protocols. Addison-Wesley (1994)
[115] Chun, G., Dail, H., Casanova, H., Snavely, A.: Benchmark probes for Grid
assessment. In: Proc. of 18th Intl. Parallel and Distributed Processing
Symposium (IPDPS), Santa Fe, New Mexico (2004)
[116] Frumkin, M., Van der Wijngaart, R.: NAS Grid Benchmarks: a tool for
Grid space exploration. In: Proc. of 10th IEEE Intl. Symposium on High
Performance Distributed Computing. (2001)
[117] The NAS Parallel Benchmark. Web page at http://www.nas.nasa.gov/
Resources/Software/npb.html (Date of last access: 16th December,
2011)
[118] Vazquez-Poletti, J., Huedo, E., Montero, R., Llorente, I.: A Comparison
Between two Grid Scheduling Philosophies: EGEE WMS and GridWay.
Multiagent and Grid Systems 3(4) (2007)
[119] GSA-RG: Grid scheduling use cases. Technical Report GFD-I.064, Global
Grid Forum (2006)
[120] Grimme, C.: Grid metaschedulers: An overview and up-to-date solutions.
Technical report, University of Dortmund (2006)
[121] Tomás, L., Caminero, A.C., Rana, O., Carrión, C., Caminero, B.: A
GridWay-based autonomic network-aware metascheduler. Future Gener-
ation Computer Systems, In Press
[122] Kalantari, M., Akbari, M.K.: Grid performance prediction using state-
space model. Concurrency and Computation: Practice and Experience.
21(9) (2009) 1109–1130
[123] Gong, L., Sun, X.H., Watson, E.F.: Performance modeling
and prediction of non-dedicated network computing. IEEE Transactions
on Computers 51 (2002) 1041–1055
[124] Vadhiyar, S.S., Dongarra, J.J.: A Performance Oriented Migration Frame-
work For The Grid. In: Proc. of the 3rd Intl. Symposium on Cluster Com-
puting and the Grid (CCGrid), Tokyo, Japan (2003)
[125] Wieczorek, M., Siddiqui, M., Villazón, A., Prodan, R., Fahringer, T.: Apply-
ing Advance Reservation to Increase Predictability of Workflow Execution
on the Grid. In: Proc. of the 2nd Intl. Conference on e-Science and Grid
Computing (e-Science), Washington, USA (2006)
[126] Tanwir, S., Battestilli, L., Perros, H.G., Karmous-Edwards, G.: Dy-
namic scheduling of network resources with advance reservations in opti-
cal Grids. Int. Journal of Network Management 18(2) (2008) 79–105
[127] Barz, C., Martini, P., Pilz, M., Purnhagen, F.: Experiments on Network
Services for the Grid. In: Proc. of the 32nd Conference on Local Computer
Networks (LCN), Washington, USA (2007)
[128] Caminero, A., Rana, O., Caminero, B., Carrión, C.: Network-aware heuris-
tics for inter-domain meta-scheduling in Grids. Journal of Computing and
System Sciences 77(2) (2011) 262 – 281
[129] de Assunção, M.D., Buyya, R., Venugopal, S.: InterGrid: A case for inter-
networking islands of Grids. Concurrency and Computation: Practice and
Experience 20(8) (2008) 997–1024
[130] Xiong, Z., Yang, Y., Zhang, X., Zeng, M.: Grid resource aggregation in-
tegrated P2P mode. In: Proc. of the 4th Intl. Conference on Intelligent
Computing (ICIC), Shanghai, China (2008)
[131] Stefano, A.D., Morana, G., Zito, D.: A P2P strategy for QoS discovery
and SLA negotiation in Grid environment. Future Generation Computer
Systems 25(8) (2009) 862–875
[132] Litke, A., Konstanteli, K., Andronikou, V., Chatzis, S., Varvarigou, T.:
Managing Service Level Agreement contracts in OGSA-based Grids. Fu-
ture Generation Computer Systems 24(4) (2008) 245–258
[133] Conejero, J., Tomás, L., Carrión, C., Caminero, B.: A SLA-based Meta-
Scheduling in Advance System to Provide QoS in Grid Environments. In:
Proc. of the 5th Iberian Grid Infrastructure Conference (Ibergrid), San-
tander, Spain (2011)
[134] Conejero, J., Tomás, L., Carrión, C., Caminero, B.: QoS Provisioning
with Meta-Scheduling in Advance within SLA-based Grid Environments.
Computing and Informatics, In Press
[135] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to
algorithms. McGraw-Hill Book Company, Cambridge, London (2001)
[136] Huedo, E., Montero, R.S., Llorente, I.M.: A framework for adaptive execu-
tion in Grids. Software: Practice and Experience 34(7) (2004) 631–651
[137] Dobber, M., van der Mei, R., Koole, G.: A prediction method for job run-
times on shared processors: Survey, statistical analysis and new avenues.
Performance Evaluation 64(7-8) (2007) 755–781
[138] The R Foundation. Web page at http://www.r-project.org/ (Date of
last access: 16th December, 2011)
[139] Tomás, L., Caminero, A., Carrión, C., Caminero, B.: Addressing Resource
Fragmentation in Grids Through Network-Aware Meta-Scheduling in Ad-
vance. In: Proc. of the 11th Intl. Symposium on Cluster, Cloud, and Grid
Computing (CCGrid), Newport Beach, USA (2011)
[140] Tomás, L., Caminero, A., Carrión, C., Caminero, B.: A Strategy to Improve
Resource Utilization in Grids Based on Network–Aware Meta-Scheduling
in Advance. In: Proc. of the 12th IEEE/ACM Intl. Conference on Grid
Computing (Grid), Lyon, France (2011)
[141] Farooq, U., Majumdar, S., Parsons, E.W.: Efficiently Scheduling Advance
Reservations in Grids. Technical report, Carleton University, Department
of Systems and Computer Engineering (2005)
[142] Tomás, L., Östberg, P.O., Carrión, C., Caminero, B., Elmroth, E.: An
Adaptable In–Advance and Fairshare Meta-Scheduling Architecture to Im-
prove Grid QoS. In: Proc. of the 12th IEEE/ACM Intl. Conference on Grid
Computing (Grid), Lyon, France (2011)
[143] Venugopal, S., Buyya, R.: An SCP-based heuristic approach for scheduling
distributed data-intensive applications on global Grids. Journal of Parallel
and Distributed Computing 68(4) (2008) 471–487