Load Balancing in Communication-Constrained Distributed Systems: A
Probabilistic Approach
by
Sagar Dhakal
B.E., Electrical and Electronics Engineering, Birla Institute of Technology, 2001
M.S., Electrical Engineering, University of New Mexico, 2003
DISSERTATION
Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
Engineering
The University of New Mexico
Albuquerque, New Mexico
December, 2006
© 2006, Sagar Dhakal
Dedication
To my parents and my wife
Acknowledgments
I would like to thank my advisor Prof. Majeed M. Hayat for his continual guidance
and encouragement in my research work. It has been a privilege working with him
for the past four and a half years. I will always admire his analytical approach to
attacking hard problems and his ability to solve them rigorously. In addition, I have enjoyed
and learned immensely from his classroom lectures, which are always well-prepared
and theoretically deep. I also appreciate the financial support that he has provided
during my graduate study. Further, I thank him for giving me the opportunities to
attend and present my research findings at international conferences.
I thank Prof. James Ellison for agreeing to serve on my dissertation committee and
for sharing his expertise in the fields of probability theory and differential equations.
His deep academic grasp, excellent teaching skills and friendly nature have always
been a great inspiration to me. I have learned a lot from his probability theory and
measure theory lectures and homework assignments. I would also like to thank Prof. Balu
Santhanam, Prof. Chaouki T. Abdallah, Prof. Sudharman K. Jayaweera and Prof.
Yasamin Mostofi for serving on my committee, reading my dissertation and providing
useful suggestions during and after the defense exam.
I take this opportunity to thank Mr. Jorge E. Pezoa, Mr. Cundong Yang and Mr.
Mohamed Elyas, who are my colleagues from the Load Balancing Group at UNM,
for their helpful suggestions. In particular, I am grateful to Mr. Pezoa for helping
me with this work. Working with him has always been enjoyable and at times both
of us have found ourselves to be more creative and efficient while collaborating with
each other.
I would also like to thank my parents, whose blessings and encouragement have
led to the successful completion of this work. Finally, I would like to extend my
warmest gratitude to my wife for being nice, supportive and patient during hard and
busy times. She also deserves a special mention for proof-reading the manuscript.
This work has been supported by Prof. Majeed M. Hayat’s National Science
Foundation Information Technology Research (ITR) Grants No. ANI-0312611 and
ANI-0312182.
Load Balancing in Communication-Constrained Distributed Systems: A
Probabilistic Approach
by
Sagar Dhakal
ABSTRACT OF DISSERTATION
Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
Engineering
The University of New Mexico
Albuquerque, New Mexico
December, 2006
Load Balancing in Communication-Constrained Distributed Systems: A
Probabilistic Approach
by
Sagar Dhakal
B.E., Electrical and Electronics Engineering, Birla Institute of
Technology, 2001
M.S., Electrical Engineering, University of New Mexico, 2003
PhD, Engineering, University of New Mexico, 2006
Abstract
The effectiveness of cooperative computing in a distributed, reconfigurable environ-
ment depends upon appropriate utilization of the available computing and commu-
nication resources. Any cooperative strategy for load distribution among nodes is
called load balancing. On one hand, the overall computing power of the system
may be improved by distributing workloads to all constituent nodes in proportion
to their loads and processing rates. On the other hand, however, it may seem more
prudent to assign most of the incoming loads to the reliable nodes in order to im-
prove the robustness of the system. At the same time, since nodes are connected by
means of a shared communication medium, such as the Internet or a wireless local area
network (WLAN), load-transfer activities are not instantaneous but entail finite
delays. Moreover, delays incurred in bandwidth-limited media, such as wireless
infrastructure-based networks and wireless ad-hoc networks, are random, making
their accurate prediction impossible. The presence of such random delays in inter-
node communication can work against the benefits of load balancing in two principal
ways: (1) the load-balancing decision will rely upon dated information about the
load-state of the systems, and (2) any load being transferred will remain in transit in
the network for a random amount of time, thereby postponing the intended benefi-
cial effect of load balancing. These two factors may create system instabilities where
loads are transferred unduly between nodes, thereby increasing the service time. In
addition, due to the possibility of spontaneous or attack-induced node failures, the
number of functional nodes is also dynamically changing in a random fashion. This
together with the occurrences of random communication delays will introduce un-
certainties in the node-fault detection and correction mechanisms. In summary, a
cooperative distributed computing system is an unpredictable, reconfigurable envi-
ronment, whose performance must be optimized in a stochastic framework. To this
end, designing an effective load-balancing policy for such systems is a constrained
optimization problem that aims to maximize the usage of computing resources and
the overall reliability of the system while minimizing communication overhead.
In this dissertation, a novel queuing approach, based on stochastic regeneration,
is formulated to analyze the joint evolution of the distributed queues corresponding
to a multidimensional queuing model of a general cooperative distributed system.
The model specifically considers the randomness and heterogeneity in processing
times of the nodes, randomness in delays in the communication network, and un-
certainty in the number of functional nodes. Coupled renewal equations are derived
for certain classes of static and dynamic load-balancing policies. In order to reduce
the computational overhead associated with the scalability of optimal load-balancing
policy to large-scale systems, a sub-optimal approach for dynamic load balancing is
also proposed and tested. Our approach is general and can be adapted to resilient
communication networks, routing in wireless networks, and wireless sensor networks.
The performance of the proposed load-balancing policies is evaluated using analytical,
experimental and Monte-Carlo simulation methods. In particular, the interplay
between the optimal amount of load-transfers between nodes, node-failure/recovery
rates, and the average load-transfer delays is rigorously investigated. The performance
of the proposed dynamic load-balancing policy is compared to that of existing
static and dynamic load-balancing policies. Additionally, the theory is applied to a
distributed wireless-sensor network and the interplay between the total service time
and the energy consumption of each sensor is shown.
Contents
List of Figures xv
List of Tables xvii
Glossary xviii
1 Introduction 1
1.1 Dissertation overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Overview of Load-balancing Policies 8
2.1 Classification of load-balancing policies . . . . . . . . . . . . . . . . . 8
2.1.1 Local versus global load balancing . . . . . . . . . . . . . . . . 9
2.1.2 Static versus dynamic load balancing . . . . . . . . . . . . . . 9
2.1.3 Centralized versus distributed load balancing . . . . . . . . . . 9
2.1.4 Sender-initiated versus receiver-initiated load balancing . . . . 10
2.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Balancing scheme for SAMR applications . . . . . . . . . . . . 10
2.2.2 Hydrodynamic algorithm . . . . . . . . . . . . . . . . . . . . . 11
2.2.3 Graph partitioning method . . . . . . . . . . . . . . . . . . . 11
2.2.4 Load balancing using queuing theory . . . . . . . . . . . . . . 12
2.2.5 Shortest-expected-delay (SED) and never-queue (NQ) policy . 12
2.2.6 Fault-tolerant schemes . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Prior work by the LB-research group at UNM . . . . . . . . . . . . . 13
3 One-shot Load Balancing Policy: A Regeneration-theory Approach 15
3.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.1 Definition of system-information, system-function and network
states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.2 Regeneration time . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 System without node-failures . . . . . . . . . . . . . . . . . . . . . . 21
3.2.1 Description of the one-shot load balancing policy . . . . . . . 22
3.2.2 Renewal equations . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 System with permanent node-failures . . . . . . . . . . . . . . . . . . 32
3.3.1 Renewal equations . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.2 Calculation of the initial condition . . . . . . . . . . . . . . . 41
3.4 System with recoverable node-failures . . . . . . . . . . . . . . . . . . 45
3.4.1 Analysis of proactive LB policy: LBP-1 . . . . . . . . . . . . . 46
3.4.2 Analysis of reactive LB policy: LBP-2 . . . . . . . . . . . . . 52
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4 Experimental, Theoretical and Simulation Results 55
4.1 Distributed computing system architecture . . . . . . . . . . . . . . . 56
4.2 Empirical estimation of system parameters . . . . . . . . . . . . . . . 57
4.3 System without node-failures . . . . . . . . . . . . . . . . . . . . . . 60
4.4 System with recoverable node-failures . . . . . . . . . . . . . . . . . . 63
4.5 System with permanent node-failures . . . . . . . . . . . . . . . . . . 68
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5 Dynamic Load Balancing Policy 75
5.1 Formulation of the DLB Policy . . . . . . . . . . . . . . . . . . . . . 76
5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.1 Comparison to other DLB policies . . . . . . . . . . . . . . . . 84
5.3 Sub-optimal LB policy for an n-node system . . . . . . . . . . . . . . 86
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6 Application to Wireless Sensor Networks 90
6.1 Description of wireless sensor networks . . . . . . . . . . . . . . . . . 90
6.2 Queuing equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.3 LB policies for two cooperating sensors . . . . . . . . . . . . . . . . . 92
6.3.1 Extension to n cooperating sensors . . . . . . . . . . . . . . . 94
6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7 Future work 97
7.1 Resilient distributed networks . . . . . . . . . . . . . . . . . . . . . . 97
7.2 Wireless sensor networks . . . . . . . . . . . . . . . . . . . . . . . . . 98
Appendices 100
A Optimality of partitions in the ideal case 101
B Proof of Equation (3.24) 102
C Proof of conditional independence of random delays W′21 and Y′1 103
D Special property of minimum of exponential random variables 105
References 107
List of Figures
2.1 Average completion time as a function of LB instant. . . . . . . . . 14
2.2 Average overall completion time as a function of LB gain. . . . . . . 14
4.1 Empirically estimated pdfs of the processing time per task for the
Transmeta Crusoe machine (top) and Intel P4 machine (bottom) as
well as their exponential approximations (solid curves). . . . . . . . 58
4.2 Empirical pdfs of the load-information delays from the first node to
the second node obtained on the Internet (left) and on the EECE
WLAN (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Left: Average transfer delay as a function of the number of tasks
transferred between nodes. The stars are the actual realizations from
the experiments. Right: Empirical pdf of the transfer delay per task
on the Internet under a normal work-day of operation. . . . . . . . 59
4.4 Left: The AOCT as a function of LB instants for the experiments
over the Internet. The LB gain was fixed at 1. Right: Amount of
load transferred between nodes at different LB instants. . . . . . . . 61
4.5 The AOCT under different LB gains for the Internet (left) and the
WLAN (right). The LB instant was fixed at 2s. . . . . . . . . . . . . 62
4.6 Left: The AOCT as a function of the LB gain in presence of large
transfer delay. The LB instant was fixed at 2s. Right: Theoretical
result on the optimal LB gain for different delays. . . . . . . . . . . 64
4.7 The average overall completion time as a function of the LB gain K
for the LBP-1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.8 A realization of the queues obtained from the experiments conducted
for LBP-1 and LBP-2. . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.9 The cumulative distribution function of the overall completion time
in LBP-1. The upper figure shows the case of an initial workload of
(50, 0), while the lower figure is for an initial workload of (25, 50). . . 69
4.10 Probability of success as a function of LB gain of the first node when
LB is performed at time t = 0. Stars represent the Monte-Carlo
simulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.11 Probability of success as a function of LB gain of the second node
when LB is performed at time t = 0. Stars represent the Monte-Carlo
simulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.12 Probability of success as a function of LB instant tb, while LB gains
of both nodes are kept at 0.5. . . . . . . . . . . . . . . . . . . . . . . 73
5.1 Adaptive estimation of the average transfer delay per task. . . . . . 81
5.2 One realization of the queues under a static LB policy using a fixed
gain K = 1 (left) and DLB policy (right). . . . . . . . . . . . . . . . 84
6.1 Expected value of the total service time (left) and the battery-energy
consumed by the sensors (right) under different LB gains. . . . . . . 94
List of Tables
4.1 Experimental results for LBP-1 using the theoretically determined
optimal LB gains. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2 Experimental and simulation results for LBP-2. . . . . . . . . . . . . 67
4.3 Performance of the LBP-1 and the LBP-2 under different network
delays. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.1 Experimental results for dynamic and static LB policies . . . . . . . 82
5.2 Experimental results of the ACTT for DLB policies . . . . . . . . . 86
5.3 Probability of success achieved under different policies . . . . . . . . 88
6.1 The AST and the AEC under the MST, FE and FT policies. . . . . 95
Glossary
ACTT Average completion time per task.
AEC Average energy consumption.
AOCT Average overall completion time.
AST Average service time.
cdf Cumulative distribution function.
DLB Dynamic load balancing.
FE Fair-energy.
FT Fair-tradeoff.
LB Load balancing.
MC Monte-Carlo.
MPN Minimum-packets-number.
MST Minimum-service-time.
NQ Never queue.
pdf Probability density function.
POSIX Portable operating system interface for UNIX.
SAMR Structured adaptive mesh refinement.
SED Shortest expected delay.
SPR System processing rate.
TCP Transmission control protocol.
UDP User datagram protocol.
UNM University of New Mexico.
WLAN Wireless local-area network.
Chapter 1
Introduction
Distributed systems typically comprise geographically dispersed processors or nodes
that can communicate over a shared network of arbitrary topology. Some examples
of distributed systems are grid-computing systems, distributed telecommunication
systems, wireless sensor networks and embedded systems. Quality of service in a
distributed system can be improved by allowing its constituent nodes to work co-
operatively. For example, in distributed grid-computing systems, nodes can jointly
process computational loads in order to decrease the overall processing time of the
load [1]. Similarly, in distributed telecommunication systems, the switching stations
(or call centers) can cooperatively handle the incoming traffic in order to minimize
the rejection of new calls while keeping acceptable response times for the admitted
calls [2].
In distributed systems, loads of different sizes (possibly corresponding to different
applications) arrive randomly in time and node space. Clearly, severe load imbalance
may occur if nodes work independently without sharing loads. However, this situa-
tion can be avoided if the overloaded nodes can transfer loads to the under-loaded
nodes for cooperative processing. Any strategy for load transfer or distribution
among nodes is called load balancing (LB). Ideally (if we ignore reliability of nodes
and communication constraints), an effective LB policy ensures an optimal use of
the distributed resources whereby no node remains in an idle state while any other
node is being utilized. But, there are several challenges arising from the distributed
systems that have to be addressed carefully in order to design an effective LB policy.
In many of today’s distributed-computing environments, the nodes are linked
by a delay- and bandwidth-limited communication medium that inherently inflicts
tangible delays on inter-node communications and load transfers. Examples include
distributed-systems over WLANs as well as clusters of geographically distant nodes
connected over the Internet such as PlanetLab [3]. Although the majority of LB poli-
cies developed heretofore take account of such time delays [4–8], they are predicated
on the assumption that delays are deterministic. In actuality, delays are random
in such communication media. In particular, WLANs inevitably introduce random
delays and packet losses due to scarce radio spectrum, random power fluctuations,
time-varying channel gains, and interference. Our earlier work has shown that LB
policies that do not account for the delay randomness may perform poorly in prac-
tical distributed settings where random delays are present [9–12]. For example, if
nodes have dated, inaccurate information about the state of other nodes, due to
random communication delays between nodes, then this could result in unnecessary,
periodic exchange of loads among them. Consequently, certain nodes may become
idle while loads are in transit, a condition that would result in prolonging the total
completion time of a load.
Generally, the performance of LB in delay-infested environments depends upon
the selection of LB instants as well as the amount of load-transfers allowed between
nodes. For example, if the network delay is negligible within the context of a certain
application, the best performance is achieved by allowing every node to send all its
excess load (relative to the average load per node in the system) to less occupied
nodes. On the other hand, in the extreme case for which the network delays are
excessively large, it would be more prudent to reduce the amount of load-transfers so
as to avoid time wasted while loads are in transit. Clearly, in a practical delay-limited
distributed-computing setting, the amount of load to be exchanged lies between these
two extremes and the amount of load-transfer has to be carefully chosen.
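This tradeoff can be illustrated with a minimal Monte-Carlo sketch (not from the dissertation; the two-node setup, the function and variable names, and the parameter values below are all hypothetical). A single transfer of a fraction K of node 1's excess load is made at time zero, the transfer suffers an exponentially distributed delay, and each task takes an exponential amount of processing time:

```python
import random

def completion_time(q1, q2, rate1, rate2, K, mean_delay, rng):
    """One-shot LB at t = 0: node 1 sends a fraction K of its excess
    (relative to the average load) to node 2; the transfer arrives
    after an exponentially distributed random delay."""
    avg = (q1 + q2) / 2.0
    sent = int(round(K * max(q1 - avg, 0.0)))
    q1 -= sent
    delay = rng.expovariate(1.0 / mean_delay) if sent else 0.0
    # Exponential per-task processing times at each node.
    t1 = sum(rng.expovariate(rate1) for _ in range(q1))
    t2 = sum(rng.expovariate(rate2) for _ in range(q2))
    # Node 2 starts on the transferred tasks only after they arrive.
    t2 = max(t2, delay) + sum(rng.expovariate(rate2) for _ in range(sent))
    return max(t1, t2)

rng = random.Random(1)
for K in (0.0, 0.5, 1.0):
    runs = [completion_time(100, 10, 1.0, 1.0, K, 60.0, rng) for _ in range(500)]
    print("K = %.1f  avg completion time = %.1f" % (K, sum(runs) / len(runs)))
```

Sweeping K for different values of mean_delay reproduces the qualitative behavior described above: when the delay is negligible, the full transfer (K = 1) wins, while for large delays an intermediate gain can outperform both extremes.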
Another important aspect in distributed systems is the issue of fault-tolerance
and reliability of service. In general, distributed systems may utilize dynamic sets
of nodes that may join and leave the system in a random fashion. An example of
such systems is “SETI at Home” [13]. Such systems typically use dedicated as well
as dynamic nodes comprising a collection of desk-tops or portable computing devices
that are online and can be used remotely upon availability. However, these nodes can
go off-line anytime, regardless of the portion of load assigned to them. Furthermore,
the participation of any node may be interrupted by the local usage of the node by
its owner. Such scenarios induce an uncertainty in the availability of the number
of functional nodes, whereby any node may randomly fluctuate between “failure”
(or “down”) and “working” (or “up”) states. In addition, each component of the
system, either a node or a communication link, can undergo permanent physical
failure. Due to the heterogeneous nature of a distributed system (where components
may be provided by different manufacturers), system warranty is not provided and
the issue of fault tolerance is largely left in the hands of the developers.
Generally, when a fault occurs at one point, it has to be communicated to other
components and the fault is detected upon the reception of such communications
after a random amount of time. Therefore, one should expect uncertainty in the
information about the functional components of the system. Similarly, due to the
random load-transfer delays, the mechanisms that are responsible for retrieving loads
from the queues of a faulty node must also be analyzed in a statistical framework.
Most of the existing literature in distributed systems that offers analytical treatment
of reliability disregards uncertainties associated with the fault-detection and load-
retrieval procedures [1, 14–16]. But, these uncertainties degrade the performance of
a LB policy that does not account for them.
Finally, we should note that most distributed systems are composed of heterogeneous
nodes, with each node possibly having a different processing rate. Also, due
to the unpredictable characteristics of the incoming load (or application), each node
exhibits fluctuations in run-time processing rates. Furthermore, in energy-limited
applications like wireless sensor networks, the amount of energy spent on transfer-
ring loads between nodes should also be considered. Necessary care must be exercised
so that no excessive amount of time and energy is unduly wasted in performing LB
while the collective processing power of the distributed system is utilized maximally.
In summary, to design an effective LB policy, the following inherent factors of the
distributed systems should be captured: (i) the amount of load at each node (i.e., the
queue size), (ii) the heterogeneity in the processing rates of the nodes and the run-
time variation in processing times, (iii) the randomness in delays and the bandwidth
constraints in the communication network components involved in the transfer of
loads, (iv) the uncertainty in the number of functional nodes and associated fault-
detection and load-retrieval process, and (v) the energy overhead resulting from the
collaborative nature of the nodes. To the best of our knowledge, there are no LB
policies designed with due consideration to the above-mentioned factors [5].
1.1 Dissertation overview
Chapter 2 presents a brief survey of existing LB policies and also discusses our ear-
lier work in LB. The one-shot LB policy is detailed in Chapter 3 for three types
of distributed systems; namely, (1) system without node-failures, (2) system with
recoverable node-failures, and (3) system with permanent node-failures. In the one-
shot LB policy, once nodes are initially assigned a certain amount of load, all nodes
would together execute LB only at one prescribed instant. A novel queuing approach,
based on regeneration in stochastic processes, is formulated to analyze the dynam-
ics of the distributed system evolving under a one-shot LB action. Our approach
specifically considers the heterogeneity and run-time fluctuation in processing rates
of the nodes, randomness in delays in the communication network and the failure
and recovery probabilities of each node. Coupled renewal equations characterizing
both the expected value of the overall completion time for a given initial load as well
as the probability of success in completing a given initial load have been derived for
different types of distributed systems.
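As an illustrative sketch of such a one-shot action (the function and variable names below are hypothetical, and the renewal-equation analysis itself is not reproduced here), each node can compare its queue to a rate-proportional share of the total load and send a gain-scaled fraction of its excess to the deficient nodes:

```python
def one_shot_transfers(queues, rates, K):
    """Sketch of a one-shot LB action. Each node's fair share is
    proportional to its processing rate; overloaded nodes send a
    fraction K of their excess, split among the under-loaded nodes
    in proportion to their deficits."""
    total = sum(queues)
    shares = [total * r / sum(rates) for r in rates]
    excess = [max(q - s, 0.0) for q, s in zip(queues, shares)]
    deficit = [max(s - q, 0.0) for q, s in zip(queues, shares)]
    d_tot = sum(deficit) or 1.0  # avoid dividing by zero when balanced
    transfers = {}
    for i, e in enumerate(excess):
        for j, d in enumerate(deficit):
            tasks = int(K * e * d / d_tot)  # whole tasks from node i to node j
            if tasks > 0:
                transfers[(i, j)] = tasks
    return transfers
```

For instance, with queues (100, 0), equal rates, and K = 1, node 0 sends 50 tasks to node 1; with K = 0.5 it sends only 25, reflecting the idea that the LB gain moderates the amount transferred.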
Chapter 4 presents the experimental, theoretical and Monte-Carlo (MC) sim-
ulation results on the performance of one-shot LB policy in distributed systems
connected over the Internet and the WLAN. The results show that for an arbitrary
initial load, there exist an optimal amount of load-transfer and an optimal LB instant
associated with the one-shot LB policy, which together maximize the performance
of the system.
In Chapter 5, a sender-initiated distributed dynamic load balancing (DLB) policy
is presented where each node autonomously executes LB at every external load (load
originating outside the system) arrival at that node. In particular, every time an
external load arrives at a node, only the receiver node executes a locally optimal
one-shot LB action. While calculating the amount of load-transfers, the proposed
DLB policy effectively trades off the queuing delays against the transfer delays
in order to maximize the system performance. A comparative study shows that
the proposed DLB policy outperforms other commonly used dynamic load balancing
policies. The chapter also presents a sub-optimal yet effective and computationally
efficient DLB policy for a multi-node system.
In Chapter 6, our theory is applied to develop an optimal LB policy for energy-
limited distributed sensor networks. The results show that there is a fundamental
tradeoff between savings in completion time, resulting from utilizing the processing
power of a distributed system cooperatively, and the combined delay and energy
overhead resulting from the very collaborative nature of the servers. Finally, Chap-
ter 7 presents the potential application of our theoretical approach in solving complex
queuing problems associated with the resilient communication networks and wireless
sensor networks.
1.2 Contributions
Based on multi-server queues, this dissertation presents the first analytical model
for a distributed system that specifically considers the randomness and heterogeneity
in processing times, randomness in delays in the communication network, the ran-
domness in the number of functional nodes, and the energy overhead resulting from
collaboration of nodes. Three fundamental vectors have been introduced to track the
complicated point processes associated with the joint evolution of the multiple queues.
Then, with an exponential-delays assumption, a theory of stochastic regeneration for
distributed systems is detailed. Coupled renewal equations are derived for the aver-
age overall completion time and the probability of success corresponding to different
versions of one-shot LB policy, and a novel approach to dynamic load balancing is
presented subsequently.
Based on the results obtained from real-time LB experiments performed over the
WLAN and the Internet, Chapter 4 demonstrates the practical applicability and
effectiveness of the theoretical models of Chapter 3. It also analytically establishes our
earlier simulation-based notion that the one-shot LB policy can be optimized over
the amount of load-transfers between nodes and the selection of the LB instant.
Moreover, the chapter offers the following two valuable insights: (i) if the average
transfer delay per task is large compared to the average processing time per task,
the amount of load transfers has to be reduced appropriately in order to improve
the system performance, and (ii) the proactive LB policy outperforms the reactive
LB policy when the network delays are large compared to the average recovery times
of nodes and vice versa.
Chapter 5 presents a sender-initiated DLB policy that can efficiently handle ran-
domly arriving external loads. The proposed DLB policy adapts to the dynamic
environment of the distributed system and does not require synchronization among
nodes. The experimental results show that the proposed DLB policy outperforms
other commonly used DLB policies in delay-infested random environments. A
sub-optimal DLB policy is also given. Finally, in Chapter 6 the applicability of our
theoretical approach in other dynamical systems is exhibited by designing a novel
energy-aware LB policy for distributed wireless sensor network applications.
To date, our work has resulted in two journal papers [17, 18], two book chapters
[10,19] and five conference papers [11,20–23].
Chapter 2
Overview of Load-balancing
Policies
This chapter begins with a brief description of different types of LB policies, which
is followed by a survey of related work in this area. Finally, prior work in LB
performed by the LB-research group at the University of New Mexico (UNM) is
summarized.
2.1 Classification of load-balancing policies
LB policies can be broadly categorized as local versus global, static versus dynamic,
centralized versus distributed, and sender-initiated versus receiver-initiated [5].
Next, we provide a brief exposition of each category.
2.1.1 Local versus global load balancing
In a local LB policy [24, 25], each node can transfer load only to a group of neigh-
boring processors, thereby minimizing remote communications. But, in a global bal-
ancing policy, a certain amount of global information is used to initiate LB among
all participant nodes. In this scheme, the load distribution cost can outweigh the
computational gain for a sufficiently large system.
2.1.2 Static versus dynamic load balancing
Static load distribution assigns load to nodes probabilistically or deterministically (as
in a round-robin fashion), without consideration of runtime events. This scheme has
a limited application in realistic distributed systems since it is generally impossible
to make predictions of arrival times of loads and processing times required for future
loads. On the other hand, in a DLB policy [26], the load distribution is made during
run-time based on current processing rates and network condition. A DLB policy can
use either local or global information. A global DLB policy is proposed in Chapter 5
of this dissertation.
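The distinction can be sketched in a few lines (illustrative only; the names and the join-shortest-queue rule below are generic examples, not the DLB policy of Chapter 5):

```python
import itertools

def static_round_robin(num_tasks, num_nodes):
    # Static: the assignment is fixed in advance and ignores runtime state.
    cycle = itertools.cycle(range(num_nodes))
    return [next(cycle) for _ in range(num_tasks)]

def dynamic_join_shortest_queue(num_tasks, queues):
    # Dynamic: each arriving task joins whichever queue is currently shortest.
    queues = list(queues)
    placement = []
    for _ in range(num_tasks):
        j = queues.index(min(queues))
        placement.append(j)
        queues[j] += 1
    return placement
```

The static rule produces the same placement regardless of what the nodes are doing, whereas the dynamic rule reacts to the current queue lengths, which is why it copes better with unpredictable arrivals and processing times.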
2.1.3 Centralized versus distributed load balancing
Centralized schemes [27,28] store global information at a designated node. All sender
nodes (or receiver nodes) access the designated node to calculate the amount of load-
transfers as well as to identify where tasks are to be sent to (or received from). The
drawback of this scheme is that the LB is paralyzed if the particular node that con-
trols LB fails. Such centralized schemes also require synchronization among nodes.
In contrast, in a distributed LB policy, every node executes balancing autonomously.
In some cases [6, 29, 30], the idle nodes can fetch load during runtime from a shared
global queue.
2.1.4 Sender-initiated versus receiver-initiated load balanc-
ing
In sender-initiated LB policy [31], the overloaded nodes transfer one or more of their
tasks to the under-loaded nodes, while in receiver-initiated LB policy [32], the under-
loaded nodes request loads from the overloaded nodes. In some cases [33, 34], both
the under-loaded as well as the overloaded nodes can initiate load transfers.
2.2 Related work
In this section, we describe some LB policies that are commonly referenced in the
current literature.
2.2.1 Balancing scheme for SAMR applications
The authors propose a DLB algorithm for structured adaptive mesh refinement
(SAMR) applications on distributed systems [4]. The balancing scheme is divided
into two phases: (i) global LB phase and (ii) local LB phase. Both the local and
the global balancing phase occur periodically, but the period for global LB is much
longer than the period for local LB. First, the load redistribution cost, which includes
both the communication and the computation overhead, for global LB is heuristi-
cally evaluated. Then, the global LB is invoked only if the computational gain is
bigger than the redistribution cost by some pre-defined factor. At the local level, LB
is performed based only on the processing rates of the nodes, while the presence of
stochastic delays is ignored.
2.2.2 Hydrodynamic algorithm
In the approach given in [7], each node is viewed as a liquid cylinder, where the cross-
sectional area corresponds to the buffer capacity of the node, the communication
links are modeled as liquid channels between the cylinders, the load is represented
by liquid, and the LB algorithm manages the flow of the liquid. The objective is
to reach the equilibrium state where the heights of the liquid columns are the same in
all the cylinders. Load redistribution is performed so that each node obtains its
share in proportion to its capacity. A potential energy function is introduced, whose
minimum value corresponds to the state of equilibrium. It is assumed that the
communication channels induce fixed delays and LB activity is completed within a
fixed interval. Further, every LB step involves state information exchanges and load
migration among the neighboring nodes. The authors show that the global potential
energy converges geometrically to its equilibrium value.
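A diffusion-style balancing step in the spirit of the hydrodynamic model can be sketched as follows. This is an illustrative sketch only, not the exact algorithm of [7]: the flow rule (move load in proportion to the height difference, damped by a gain `alpha`) is an assumption.

```python
def diffusion_step(load, cap, edges, alpha=0.25):
    """One hypothetical diffusion-style step toward equal liquid heights.
    load[i] is the liquid (tasks) in cylinder i, cap[i] its cross-sectional
    area, and edges lists the undirected liquid channels; heights are
    load / cap, and flow is damped by the gain alpha (an assumed rule)."""
    h = [l / c for l, c in zip(load, cap)]
    new = list(load)
    for i, j in edges:
        # liquid flows from the higher column to the lower one
        flow = alpha * (h[i] - h[j]) * min(cap[i], cap[j])
        new[i] -= flow
        new[j] += flow
    return new
```

Repeated application drives the heights together while conserving the total load, mirroring the geometric convergence of the potential energy described above.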
2.2.3 Graph partitioning method
In [8], the authors consider a graph with an arbitrary number of nodes and weighted
costs on the edges. In load balancing, the weight of a node represents the size of the load
at that node and the cost of an edge represents the amount of data transfer between
the nodes connected over that edge. The objective is to partition the nodes into
subsets of given weights in order to minimize the sum of the costs on all edge cuts.
It is shown that finding an optimal solution using a strictly exhaustive procedure
requires an inordinate amount of computation, and therefore, solving the problem
heuristically is a quick approach to produce sub-optimal solutions. This model is
predicated on the assumption that the computation times and the communication
delays are deterministic.
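The edge-cut objective of [8] can be made concrete with a small helper (illustrative only; the partitioning heuristics of [8] are not reproduced here):

```python
def cut_cost(edges, side):
    """Sum of the costs of edges crossing the partition.
    edges: list of (u, v, cost) tuples; side[v] in {0, 1} gives the subset
    to which node v is assigned.  Minimizing this sum over feasible
    assignments is the graph-partitioning objective described above."""
    return sum(cost for u, v, cost in edges if side[u] != side[v])
```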
2.2.4 Load balancing using queuing theory
A DLB policy based on a queuing-theory approach is discussed in [24]. The authors
propose an algorithm that compares the local load of every node to the load of all
other nodes and migrates a task whenever the load difference between nodes is more
than one task. Every time a new task is created, the proposed policy gets triggered.
If the communication cost is too high, the migration is avoided even if the imbalance
exists. The authors have developed their analytical model for two groups each with
two processors and the LB is performed in two phases: intra-group and inter-group.
The task arrival rate and the service rate at all the nodes are assumed to be the same.
They also assume fixed communication delays. Clearly, this model is not suitable for
delay-limited distributed systems.
2.2.5 Shortest-expected-delay (SED) and never-queue (NQ) policy
The SED policy [35] is based on the multiple-queues model, and can be performed in
a centralized as well as distributed fashion. Whenever a new task arrives at a node,
the algorithm calculates the expected service time offered by each node. Then, the
task joins the queue of the node that gives the shortest expected service time. On
the other hand, in the NQ policy [36], all the incoming tasks are assigned to the node
that has an empty queue. If more than one node has an empty queue, the SED
policy is invoked only among the nodes with empty queues. Similarly, if none of
the queues is empty, the SED policy is invoked among all nodes.
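The NQ-then-SED selection rule described above can be sketched in Python. This is a rough illustration rather than the algorithms of [35, 36]; in particular, the expected-service-time estimate (queue length plus one, divided by the service rate) is an assumption:

```python
def nq_then_sed(queue_lengths, service_rates):
    """Assign a new task: NQ first (restrict to empty queues, if any exist),
    then SED among the candidates.  (q + 1) / r is an assumed estimate of
    the expected service time a new task would see at a node."""
    empty = [i for i, q in enumerate(queue_lengths) if q == 0]
    candidates = empty if empty else list(range(len(queue_lengths)))
    return min(candidates,
               key=lambda i: (queue_lengths[i] + 1) / service_rates[i])
```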
2.2.6 Fault-tolerant schemes
Checkpoint-resume or terminate-restart mechanisms are used to detect failures and
recover unprocessed tasks at the failed nodes [1, 14]. Node failure has also been
tackled by keeping multiple copies of the load on different nodes [37]. However, the
work of Choi et al. [38] addresses the performance of the system based only on
unreliable resources, without considering the cooperation between dedicated and
non-dedicated nodes.
2.3 Prior work by the LB-research group at UNM
We begin with a descriptive exposition of the LB model [9]. An arbitrary number of
distributed nodes having different queue sizes is considered. Initially, each node
broadcasts its queue-size information, which is delayed by some time (referred to as
the communication delay) while reaching the other nodes. All the nodes execute LB
together at common balancing instants (called LB instants). At the LB instant, each
node calculates its excess load by comparing its load to the total load of the system
and partitions its positive excess load among the other nodes. Then, each partition is
scaled by a common balancing gain (called the LB gain, K ∈ [0, 1]) before being
transferred to the appropriate receiver nodes. We used custom-made MC simulation
software [10] to evaluate the performance of this continuous
balancing policy. In our results on continuous LB policies, nodes were found to
unnecessarily exchange tasks back and forth, thereby prolonging the total completion
time. We also observed that a large LB gain (closer to one) produces a high degree
of fluctuations in the tail of the queues. Our preliminary studies revealed that for
distributed systems with realistic random communication delays, limiting the number
of balancing instants and optimizing the performance over the choice of the balancing
times as well as the LB gain at each balancing instant can result in significant
improvement in computing efficiency.
The degraded performance of the continuous LB policy in the presence of random
delays motivated us to look into the one-shot LB policy [11]. In particular, once nodes
are initially assigned a certain number of tasks, all nodes together execute LB only at
one prescribed instant using a common LB gain K. The MC simulation results showed
that for a given initial load and average processing rates, there exist an
optimal LB gain and an optimal balancing instant associated with the one-shot LB
policy, which together minimize the average overall completion time. This becomes
evident from the simulation results shown in Fig. 2.1 and Fig. 2.2. Similar results
have been obtained from the real-time experiments conducted over WLAN [23].
Figure 2.1: Average completion time as a function of the LB instant.

Figure 2.2: Average overall completion time as a function of the LB gain (small-delay case).
However, these preliminary results obtained from MC simulations were not verified
theoretically in our earlier work. Further, the performance was evaluated only on
the two-node and three-node distributed systems without considering the possibility
of node failures/recoveries as well as the energy constraints of the nodes. In addi-
tion, our prior work focused on handling only an initial load without considering
subsequent arrivals of loads.
Chapter 3
One-shot Load Balancing Policy:
A Regeneration-theory Approach
In a one-shot LB policy, a cluster of distributed nodes takes a single synchronized
scheduling action in order to fairly distribute the load in the system. In particular,
once the nodes are initially assigned a certain load, each node broadcasts the information
about its local load, while it receives load-information from all other nodes. Based
on the load-information, all nodes would together execute LB at one prescribed in-
stant, called the LB instant. Each load-information packet takes a random amount
of time to reach the other nodes, thereby introducing randomness in the shared
information among the nodes. This behavior is especially prominent in the case of
distributed systems over wireless infrastructure-based as well as ad-hoc
communication networks. Consequently, a node may or may not have received the
load information from other nodes by the time LB is performed. In addition, once
LB is performed, loads are exchanged between the nodes over the communication
network, where each load transfer takes a random amount of time to reach the destination
node. Therefore, the dynamics of the distributed system, evolving under a one-shot
LB policy, should be represented as a stochastic system involving a set of distributed
queues [10]. In our earlier work [11], MC studies and real-time experiments
conducted over WLAN confirmed that for every initial load distribution, there exist
an optimal LB instant and an optimal amount of load exchanges associated with the
one-shot LB policy, which together minimize the average overall load completion
time.
The goal of this Chapter is to develop a novel theoretical approach that can
analytically characterize the performance of a one-shot LB policy (in terms of the
LB instant and the total load exchanges) in a distributed system. The concept of
regeneration in stochastic processes [39–41] is utilized to address the joint evolution
of distributed queues, which eventually leads to the formulation of coupled renewal
equations characterizing the distributed system. This system of renewal equations
are precisely solved to calculate the optimal LB instant and the optimal amount of
load-exchanges required to maximize the performance of the system. The theoretical
approach presented in this Chapter is useful in solving complex queuing problems
that arise in other important areas such as distributed telecommunication networks,
routing in wireless networks and wireless sensor networks. It should be noted that
the numerical results pertaining to the theoretical work of this Chapter are detailed
in Chapter 4.
This Chapter is organized as follows: In Section 3.1, we formally introduce the
concept of regeneration for a general distributed system with exponentially distributed
delays. Next, we present three different models of distributed systems;
namely, (1) system with no node failure in Section 3.2, (2) system with permanent
node failure in Section 3.3, and (3) system with recoverable node failure in Section 3.4.
Renewal equations are derived for each model. Finally, our conclusions are given in
Section 3.5.
3.1 Theory
Consider a system of n distributed nodes connected over a network of arbitrary
topology. Let mk ≥ 0 be an integer representing the initial queue length (number
of tasks) of the kth node at time t = 0. (Throughout the dissertation, a task is the
smallest and indivisible unit of load, while load is a collection of tasks.) The service
time for each task as well as the failure times and the recovery times, if any, of all
functional nodes are assumed to be random. We assume for the moment that there
is no future arrival of external tasks (i.e., tasks that are not present in the system at
t = 0) to the distributed system for t > 0. In addition, all the nodes are assumed
to be functional at t = 0 when the load-information is also broadcasted by each
node to all other nodes. Each load-information takes a random amount of time to
reach the destination server. In order to divide the total tasks of the system among
all functional nodes, LB is performed at time tb ≥ 0 so that each functional node,
the kth node, say, transfers an amount Ljk(tb) ≥ 0 of tasks to the jth node (j ≠ k)
that is functional according to the knowledge of the kth node. Naturally, these task
exchanges that occur over the shared communication network take random transfer
times.
In addition, each node is equipped with a backup system that performs the fol-
lowing duties only in the event of a permanent node failure: (i) If a node becomes
faulty, its backup node immediately sends a failure-notice to each functional node.
We assume that it takes a random time for the failure-notice to reach the destination.
(ii) If a node becomes faulty, its backup system immediately distributes all unserved
tasks of the node evenly among remaining functional nodes. (iii) Upon reception of
future tasks by a faulty node (due to LB performed by other nodes at time tb and
before their knowledge of the failed state of the faulty node), the backup system of
the faulty node evenly distributes those tasks back to the remaining functional nodes.
Note that a functional node detects the failure of any remote node by either receiving
a failure-notice sent by the backup system of the remote node or by receiving the
unserved tasks of the remote node, whichever happens first.
The purpose of this Chapter is to develop a LB policy, viz., the choice of the
balancing instant tb together with the amounts Ljk(tb) of load exchanges, that max-
imizes the joint performance of n nodes.
3.1.1 Definition of system-information, system-function and network states
Any intelligent LB action performed by any functional node at time tb should utilize
its knowledge about: (1) the initial load of other nodes; (2) the number of functional
nodes in the system; (3) loads that are in transit over the communication network.
Note that due to the random arrival times of the load-information packets (LIs), each
functional node, the kth node, say, may or may not have received the aforementioned
LIs about other nodes by the time it performs LB at tb. Therefore, for each kth node we assign an n-tuple
binary vector, ik, that describes its information state about the queue-lengths of all
nodes. More precisely, a “1” entry for the jth component of ik (j ≠ k) indicates that
the kth node has received the load-information from the jth node. (By definition,
the kth component of ik is always “1.”) Clearly, at t = 0 when the LIs are just
broadcasted, all the entries of ik (except for the kth component) are set to 0. We
define the system-information state as the concatenated vector I ≜ (i1, . . . , in). For
example, in a system with two nodes (n = 2), the state I = (10, 11) at time t
corresponds to the configuration for which the first node has not received the load-
information from the second node (i1 = (10)) while the second node has received the
load-information from the first node (i2 = (11)) by the time t.
Similarly, for each kth node, we assign a binary vector fk of size n, where a “1”
(“0”) entry in the jth component at time t indicates that the jth node is functional
18
Chapter 3. One-shot Load Balancing Policy: A Regeneration-theory Approach
(faulty) as perceived by the kth node at time t. We define the concatenated vector
F ≜ (f1, . . . , fn) as the system-function state. Note that since all nodes are assumed
to be functional at t = 0, F has all its entries set to “1” at t = 0.
In addition, due to random transfer-delays in the shared communication network,
each group of tasks being transferred over the network has a random arrival time.
Let gk ∈ {0, 1, 2, 3, ...} represent the number of different groups of tasks that are
simultaneously in transit to the kth node at time t. We can assign a vector ck of
size gk + 1 such that each component of ck (except the first component) represents
the number of tasks in a particular group in transit, while the first component of
ck is always set to gk. We now define the network state as the concatenated vector
C ≜ (c1, . . . , cn). For example, in a 3-node system, C = ([2 10 1], [1 5], [0]) at time
t corresponds to the network state for which two different groups of tasks (10 tasks
in the first group and 1 task in the second group) are being transferred to the first
node (c1 = [2 10 1]), one group of 5 tasks is being transferred to the second node
(c2 = [1 5]), while there is no transfer being made to the third node (c3 = [0]).
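As a concrete (hypothetical) encoding, the three state vectors of the 3-node example above can be written as nested tuples:

```python
# Uninformed system-information state: each node knows only its own queue.
I = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
# System-function state: all nodes functional at t = 0.
F = ((1, 1, 1), (1, 1, 1), (1, 1, 1))
# Network state from the example: two groups (10 and 1 tasks) headed to
# node 1, one group of 5 tasks headed to node 2, nothing headed to node 3.
# The first component of each c_k is the number of groups in transit, g_k.
C = ((2, 10, 1), (1, 5), (0,))
```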
Finally, it should be noted that the implicit dependence of I, F and C on time t
becomes evident as we progress deeper into the stochastic analysis of the underlying
queuing process.
In order to achieve an analytically tractable solution, the following assumptions
are imposed on the random times that characterize the queues generated at t = 0.
Assumption A1 (Exponential distribution of delays): The following random
variables are exponentially distributed: (i) Wki: the service time for the ith customer
of the kth functional node (with rate λdk); (ii) Yk: the failure time of the kth func-
tional node (with rate λfk); (iii) Sk: the recovery time of the kth functional node
(with rate λsk); for any j ≠ k, (iv) Xkj: the arrival time of the load-information sent
from the jth node to the kth node (with rate λkj); (v) $X^{F}_{kj}$: the arrival time of the
failure-notice sent from the jth node to the kth node (with rate $\lambda^{F}_{kj}$); and (vi) Zki:
the arrival time of the ith group of tasks sent to the kth node (with rate λ̃ki).
Assumption A2 (Independence of delays): All the random variables listed in
Assumption A1 are mutually independent.
The above assumptions are well approximated by the empirical data obtained from
actual LB experiments (which will be shown in Chapter 4) conducted on a distributed
system over a WLAN in the context of distributed computing [19, 22]. In addition, we
will assume that the mean transfer delay for the ith group of tasks in transit to the
kth node is $\tilde{\lambda}_{ki}^{-1} = \theta q$, where θ is an experimentally calculated channel
constant (in seconds per task), and q is the number of tasks in the ith group.
Convention C1 (Degenerate cases): The following time delays are set to ∞ almost
surely (a.s.): (i) the service time at any node with no customer, (ii) the service
time at a faulty node, (iii) the failure time of a faulty node, (iv) the recovery time
of a functional node, (v) the load-information arrival time when there is no load-
information in transit, (vi) the failure-notice arrival time when there is no failure-
notice in transit, and (vii) the load arrival time when there is no group of tasks in
transit.
3.1.2 Regeneration time
The key idea is to introduce a special random variable, called the regeneration time,
τ , defined as the minimum of the following six random variables: the time to the first
task service by any node, the time to the first occurrence of failure at any node, the
time to the first occurrence of recovery at any node, the time to the first arrival of an
load-information at any node, the time to the first arrival of a failure-notice at any
node, or the time to the first arrival of a group of tasks at any node. More precisely,
\[
\tau \triangleq \min\Bigl( \min_k (W_{k1}),\; \min_k (Y_k),\; \min_k (S_k),\; \min_{j \neq k} (X_{kj}),\; \min_{j \neq k} (X^{F}_{kj}),\; \min_{k,i} (Z_{ki}) \Bigr).
\]
Note
that in light of Assumptions A1, A2, and Convention C1, it is straightforward to see
that τ is an exponentially distributed random variable.
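This fact (the minimum of independent exponential random variables is exponential with the summed rate) can be checked numerically; the rates below are arbitrary illustrative values:

```python
import random

def sample_tau(rates, rng):
    # One realization of the regeneration time: the minimum of independent
    # exponential delays, one per rate.
    return min(rng.expovariate(r) for r in rates)

rng = random.Random(0)
rates = [1.0, 2.0, 0.5, 1.5]   # arbitrary illustrative rates, sum = 5.0
n = 200_000
mean_tau = sum(sample_tau(rates, rng) for _ in range(n)) / n
# Theory: tau ~ Exp(sum(rates)), so E[tau] = 1 / 5.0 = 0.2.
```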
The key property of the regeneration time τ is that upon the occurrence of the
regeneration event {τ = s}, new queues will emerge at time s (to be proved later)
that have similar statistical properties and dynamics as their predecessors but with
new initial system condition. More precisely, the occurrence of the regeneration event
{τ = s} gives birth to a new distributed queuing system at t = s whose random times
satisfy Assumptions A1 and A2 while having its own initial system condition. The
new initial system condition can be a new initial load distribution if the regeneration
event is a service to a task, a new F and a new C if the regeneration event is a
permanent node failure or a node recovery, a new I if the regeneration event is an
arrival of a load-information, a new F if the regeneration event is an arrival of a
failure-notice, or a new load distribution and a new C if the regeneration event is an
arrival of a load.
For the queuing system that emerges at the regeneration time τ, let the random
times (all measured from τ) $W'_{ki}$, $Y'_k$, $S'_k$, $X'_{kj}$, $X^{F\prime}_{kj}$ and $Z'_{ki}$, respectively, be the service
time for the ith task at the kth node, the failure-time of the kth node, the recovery-
time of the kth node, the arrival time of the load-information sent from the jth node
to the kth node, the arrival time of the failure-notice sent from the jth node to the
kth node, and the arrival time of the ith group of tasks sent to the kth node. To
prove that the queues are regenerated upon the occurrence of {τ = s}, it suffices to
show that the conditional distributions of $W'_{ki}$, $Y'_k$, $S'_k$, $X'_{kj}$, $X^{F\prime}_{kj}$ and $Z'_{ki}$, given that
the event {τ = s} has occurred, satisfy Assumptions A1 and A2.
3.2 System without node-failures
In this Section we consider a distributed system where nodes are always functional
[18]. Clearly, the failure time of any functional node can now be set to infinity
almost surely. First, we present a detailed exposition of the one-shot LB policy in
Section 3.2.1, and then derive the renewal equations for the distributed system in
Section 3.2.2.
3.2.1 Description of the one-shot load balancing policy
Let Qj(t) be the number of tasks in the queue of the jth node at time t. Due to the
random delay Xjl of the load-information sent from the lth node to the jth node, the
queue length of the lth node as perceived by the jth node at time t is delayed and is
given by Ql(t−Xjl). We assume that Ql(t−Xjl) = 0 a.s. for all Xjl > t, implying
that the jth node assumes that the lth node has zero queue size if it does not receive
the load-information from the lth node by the time t. At the LB instant t = tb,
every jth node (j = 1, . . . , n) computes its excess load by comparing its local load
to the average overall load of the system. More precisely, the excess load, Lexj (tb), is
random and is given by
\[
L^{ex}_j(t_b) = \Bigl( Q_j(t_b) - \frac{\lambda_{dj}}{\sum_{k=1}^{n} \lambda_{dk}} \sum_{l=1}^{n} Q_l(t_b - X_{jl}) \Bigr)^{+}, \qquad (3.1)
\]
where $(x)^{+} \triangleq \max(x, 0)$.
in (3.1) is simply the fair share of node j from the totality of the loads in the system.
This is a more plausible way to calculate the excess load of a node in a heterogeneous
computing environment as compared to our earlier methods that did not consider the
processing speeds of the nodes [10, 11]. With the inclusion of the processing speed
of the nodes in (3.1), a slower node would have a larger excess load than that of a
faster node.
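A minimal Python sketch of the excess-load computation (3.1); the queue functions, delays, and rates below are illustrative stand-ins, and a queue whose information has not yet arrived ($X_{jl} > t_b$) is perceived as zero, as assumed above:

```python
def excess_load(j, t_b, Q, X, lam_d):
    """Excess load of node j at the LB instant t_b, per (3.1).
    Q[l]: function of time giving node l's queue length;
    X[j][l]: realized load-information delay from node l to node j
             (X[j][j] = 0, since a node knows its own queue);
    lam_d[k]: processing rate of node k."""
    n = len(lam_d)
    perceived = [Q[l](t_b - X[j][l]) if X[j][l] <= t_b else 0
                 for l in range(n)]
    fair_share = lam_d[j] / sum(lam_d) * sum(perceived)
    return max(Q[j](t_b) - fair_share, 0.0)
```

With equal rates, two constant queues of 10 and 2 tasks give node 0 an excess of 4 tasks; if node 1's information has not yet arrived, node 0 perceives it as empty and computes a larger excess.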
Moreover, the excess load has to be partitioned among the n−1 nodes by assigning
a larger portion to a node with smaller relative load. To this end, we introduce two
different approaches to calculate the partitions, denoted by pij, which represent the
fraction of the excess load of the jth node to be sent to the ith node. For the
conservation of total load, any such partition should satisfy $\sum_{l=1}^{n} p_{lj} = 1$, where
$p_{jj} = 0$ by definition.
In our first approach, the fractions $p_{ij}$, for i ≠ j, are chosen as:
\[
p_{ij} =
\begin{cases}
\dfrac{1}{n-2} \left( 1 - \dfrac{\lambda_{di}^{-1} Q_i(t_b - X_{ji})}{\sum_{l \neq j} \lambda_{dl}^{-1} Q_l(t_b - X_{jl})} \right), & \sum_{l \neq j} Q_l(t_b - X_{jl}) > 0, \\[2ex]
\lambda_{di} \Big/ \sum_{k \neq j} \lambda_{dk}, & \text{otherwise},
\end{cases}
\qquad (3.2)
\]
where n ≥ 3. Clearly, a node assigns a larger partition of its excess load to a node
with a small load relative to all other candidate recipient nodes. Indeed, it is easy
to check that $\sum_{l=1}^{n} p_{lj} = 1$. For the special case when n = 2, $p_{ij} = 1$ whenever i ≠ j.
But observe that $p_{ij} \le \frac{1}{n-2}$
for any ith node. This means that the maximum size of
the partition decreases as the number of nodes in the system increases, irrespective
of the processing rates of the nodes. Therefore, this partition may not be effective in
a scenario where some nodes may have very high processing rates, as compared to
most of the nodes in the system. This observation prompted us to consider a second
partition, which is described below.
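The first-approach partition (3.2) can be sketched as follows (an illustrative helper; `Q_perceived` stands for the delayed queue lengths $Q_l(t_b - X_{jl})$, with zeros for information not yet received):

```python
def first_partition(j, Q_perceived, lam_d):
    """Fractions p_ij of node j's excess load, per (3.2), for n >= 3 nodes."""
    n = len(lam_d)
    others = [l for l in range(n) if l != j]
    p = [0.0] * n
    if sum(Q_perceived[l] for l in others) > 0:
        denom = sum(Q_perceived[l] / lam_d[l] for l in others)
        for i in others:
            # a lightly loaded node gets a larger fraction
            p[i] = (1.0 - (Q_perceived[i] / lam_d[i]) / denom) / (n - 2)
    else:
        # all other queues are perceived empty: split by processing rates
        rate_sum = sum(lam_d[l] for l in others)
        for i in others:
            p[i] = lam_d[i] / rate_sum
    return p
```

As noted above, the fractions always sum to one, and each fraction is capped at 1/(n − 2).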
In the second approach, the sender node locally calculates the excess load for
each node in the system, and calculates the portions to be transferred accordingly.
For convenience, define $Q_{i(j)}(t) \triangleq Q_i(t - X_{ji})$ and let $L^{ex}_{i(j)}(t_b)$ be the excess load at
the ith node, as calculated by the jth node at the LB instant tb. Then, by using a
similar rationale to that used in (3.1), we obtain the locally computed excess load
\[
L^{ex}_{i(j)}(t_b) \triangleq Q_{i(j)}(t_b) - \frac{\lambda_{di}}{\sum_{k=1}^{n} \lambda_{dk}} \sum_{l=1}^{n} Q_{l(j)}(t_b). \qquad (3.3)
\]
It is straightforward to verify that $\sum_{i=1}^{n} L^{ex}_{i(j)}(t_b) = 0$ almost surely. Let $V_j \triangleq \{j :
L^{ex}_{j(j)}(t_b) > 0\}$ be the collection of candidate sender nodes as perceived by the jth
node. Now for any $j \in V_j$, we partition the excess load of the jth node among only
those nodes that are perceived by the jth node to be below the average system load.
Let $U_j \triangleq \{i : L^{ex}_{i(j)}(t_b) < 0\}$ be the collection of such candidate recipient nodes. Now,
the partition $p_{ij}$ can be defined as
\[
p_{ij} =
\begin{cases}
L^{ex}_{i(j)}(t_b) \Big/ \sum_{l \in U_j} L^{ex}_{l(j)}(t_b), & i \in U_j, \\[1ex]
0, & \text{otherwise}.
\end{cases}
\qquad (3.4)
\]
The above partition is most effective when delays are negligible, Qi(j)(t) are deter-
ministic, and tasks are arbitrarily divisible. In this case, if LB is executed together by
all the nodes that do not belong to Uj, all nodes finish their tasks at the same time,
thereby minimizing the overall completion time. The proof of optimality of this partition is
shown in Appendix A.
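A sketch of the second-approach partition (3.4), taking as input the locally computed excess loads of (3.3):

```python
def second_partition(Lex_local):
    """Fractions p_i of the sender's excess load, per (3.4).
    Lex_local[i] is node i's excess load as computed locally by the sender
    via (3.3); recipients are the nodes below the average system load."""
    U = [i for i, e in enumerate(Lex_local) if e < 0]   # candidate recipients
    deficit = sum(Lex_local[i] for i in U)              # negative total
    p = [0.0] * len(Lex_local)
    for i in U:
        p[i] = Lex_local[i] / deficit                   # negative/negative > 0
    return p
```

Each recipient's fraction is proportional to its shortfall below the average, and the fractions over the recipient set sum to one.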
When delays are present, the partitions defined by (3.2) or (3.4) may not be
effective in general, and the proportions pij must be adjusted. To incorporate this
adjustment, the adjusted load to be transferred to the ith node must be defined as
\[
L_{ij}(t_b) = \bigl\lfloor K_{ij}\, p_{ij}\, L^{ex}_j(t_b) \bigr\rfloor, \qquad (3.5)
\]
where $\lfloor x \rfloor$ is the greatest integer less than or equal to x, and the parameters
$K_{ij} \in [0, 1]$ constitute the user-specified LB gains. The one-shot LB policy can be
summarized as follows: at the LB instant tb, the jth node compares its local load to
the average overall load of the system; it then partitions its excess load among the n − 1
available nodes using the fractions $K_{ij} p_{ij}$, and dispatches the integral parts of the
adjusted excess loads to the other nodes. The objective is to calculate the optimal tb
and the optimal LB gains $K_{ij}$ that will together minimize the total time to serve the
$\sum_{i=1}^{n} Q_i(0)$ tasks by the totality of n nodes.
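Finally, the adjusted, integral transfer amounts of (3.5) (an illustrative sketch; `p` and `K` come from a partition rule such as (3.2) or (3.4) and from user-chosen gains):

```python
from math import floor

def transfer_amounts(Lex_j, p, K):
    """L_ij = floor(K_ij * p_ij * Lex_j): the integral task counts that
    node j dispatches to each node i, per (3.5)."""
    return [floor(K[i] * p[i] * Lex_j) for i in range(len(p))]
```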
3.2.2 Renewal equations
In this section, we derive renewal equations that characterize the average overall
completion time (AOCT) for a given initial load distribution under the one-shot LB
policy. The overall completion time is defined as the maximum over completion times
for all nodes. For the moment, we will assume that all nodes execute LB using a
common LB gain Kij = K. This assumption will be relaxed in Chapter 5 to a setting
where nodes execute LB asynchronously using different LB gains.
Let $T^{I}_{m_1,\ldots,m_n}(t_b)$ be the overall completion time given that the LB is executed
at time tb, where the kth node has $m_k \ge 0$ tasks at time t = 0, and the system-
information state is I at time t = 0. Let the AOCT be given as $\mu^{I}_{m_1,\ldots,m_n}(t_b) :=
\mathrm{E}[T^{I}_{m_1,\ldots,m_n}(t_b)]$. The goal is to calculate the AOCT when the system starts in an
uninformed state, viz., I = (10...0, 01...0, . . . , 00...1). However, to exploit the regen-
eration theory, we need to calculate the AOCT for an arbitrary system-information
state I.
Theorem 1: For $n \in \mathbb{N}$, $m_1, \ldots, m_n \in \mathbb{Z}^{+}$, and $t_b \ge 0$, the AOCT $\mu^{I}_{m_1,\ldots,m_n}(t_b)$
satisfies the following set of $2^{n(n-1)}$ (one for each initial system-information state I)
difference-differential equations shown in (3.6):
\[
\frac{d \mu^{I}_{m_1,\ldots,m_n}(t_b)}{d t_b}
= \sum_{k=1}^{n} \lambda_{dk}\, \mu^{I}_{m_1-\delta_{1,k},\ldots,m_n-\delta_{n,k}}(t_b)
+ \sum_{k=1}^{n} \sum_{j \neq k} \lambda_{kj}\, \mu^{I_{kj}}_{m_1,\ldots,m_n}(t_b)
- \lambda\, \mu^{I}_{m_1,\ldots,m_n}(t_b) + 1, \qquad (3.6)
\]
where $\delta_{j,k}$ is the Kronecker delta, $\lambda = \sum_{k=1}^{n} \bigl( \lambda_{dk} + \sum_{j \neq k} \lambda_{kj} \bigr)$, $I_{kj}$ is identical to I
with the exception that the jth component of ik is 1, and $m_k - 1$ is set to 0 whenever
$m_k = 0$.
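As a hedged sanity check (not the general solution of (3.6)): if no LB action is ever taken, T no longer depends on tb, the derivative vanishes, the information-arrival terms cancel (µ is then independent of I), and (3.6) collapses to a fixed-point recursion over the load vector, with the service rate of an empty queue excluded per Convention C1. For two nodes:

```python
def aoct_no_lb(m1, m2, lam1, lam2, memo=None):
    """AOCT of two independent exponential servers: the no-LB special case
    in which (3.6) reduces to
        mu[m1, m2] = (1 + lam1*mu[m1-1, m2] + lam2*mu[m1, m2-1]) / lam,
    where lam sums only the rates of non-empty queues (Convention C1)."""
    if memo is None:
        memo = {}
    if (m1, m2) == (0, 0):
        return 0.0
    if (m1, m2) not in memo:
        lam = (lam1 if m1 > 0 else 0.0) + (lam2 if m2 > 0 else 0.0)
        acc = 1.0
        if m1 > 0:
            acc += lam1 * aoct_no_lb(m1 - 1, m2, lam1, lam2, memo)
        if m2 > 0:
            acc += lam2 * aoct_no_lb(m1, m2 - 1, lam1, lam2, memo)
        memo[(m1, m2)] = acc / lam
    return memo[(m1, m2)]
```

For unit rates and one task per node this gives 1.5, matching E[max of two independent unit-mean exponentials] = 1 + 1/2.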
Before proving Theorem 1, we first present and prove Lemmas 1–3. As there
is no node failure, according to Convention C1 the regeneration time becomes
$\tau = \min\bigl( \min_k (W_{k1}),\ \min_{j \neq k} (X_{kj}),\ \min_{k,i} (Z_{ki}) \bigr)$. On {τ ≤ tb}, we define $T'^{I}_{m_1,\ldots,m_n}(t_b)$ as
the time taken by the new queueing system emerging at τ to serve all the customers
in the system if LB is performed jointly by all nodes at time tb, provided that the
system-information state at t = τ is specified by I while $m_k \ge 0$ tasks (k = 1, . . . , n)
are in the queue of the kth node at t = τ.
Lemma 1: For s ≤ tb, $\mathrm{E}[T^{I}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}] = s + \mathrm{E}[T^{I}_{m_1-1,\ldots,m_n}(t_b - s)]$.
Proof: The regeneration event {τ = s, τ = W11} implies that the first activity
to occur in the queuing system is precisely the completion of the first task at the
first server. Since in this case LB is not performed until τ, load transfer does not
occur in the queuing system that emerges prior to time t = s. Therefore, according
to Convention C1, $\mathrm{P}\{Z'_{ki} = \infty \mid \tau = s, \tau = W_{11}\} = 1$. Also, observe that upon the
occurrence of {τ = s, τ = W11} the system-information state I remains unchanged,
while the system load distribution at time τ becomes m1 − 1 tasks in the queue of
the first node, and m2, . . . , mn tasks in the queues of the remaining nodes. Therefore,
by construction, $\mathrm{E}[T^{I}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}] = \mathrm{E}[\tau + T'^{I}_{m_1-1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}] = s + \mathrm{E}[T'^{I}_{m_1-1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}]$.
The proof of Lemma 1 is complete once we show that $\mathrm{E}[T'^{I}_{m_1-1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}] = \mathrm{E}[T^{I}_{m_1-1,\ldots,m_n}(t_b - s)]$. Observe that by definition, $W'_{k1} = W_{k1} - \tau$ and
$X'_{jk} = X_{jk} - \tau$ for $k, j \in \{1, \ldots, n\}$, $j \neq k$. Moreover, it is elementary to show (see
Appendix B for a proof in a more general setting) that $\mathrm{P}\{W'_{21} \le t \mid \tau = s, \tau = W_{11}\} = (1 - e^{-\lambda_{d2} t}) u(t)$ and $\mathrm{P}\{X'_{jk} \le t \mid \tau = s, \tau = W_{11}\} = (1 - e^{-\lambda_{jk} t}) u(t)$, and for
all $j \ge 2$ and $k \in \{1, \ldots, n\}$, $W'_{kj}$ have identical distributions. In other words,
conditional upon the occurrence of {τ = s, τ = W11}, all random times of the newly
emerging queuing system satisfy Assumption A1. Similarly, it can also be shown
that conditional upon the occurrence of {τ = s, τ = W11}, all random times of the
emerging queuing system, viz., $W'_{ki}$ and $X'_{jk}$, are mutually independent, thereby
satisfying Assumption A2. The generalized proof of conditional independence of the
random times is provided in Appendix C.
In summary, we have shown that conditional on the occurrence of {τ = s, τ =
W11}, the random times characterizing the queuing system at time s satisfy Assump-
tions A1 and A2. Therefore, by shifting the time origin from t = 0 to t = s, we can
look at the emergent queuing system as the original queuing system with m1 − 1
tasks at the first node and m2, . . . , mn tasks in the queues of the remaining nodes, while
the system-information state I is unchanged. Nonetheless, due to the shift of origin,
the scheduling instant is now at tb − s units of time from the new origin. Therefore,
we conclude that $\mathrm{E}[T'^{I}_{m_1-1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}] = \mathrm{E}[T^{I}_{m_1-1,\ldots,m_n}(t_b - s)]$, which
completes the proof of Lemma 1. □
Lemma 2: For s ≤ tb, $\mathrm{E}[T^{I}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = X_{kj}] = s + \mathrm{E}[T^{I_{kj}}_{m_1,\ldots,m_n}(t_b - s)]$.

Proof: Here, the regeneration event is the arrival of the load-information sent
from the jth node to the kth node. Therefore, upon the occurrence of {τ = s, τ =
Xkj}, in accordance with Section 3.1.1, the jth component of ik becomes 1, while all
other il (l = 1, . . . , n and l ≠ k) as well as the load distribution remain unchanged.
Therefore, $\mathrm{E}[T^{I}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = X_{kj}] = \mathrm{E}[\tau + T'^{I_{kj}}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = X_{kj}] = s + \mathrm{E}[T'^{I_{kj}}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = X_{kj}]$, where $I_{kj}$ is identical to I except that the jth
component of ik is 1. Next, based on a similar analysis to that given in Lemma 1, we
can show that all the random times of the queuing system that emerges upon the
occurrence of {τ = s, τ = Xkj} satisfy Assumptions A1 and A2. Therefore, by
shifting the origin from t = 0 to t = s, we obtain $\mathrm{E}[T'^{I_{kj}}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = X_{kj}] = \mathrm{E}[T^{I_{kj}}_{m_1,\ldots,m_n}(t_b - s)]$. □
Lemma 3: E[T^I_{m1,...,mn}(tb) | τ > tb] = tb + E[T^I_{m1,...,mn}(0)].
Proof: In this case the regeneration event occurs after the LB instant. In other words, nothing has occurred in the system until tb; therefore, the load distribution as well as the system-information state I of the queuing system at tb is exactly the same as that of the original queues. Next, we analyze the random times of the queuing system that emerges at time tb. Let T′′^I_{m1,...,mn}(tb) be the time taken by the new queuing system emerging at tb to serve all tasks if LB is performed jointly by all nodes at time tb, provided that the system-information state at t = tb is specified by I while mk ≥ 0 tasks (k = 1, . . . , n) are in the queue of the kth node at t = tb. Therefore, by construction, E[T^I_{m1,...,mn}(tb) | τ > tb] = E[tb + T′′^I_{m1,...,mn}(tb) | τ > tb]. Let the random times characterizing the queuing system emerging at tb be W′′_{ki}, X′′_{kj} and Z′′_{ki}, all measured from tb. Note that we have maintained the same symbols (with the addition of double primes) to denote the same types of random delays. Since no LB is done prior to tb, there is no load in transfer between nodes. Thus, based on Convention C1, we obtain P{Z′′_{ki} = ∞ | τ > tb} = 1. Also, for the ith customer of the kth server, W′′_{ki} = W_{ki} − tb, and for k, j ∈ {1, . . . , n}, j ≠ k, X′′_{jk} = X_{jk} − tb. Now, it is straightforward to show that the random times W′′_{ki} and X′′_{kj} satisfy Assumptions A1 and A2 (refer to Lemma 7 in Section 3.3 for a detailed proof). Consequently, neither the initial condition nor the statistics of the queues have changed while tb units of time have elapsed. Therefore, in the new queuing system, we can shift the origin to the right by tb units of time, which places the LB instant at t = 0. Thus, E[T′′^I_{m1,...,mn}(tb) | τ > tb] = E[T^I_{m1,...,mn}(0)]. □
Proof of Theorem 1:
Exploiting the properties of conditional expectation, we can write the AOCT as

E[T^I_{m1,...,mn}(tb)] = E[ E[T^I_{m1,...,mn}(tb) | τ] ] = ∫_0^∞ E[T^I_{m1,...,mn}(tb) | τ = s] f_τ(s) ds,

where f_τ(t) is the probability density function (pdf) of τ. Splitting the above integral, we get
E[T^I_{m1,...,mn}(tb)] = ∫_0^{tb} E[T^I_{m1,...,mn}(tb) | τ = s] f_τ(s) ds + ∫_{tb}^∞ E[T^I_{m1,...,mn}(tb) | τ = s] f_τ(s) ds. (3.7)
Now, for s ≤ tb, we can write
E[T^I_{m1,...,mn}(tb) | τ = s] = Σ_{k=1}^{n} Σ_{j≠k} E[T^I_{m1,...,mn}(tb) | τ = s, τ = Xkj] P{τ = Xkj | τ = s}
  + Σ_{k=1}^{n} E[T^I_{m1,...,mn}(tb) | τ = s, τ = Wk1] P{τ = Wk1 | τ = s}. (3.8)
Also, note that

∫_{tb}^∞ E[T^I_{m1,...,mn}(tb) | τ = s] f_τ(s) ds = E[T^I_{m1,...,mn}(tb) | τ > tb] P{τ > tb}. (3.9)
We now apply Lemmas 1 and 2 to (3.8), Lemma 3 to (3.9), and substitute in (3.7)
to obtain
μ^I_{m1,...,mn}(tb) = ∫_0^{tb} [ Σ_{k=1}^{n} (s + μ^I_{m1−δ_{1,k},...,mn−δ_{n,k}}(tb − s)) P{τ = Wk1 | τ = s}
  + Σ_{k=1}^{n} Σ_{j≠k} (s + μ^{Ikj}_{m1,...,mn}(tb − s)) P{τ = Xkj | τ = s} ] f_τ(s) ds
  + (μ^I_{m1,...,mn}(0) + tb) P{τ > tb}. (3.10)
Recall that τ is an exponential random variable with rate (inverse of mean) λ = Σ_{k=1}^{n} (λ_{dk} + Σ_{j≠k} λ_{kj}), while P{τ = Wk1 | τ = s} = λ_{dk}/λ and P{τ = Xkj | τ = s} = λ_{kj}/λ (see Appendix D for details). Therefore, Equation (3.10) becomes
μ^I_{m1,...,mn}(tb) = (μ^I_{m1,...,mn}(0) + tb) e^{−λ tb} + ∫_0^{tb} s λ e^{−λs} ds
  + ∫_0^{tb} [ Σ_{k=1}^{n} λ_{dk} μ^I_{m1−δ_{1,k},...,mn−δ_{n,k}}(tb − s) + Σ_{k=1}^{n} Σ_{j≠k} λ_{kj} μ^{Ikj}_{m1,...,mn}(tb − s) ] e^{−λs} ds. (3.11)
By direct differentiation of (3.11) with respect to tb and rearranging terms, we obtain (3.6). □
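Two distributional facts underpin this derivation: the regeneration time τ, being the minimum of independent exponential clocks, is itself exponential with rate λ equal to the sum of the individual rates, and the identity of the winning clock is independent of the value of τ, with P{τ = Wk1 | τ = s} = λ_{dk}/λ. Both are easy to confirm numerically; the sketch below uses arbitrary illustrative rates, not the parameters of any particular system in this chapter.

```python
import random

random.seed(1)

rates = [0.9, 0.5, 0.3, 0.2]   # illustrative clock rates
lam = sum(rates)
n_trials = 200_000

wins = [0] * len(rates)        # which clock achieved the minimum
mins = []
for _ in range(n_trials):
    draws = [random.expovariate(r) for r in rates]
    k = min(range(len(rates)), key=lambda i: draws[i])
    wins[k] += 1
    mins.append(draws[k])

# (1) the minimum of independent exponentials is Exp(lam): compare means
print(sum(mins) / n_trials, 1 / lam)

# (2) P{argmin = k} = rates[k] / lam, regardless of the value of the minimum
for k, r in enumerate(rates):
    print(wins[k] / n_trials, r / lam)
```

The same experiment, restricted to trials where the minimum falls below any fixed threshold, yields the same winning frequencies, which is the independence property used in (3.8).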
It should be noted that in order to solve the renewal equations of Theorem 1, we first need to calculate the corresponding initial conditions, namely, μ^I_{m1,...,mn}(0). For simplicity, we will explicitly calculate the initial condition for a two-node system. Nonetheless, our approach demonstrates the fundamental technique for calculating the initial condition for a multi-node system.
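Independently of the renewal equations, the AOCT of the underlying model can also be estimated by direct Monte Carlo simulation: exponential service times, a single one-shot transfer at the balancing instant tb, and an exponentially distributed delay for the migrated batch. The sketch below simulates a simplified two-node instance; the rates, the batch size L, and the rule of sending a fixed batch are illustrative placeholders rather than the actual policy of (3.1) and (3.5).

```python
import random

random.seed(2)

lam_d1, lam_d2 = 1.0, 0.5   # service rates (illustrative)
lam_t = 2.0                 # transfer-delay rate for the migrated batch
m1, m2 = 12, 2              # initial queue lengths
tb, L = 1.0, 4              # balancing instant and batch size (placeholder rule)

def completion_time():
    # node 1 serves its tasks one at a time; record each completion epoch
    t1, times = 0.0, []
    for _ in range(m1):
        t1 += random.expovariate(lam_d1)
        times.append(t1)
    done_by_tb = sum(1 for x in times if x <= tb)
    sent = min(L, m1 - done_by_tb)          # cannot send tasks already served
    finish1 = times[m1 - sent - 1] if m1 - sent > 0 else 0.0
    # node 2 serves its own tasks, then the batch once it arrives
    finish2 = sum(random.expovariate(lam_d2) for _ in range(m2))
    if sent > 0:
        arrival = tb + random.expovariate(lam_t)
        finish2 = max(finish2, arrival) + sum(
            random.expovariate(lam_d2) for _ in range(sent))
    return max(finish1, finish2)

aoct = sum(completion_time() for _ in range(20000)) / 20000
print(aoct)
```

Such a simulator provides a useful cross-check for any numerical solution of the renewal equations.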
Initial condition: LB at tb = 0
In the case of a two-node system, equation (3.6) yields four equations involving μ^{(1k1,k21)}_{m1,m2}(tb) for ki ∈ {0, 1}. In [22], a brute-force method (based on conditional probabilities) was used to calculate μ^{(1k1,k21)}_{m1,m2}(0). Here, we solve this more efficiently using the concept of regeneration. Without loss of generality, suppose m1 > m2. Using (3.1) and (3.5), and with p21 = 1, we obtain

L21(0) = ⌊K(λ_{d2} m1 − λ_{d1} m2)/(λ_{d1} + λ_{d2})⌋ if (k1, k2) ∈ {(1, 0), (1, 1)},
L21(0) = ⌊K λ_{d2} m1/(λ_{d1} + λ_{d2})⌋ otherwise. (3.12)
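As a concrete illustration of (3.12), with the gain K and the rates chosen arbitrarily:

```python
from math import floor

lam_d1, lam_d2 = 2.0, 1.0   # service rates (illustrative)
m1, m2 = 30, 6              # initial loads, m1 > m2
K = 0.8                     # LB gain

def L21(k1, k2):
    # Equation (3.12): the first case applies when (k1, k2) is (1,0) or (1,1)
    if (k1, k2) in {(1, 0), (1, 1)}:
        return floor(K * (lam_d2 * m1 - lam_d1 * m2) / (lam_d1 + lam_d2))
    return floor(K * lam_d2 * m1 / (lam_d1 + lam_d2))

print(L21(1, 1))  # floor(0.8 * (30 - 12) / 3) = 4
print(L21(0, 0))  # floor(0.8 * 30 / 3) = 8
```

Note that the first case, which uses both loads, transfers fewer tasks here because it targets only the weighted excess of node 1 over node 2.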
Similarly, we can calculate L12(0). For ease of notation, let L21 ≜ L21(0) and L12 ≜ L12(0). The delay in transferring the load Lkj is termed the load-transfer delay from the jth to the kth node. Recall that, in Assumption A1, the transfer delay of the load in transit to the kth node is also assumed to follow an exponential pdf with rate λ̃_{k1}. Suppose T1(r1; L12) is the waiting time at node 1 before all the tasks (including those sent from node 2) are served, where r1 is the number of tasks at node 1 just after LB is performed at time t = 0, i.e., r1 = m1 − L21, and L12 is the number of tasks in transit. Let the cumulative distribution function (cdf) of T1(r1; L12) be F_{T1}(r1; L12; t) = P{T1(r1; L12) ≤ t}.
With LB at time t = 0, the only possible events at the first node are either the arrival of the L12 tasks sent by the second node or the completion of service of a task by the first node (if r1 > 0). If the regeneration event occurring at time s ∈ [0, t] is the arrival of the L12 tasks, a new queue is born at node 1 with r1 + L12 tasks, where the service time of each task still follows an exponential distribution, and there is no task in transit to the first node. On the other hand, if the regeneration event is the service of a task at the first node, a new queue is born at node 1 with r1 − 1 tasks, where the service time of each task still follows an exponential distribution, and there are L12 tasks (with exponentially distributed transfer time) in transit to the first node. Therefore,
P{T1(r1; L12) ≤ t} = ∫_0^t f_τ(s) [ P{T1(r1 − 1; L12) ≤ t − s} λ_{d1}/λ′ + P{T1(r1 + L12; 0) ≤ t − s} λ̃_{11}/λ′ ] ds, (3.13)
where λ′ = λ_{d1} + λ̃_{11}. Similarly, we can solve for P{T2(r2; L21) ≤ t}. Differentiating (3.13) with respect to t, we get

dF_{T1}(r1; L12; t)/dt = −λ′ F_{T1}(r1; L12; t) + λ_{d1} F_{T1}(r1 − 1; L12; t) + λ̃_{11} F_{T1}(r1 + L12; 0; t).

The quantities F_{T1}(0; L12; t) and F_{T1}(r1 + L12; 0; t) can further be decomposed into simpler recursive equations by invoking regeneration theory again. We calculate F_{T2}(r2; L21; t) using a similar approach. For simplicity of notation, let F_{Tk}(t) ≜ F_{Tk}(rk; Lkj; t). Now, the overall completion time is TC = max(T1, T2); recall that its average E[TC] is μ^{(1k1,k21)}_{m1,m2}(0). By exploiting the independence of T1 and T2, the explicit solution is given as

μ^{(1k1,k21)}_{m1,m2}(0) = E[max(T1, T2)] = ∫_0^∞ t [f_{T1}(t) F_{T2}(t) + F_{T1}(t) f_{T2}(t)] dt, (3.14)

where f_{T1}(t) and f_{T2}(t) are the pdfs of T1 and T2, respectively.
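The recursion behind (3.13) can be integrated numerically. The sketch below (with illustrative rates) advances the family of cdfs F_{T}(r; ·; t) on a time grid by forward Euler, tracking separately the regimes before and after the in-transit batch arrives, and then evaluates E[max(T1, T2)] through the equivalent identity E[max(T1, T2)] = ∫_0^∞ (1 − F_{T1}(t) F_{T2}(t)) dt, which follows from (3.14) by integration by parts.

```python
def node_cdf_grid(r0, L, mu, lt, dt, n_steps):
    """cdf grid of the time for one node to clear r0 queued tasks plus an
    in-transit batch of L tasks (exp(lt) transfer delay, exp(mu) services).
    L == 0 means nothing is in transit."""
    rmax = r0 + L
    F0 = [1.0] + [0.0] * rmax     # F0[r]: r tasks cleared by t, no batch pending
    F1 = [0.0] * (r0 + 1)         # F1[r]: r tasks to clear, batch still pending
    out = []
    for _ in range(n_steps):
        out.append(F0[r0] if L == 0 else F1[r0])
        newF0 = F0[:]
        for r in range(1, rmax + 1):       # pure Erlang part
            newF0[r] += dt * mu * (F0[r - 1] - F0[r])
        newF1 = F1[:]
        for r in range(0, r0 + 1):
            # d/dt F1[r] = -(mu+lt) F1[r] + mu F1[r-1] + lt F0[r+L]
            serve = mu * (F1[r - 1] - F1[r]) if r >= 1 else 0.0
            newF1[r] += dt * (serve + lt * (F0[r + L] - F1[r]))
        F0, F1 = newF0, newF1
    return out

dt, T = 0.002, 40.0
n = int(T / dt)
# illustrative instance: r1 = 3 tasks at node 1 plus a batch of L12 = 2 in transit
F1_grid = node_cdf_grid(3, 2, 1.0, 2.0, dt, n)
F2_grid = node_cdf_grid(4, 0, 0.8, 1.0, dt, n)
e_max = sum((1.0 - a * b) * dt for a, b in zip(F1_grid, F2_grid))
print(e_max)
```

For a sanity check, with one Exp(1) task at each node and nothing in transit, the computed E[max(T1, T2)] should be close to the known value 3/2.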
3.3 System with permanent node-failures
In this section, we consider a distributed system in which any node can fail permanently with some probability [17]. One-shot LB is performed jointly by all functional servers at time tb in order to fairly distribute the loads in the system. In addition, each node is equipped with a backup system that performs certain duties, as mentioned in Section 3.1, only in the event of a permanent node failure. The concept of regeneration is exploited to calculate the probability of successfully serving all the tasks in a finite amount of time for a given initial load distribution. To the best of our knowledge, this work is the first to jointly address reliability and scheduling in distributed systems with random delays.
Let T^{I,F}_{m1,...,mn}(tb; C) denote the time taken by the system to serve all the tasks in the system if LB is performed by all functioning nodes at time tb, and the initial system condition at t = 0 is as specified by I, F, C while mk tasks (k = 1, . . . , n) are in the queue of the kth node. Our objective is to calculate the probability of success in serving all tasks, defined by P{T^{I,F}_{m1,...,mn}(tb; C) < ∞}, for I = (10...0, 01...0, . . . , 00...1), F = (11...1, 11...1, . . . , 11...1), and C = ([0], [0], . . . , [0]). (That is, we assume a null information state at t = 0, that all nodes are functional, and that no tasks are in transit.) However, it turns out that it is necessary to calculate the probability of success corresponding to arbitrary initial system conditions.
3.3.1 Renewal equations
For brevity, we will consider a two-node system (n = 2); however, our approach can be extended in a straightforward way to a multi-node model. Let R^{I,F}_{m1,m2}(tb; C) ≜ P{T^{I,F}_{m1,m2}(tb; C) < ∞}. Trivially, R^{I,F}_{0,0}(tb; ([0], [0])) = 1 for all tb ≥ 0 and for any I and F, since there is no task to be serviced in this case. Also, R^{I,(0f12,f210)}_{m1,m2}(tb; ([0], [0])) = 0 if either m1 > 0 or m2 > 0, for any I and for f12, f21 ∈ {0, 1}, since in this case both nodes have already failed while there is at least one unserved task in the system. Our main results are the renewal equations characterizing the probability R^{I,F}_{m1,m2}(tb; C), which are given below in the form of difference-differential equations.
Theorem 2: For n = 2, m1, m2 ∈ Z+, i1, i2 ∈ {0, 1} and tb ≥ 0, the probability R^{I,F}_{m1,m2}(tb; C) satisfies equations (3.15)–(3.23) shown below:

(d/dtb) R^{(1i1,i21),(11,11)}_{m1,m2}(tb; ([0],[0])) = −λ R^{(1i1,i21),(11,11)}_{m1,m2}(tb; ([0],[0]))
  + λ_{d1} R^{(1i1,i21),(11,11)}_{m1−1,m2}(tb; ([0],[0])) + λ_{d2} R^{(1i1,i21),(11,11)}_{m1,m2−1}(tb; ([0],[0]))
  + λ_{21} R^{(1i1,11),(11,11)}_{m1,m2}(tb; ([0],[0])) + λ_{12} R^{(11,i21),(11,11)}_{m1,m2}(tb; ([0],[0]))
  + λ_{f1} R^{(1i1,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1])) + λ_{f2} R^{(1i1,i21),(11,10)}_{m1,0}(tb; ([1 m2],[0])), (3.15)

(d/dtb) R^{(1i1,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1])) = −λ′ R^{(1i1,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1]))
  + λ_{d2} R^{(1i1,i21),(01,11)}_{0,m2−1}(tb; ([0],[1 m1])) + λ_{21} R^{(1i1,11),(01,11)}_{0,m2}(tb; ([0],[1 m1]))
  + λ_{12} R^{(11,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1])) + λ^F_{21} R^{(1i1,i21),(01,01)}_{0,m2}(tb; ([0],[1 m1]))
  + λ̃_{21} R^{(1i1,i21),(01,01)}_{0,m1+m2}(tb; ([0],[0])), (3.16)

(d/dtb) R^{(1i1,i21),(11,10)}_{m1,0}(tb; ([1 m2],[0])) = −λ′′ R^{(1i1,i21),(11,10)}_{m1,0}(tb; ([1 m2],[0]))
  + λ_{d1} R^{(1i1,i21),(11,10)}_{m1−1,0}(tb; ([1 m2],[0])) + λ_{21} R^{(1i1,11),(11,10)}_{m1,0}(tb; ([1 m2],[0]))
  + λ_{12} R^{(11,i21),(11,10)}_{m1,0}(tb; ([1 m2],[0])) + λ^F_{12} R^{(1i1,i21),(10,10)}_{m1,0}(tb; ([1 m2],[0]))
  + λ̃_{12} R^{(1i1,i21),(10,10)}_{m1+m2,0}(tb; ([0],[0])), (3.17)

R^{(10,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1])) = R^{(11,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1])), (3.18)

R^{(1i1,01),(11,10)}_{m1,0}(tb; ([1 m2],[0])) = R^{(1i1,11),(11,10)}_{m1,0}(tb; ([1 m2],[0])), (3.19)

R^{(1i1,i21),(01,01)}_{0,m2}(tb; ([0],[1 m1])) = [λ_{d2}/(λ_{d2} + λ_{f2} + λ̃_{21})] R^{(1i1,i21),(01,01)}_{0,m2−1}(tb; ([0],[1 m1]))
  + [λ̃_{21}/(λ_{d2} + λ_{f2} + λ̃_{21})] R^{(1i1,i21),(01,01)}_{0,m1+m2}(tb; ([0],[0])), (3.20)

R^{(1i1,i21),(10,10)}_{m1,0}(tb; ([1 m2],[0])) = [λ_{d1}/(λ_{d1} + λ_{f1} + λ̃_{12})] R^{(1i1,i21),(10,10)}_{m1−1,0}(tb; ([1 m2],[0]))
  + [λ̃_{12}/(λ_{d1} + λ_{f1} + λ̃_{12})] R^{(1i1,i21),(10,10)}_{m1+m2,0}(tb; ([0],[0])), (3.21)

R^{(1i1,i21),(01,01)}_{0,m1+m2}(tb; ([0],[0])) = (λ_{d2}/(λ_{d2} + λ_{f2}))^{m1+m2}, (3.22)

and

R^{(1i1,i21),(10,10)}_{m1+m2,0}(tb; ([0],[0])) = (λ_{d1}/(λ_{d1} + λ_{f1}))^{m1+m2}, (3.23)

where λ = Σ_{k=1}^{2} (λ_{dk} + λ_{fk} + Σ_{j≠k} λ_{kj}), λ′ = λ_{d2} + λ_{21} + λ_{12} + λ^F_{21} + λ̃_{21} + λ_{f2}, λ′′ = λ_{d1} + λ_{21} + λ_{12} + λ^F_{12} + λ̃_{12} + λ_{f1}, and mk − 1 is set to 0 when mk = 0.
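Equations (3.22) and (3.23) have a simple interpretation: once one node is down and nothing remains in transit, the system succeeds if and only if the surviving node completes m1 + m2 consecutive service-versus-failure races, each won with probability λd/(λd + λf). This geometric form is easy to verify by Monte Carlo; the rates below are illustrative.

```python
import random

random.seed(3)

lam_d, lam_f = 1.0, 0.15   # service and failure rates (illustrative)
m = 7                      # tasks left at the surviving node

def survives():
    # by memorylessness, each service is a fresh race against the failure clock
    for _ in range(m):
        if random.expovariate(lam_f) < random.expovariate(lam_d):
            return False
    return True

trials = 100_000
est = sum(survives() for _ in range(trials)) / trials
exact = (lam_d / (lam_d + lam_f)) ** m
print(est, exact)
```

The estimate converges to the product form because the failure clock, conditioned on not having fired, restarts afresh at every service completion.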
Before proving Theorem 2, we first present and prove Lemmas 4–7. For this purpose, on {τ ≤ tb}, we define T′^{I,F}_{m1,...,mn}(tb; C) as the time taken by the new queuing system emerging at τ to serve all the tasks in the system if LB is performed by all functioning nodes at time tb, provided that the system condition at t = τ is specified by I, F, C while mk ≥ 0 tasks (k = 1, . . . , n) are in the queue of the kth node.
Lemma 4: For s ≤ tb, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = W11} = P{T^{(10,01),(11,11)}_{m1−1,m2}(tb − s; ([0],[0])) < ∞}.
Proof: We begin by noting that the regeneration event {τ = s, τ = W11} is precisely the service of the first task at the first node before any other activity takes place in the queuing system. Thus, no failure notice has been sent and no load redistribution (required only upon failure) has been made in the queuing system that emerges prior to time t = s. Therefore, according to Convention C1, we obtain P{X^{F′}_{kj} = ∞ | τ = s, τ = W11} = 1 and P{Z′_{ki} = ∞ | τ = s, τ = W11} = 1. Next, we observe that the system condition of the queues that emerge upon the occurrence of {τ = s, τ = W11} becomes I = (10, 01), F = (11, 11), C = ([0], [0]), with m1 − 1 tasks in the queue of the first node and m2 tasks in the queue of the second node. Therefore, by construction, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = W11} = P{τ + T′^{(10,01),(11,11)}_{m1−1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = W11}. The proof is complete once we establish that P{T′^{(10,01),(11,11)}_{m1−1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = W11} = P{T^{(10,01),(11,11)}_{m1−1,m2}(tb − s; ([0],[0])) < ∞}.
Next, we observe that, by definition, W′_{k1} = W_{k1} − τ, Y′_k = Y_k − τ and X′_{jk} = X_{jk} − τ for k, j ∈ {1, 2}, j ≠ k. Moreover, it is elementary to show (see Appendix B for the proof) that the conditional distribution of W′_{21} is

P{W′_{21} ≤ t | τ = s, τ = W11} = (1 − e^{−λ_{d2} t}) u(t), (3.24)

where u(·) is the unit step function. Similarly, it can be shown that P{Y′_k ≤ t | τ = s, τ = W11} = (1 − e^{−λ_{fk} t}) u(t), P{X′_{jk} ≤ t | τ = s, τ = W11} = (1 − e^{−λ_{jk} t}) u(t), and, for all j ≥ 2 and k ∈ {1, 2}, W′_{kj} and W_{kj} have identical distributions. Therefore, conditional upon the occurrence of {τ = s, τ = W11}, all random times of the newly emerging queuing system satisfy Assumption A1.
The conditional independence of W′_{21} and Y′_1 is proved in Appendix C. Similarly, it can also be shown that, conditional upon the occurrence of {τ = s, τ = W11}, W′_{kj}, Y′_k, and X′_{jk} are mutually independent. Therefore, upon the occurrence of {τ = s, τ = W11}, all random times of the emerging queuing system also satisfy Assumption A2.
In summary, we have shown that, conditional on the occurrence of {τ = s, τ = W11}, the random times characterizing the queuing system at time s satisfy Assumptions A1 and A2. Therefore, by shifting the time origin from t = 0 to t = s, we can think of the emergent queuing system as the original queuing system but with m1 − 1 tasks in the queue of the first server, while the other initial system conditions remain the same. In addition, due to the shift of origin, the LB instant is now at tb − s units of time from the new origin. Therefore, we conclude that P{T′^{(10,01),(11,11)}_{m1−1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = W11} = P{T^{(10,01),(11,11)}_{m1−1,m2}(tb − s; ([0],[0])) < ∞}, which completes the proof of Lemma 4. □
Lemma 5: For s ≤ tb, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = X12} = P{T^{(11,01),(11,11)}_{m1,m2}(tb − s; ([0],[0])) < ∞}.

Proof: Note that, upon the occurrence of {τ = s, τ = X12}, the system-information state of the emerging queues becomes I = (11, 01), while the other system conditions remain the same as those of the original queues. Therefore, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = X12} = P{τ + T′^{(11,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = X12}. Using an analysis similar to that of Lemma 4, we can show that, upon the occurrence of {τ = s, τ = X12}, the random times characterizing the emerging queuing system satisfy Assumptions A1 and A2. Therefore, P{T′^{(11,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = X12} = P{T^{(11,01),(11,11)}_{m1,m2}(tb − s; ([0],[0])) < ∞}. □
Lemma 6: For s ≤ tb, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = Y1} = P{T^{(10,01),(01,11)}_{0,m2}(tb − s; ([0],[1 m1])) < ∞}.

Proof: In this case, the regeneration event is the failure of the first node. Thus, according to the one-shot LB policy (refer to Section 3.1), the occurrence of {τ = s, τ = Y1} triggers the backup system of the first node to send a failure notice, as well as the m1 tasks from its queue, to the second node. Since the failure notice is in transit at time s, the system-function state of the emergent queuing system becomes F = (01, 11), while the network state at time s becomes C = ([0], [1 m1]) due to the group of m1 tasks in transit to the second node. Further, at t = s, the system-information state is I = (10, 01), while there are m2 tasks at the second node and no tasks at the first node. Therefore,

P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = Y1}
= P{τ + T′^{(10,01),(01,11)}_{0,m2}(tb; ([0],[1 m1])) < ∞ | τ = s, τ = Y1}. (3.25)

Next, we analyze the random times characterizing the queues that emerge upon the occurrence of {τ = s, τ = Y1}. In light of Assumption A1, X^{F′}_{21} and Z′_{21} follow exponential distributions with rates λ^F_{21} and λ̃_{21}, respectively. On the other hand, according to C1, P{W′_{1i} = ∞ | τ = s, τ = Y1} = 1 for the ith task of the first node. As no failure notice has been sent from the second node to the first node and no load redistribution has been made by the second node prior to time t = s, we can use Convention C1 to write P{X^{F′}_{12} = ∞ | τ = s, τ = Y1} = 1 and P{Z′_{1i} = ∞ | τ = s, τ = Y1} = 1 for the ith group of load transferred to the first node. Similarly to the proof of Lemma 4, the conditional distributions of W′_{2j}, Y′_2 and X′_{kj} for j ≠ k, conditional on the occurrence of {τ = s, τ = Y1}, can each be shown to satisfy Assumptions A1 and A2, thereby justifying the notion of regeneration of the queues at time τ. Therefore, when τ = s, we can shift the time origin from t = 0 to t = s and obtain P{T′^{(10,01),(01,11)}_{0,m2}(tb; ([0],[1 m1])) < ∞ | τ = s, τ = Y1} = P{T^{(10,01),(01,11)}_{0,m2}(tb − s; ([0],[1 m1])) < ∞}, which in conjunction with (3.25) completes the proof. □
Lemma 7: P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ > tb} = P{T^{(10,01),(11,11)}_{m1,m2}(0; ([0],[0])) < ∞}.

Proof: The occurrence of the event {τ > tb} implies that the system condition of the queues at time tb is exactly the same as the initial system condition of the original queues. On {τ > tb}, let T′′^{I,F}_{m1,...,mn}(tb; C) be the time taken by the new queuing system emerging at tb to serve all tasks if LB is performed by all functioning nodes at time tb, provided that the system condition at t = tb is specified by I, F, C while mk ≥ 0 tasks are in the queue of the kth node. Therefore, by definition, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ > tb} = P{tb + T′′^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ > tb}. Let the random times characterizing the queuing system emerging at tb be W′′_{ki}, Y′′_k, X′′_{kj}, X^{F′′}_{kj} and Z′′_{ki}, all measured from tb. Clearly, no failure notice has been sent and no customer redistribution has been made in the queuing system that emerges prior to time t = tb. Therefore, based on Convention C1, P{X^{F′′}_{kj} = ∞ | τ > tb} = 1 and P{Z′′_{ki} = ∞ | τ > tb} = 1. For the ith task of the kth node, W′′_{ki} = W_{ki} − tb, and for k, j ∈ {1, 2}, j ≠ k, Y′′_k = Y_k − tb and X′′_{jk} = X_{jk} − tb. Based on Assumptions A1 and A2, it is straightforward to show that

P{W′′_{ki} ≤ t | τ > tb} = (1 − e^{−λ_{dk} t}) u(t),

P{W′′_{ki} ≤ t1, Y′′_k ≤ t2 | τ > tb} = P{W′′_{ki} ≤ t1 | τ > tb} P{Y′′_k ≤ t2 | τ > tb}.

Similarly, conditional on the occurrence of {τ > tb}, the distributions of W′′_{ki}, Y′′_k, X′′_{kj}, X^{F′′}_{kj} and Z′′_{ki} can be shown to satisfy A1 and A2. Consequently, neither the initial condition nor the statistics of the queues have changed while tb units of time have elapsed. Therefore, we can shift the origin by tb units of time, which places the LB instant at t = 0 for the new queuing system. Thus, P{T′′^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ > tb} = P{T^{(10,01),(11,11)}_{m1,m2}(0; ([0],[0])) < ∞}. □
Proof of Theorem 2:
We begin by proving Equation (3.15) for the case I = (10, 01). Observe that

P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞} = ∫_0^{tb} P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s} f_τ(s) ds
  + ∫_{tb}^∞ P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s} f_τ(s) ds. (3.26)
We can write the first integrand on the right-hand side of (3.26) as

P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s}
= Σ_{k=1}^{2} P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = Wk1} P{τ = Wk1 | τ = s}
+ Σ_{k=1}^{2} Σ_{j≠k} P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = Xkj} P{τ = Xkj | τ = s}
+ Σ_{k=1}^{2} P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = Yk} P{τ = Yk | τ = s}. (3.27)
Also, note that

∫_{tb}^∞ P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s} f_τ(s) ds
= P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ > tb} P{τ > tb}. (3.28)
We now apply Lemmas 4–6 to (3.27), Lemma 7 to (3.28), and substitute in (3.26) to obtain

R^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) = ∫_0^{tb} [ R^{(10,01),(11,11)}_{m1−1,m2}(tb − s; ([0],[0])) P{τ = W11 | τ = s}
  + R^{(10,01),(11,11)}_{m1,m2−1}(tb − s; ([0],[0])) P{τ = W21 | τ = s}
  + R^{(10,11),(11,11)}_{m1,m2}(tb − s; ([0],[0])) P{τ = X21 | τ = s}
  + R^{(11,01),(11,11)}_{m1,m2}(tb − s; ([0],[0])) P{τ = X12 | τ = s}
  + R^{(10,01),(01,11)}_{0,m2}(tb − s; ([0],[1 m1])) P{τ = Y1 | τ = s}
  + R^{(10,01),(11,10)}_{m1,0}(tb − s; ([1 m2],[0])) P{τ = Y2 | τ = s} ] f_τ(s) ds
  + R^{(10,01),(11,11)}_{m1,m2}(0; ([0],[0])) P{τ > tb}. (3.29)
Next, observe that, in conjunction with A1, A2, and C1, it is straightforward to show that f_τ(t) = λ e^{−λt} u(t), where λ = Σ_{k=1}^{2} (λ_{dk} + λ_{fk} + Σ_{j≠k} λ_{kj}). Moreover, we can show (see Appendix D for the proof) that P{τ = Wk1 | τ = s} = λ_{dk}/λ, P{τ = Xkj | τ = s} = λ_{kj}/λ, and P{τ = Yk | τ = s} = λ_{fk}/λ. Therefore, Equation (3.29) becomes
R^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) = e^{−λtb} R^{(10,01),(11,11)}_{m1,m2}(0; ([0],[0]))
  + ∫_0^{tb} [ λ_{d1} R^{(10,01),(11,11)}_{m1−1,m2}(tb − s; ([0],[0])) + λ_{d2} R^{(10,01),(11,11)}_{m1,m2−1}(tb − s; ([0],[0]))
  + λ_{21} R^{(10,11),(11,11)}_{m1,m2}(tb − s; ([0],[0])) + λ_{12} R^{(11,01),(11,11)}_{m1,m2}(tb − s; ([0],[0]))
  + λ_{f1} R^{(10,01),(01,11)}_{0,m2}(tb − s; ([0],[1 m1])) + λ_{f2} R^{(10,01),(11,10)}_{m1,0}(tb − s; ([1 m2],[0])) ] e^{−λs} ds. (3.30)
Finally, by differentiating (3.30) with respect to tb and rearranging terms, we obtain (3.15).

Proofs of Equations (3.15)–(3.17) for any given I can be obtained in a similar fashion. Next, recall that, according to the one-shot LB policy, no LB action is taken by a faulty node, say the kth node, whereby the information-state vector ik becomes redundant and does not affect the success probability. This observation leads to the proofs of (3.18) and (3.19). Also, note that in the case of a two-node model, whenever a failure is detected, viz., F = (10, 10) or F = (01, 01), no LB action is taken by the only remaining functional node. Consequently, tb and I play no role in determining the success probability in these cases (but we do not drop tb and I from the notation, simply to maintain consistency). Therefore, we obtain Equations (3.20)–(3.23) by exploiting the following conditional probabilities, which can be proved similarly to Lemmas 4–7:

P{T^{I,(01,01)}_{0,m2}(tb; ([0],[1 m1])) < ∞ | τ = W21} = P{T^{I,(01,01)}_{0,m2−1}(tb; ([0],[1 m1])) < ∞},

P{T^{I,(01,01)}_{0,m2}(tb; ([0],[1 m1])) < ∞ | τ = Z21} = P{T^{I,(01,01)}_{0,m1+m2}(tb; ([0],[0])) < ∞}, and

P{T^{I,(01,01)}_{0,m2}(tb; ([0],[1 m1])) < ∞ | τ = Y2} = 0.

This completes the proof of Theorem 2. □
3.3.2 Calculation of the initial condition
In order to solve the renewal equations (3.15)–(3.17) of Theorem 2, we first need to calculate their initial conditions, corresponding to tb = 0. In this case, LB is performed at time 0 by each functional node, say the kth node, based on its information state ik, so that Ljk(0) ≥ 0 tasks are transferred to the jth functional node. Let T̃^F_{r1,r2}(C) be the time to serve all the tasks in the system given that the system-function state and the network state at t = 0 are F and C, respectively, while rk tasks are in the queue of the kth node at t = 0. For simplicity, let Ljk ≜ Ljk(0). Then, by construction, R^{I,(11,11)}_{m1,m2}(0; ([0],[0])) = P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞}, R^{I,(01,11)}_{0,m2}(0; ([0],[1 m1])) = P{T̃^{(01,11)}_{0,r2}(([1 L12],[1 m1])) < ∞}, and R^{I,(11,10)}_{m1,0}(0; ([1 m2],[0])) = P{T̃^{(11,10)}_{r1,0}(([1 m2],[1 L21])) < ∞}, where rk = mk − Ljk.
Theorem 3: For n = 2, r1, r2 ∈ Z+, gk ≥ 0 and cki > 0, the probability P{T̃^F_{r1,r2}(C) < ∞} satisfies the relations shown in (3.31)–(3.35):

P{T̃^{(11,11)}_{r1,r2}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞}
= P{T̃^{(11,11)}_{r1−1,r2}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ_{d1}/λ
+ P{T̃^{(11,11)}_{r1,r2−1}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ_{d2}/λ
+ P{T̃^{(01,11)}_{0,r2}(([g1 c11 . . . c1g1], [g2+1 c21 . . . c2g2 r1])) < ∞} λ_{f1}/λ
+ P{T̃^{(11,10)}_{r1,0}(([g1+1 c11 . . . c1g1 r2], [g2 c21 . . . c2g2])) < ∞} λ_{f2}/λ
+ Σ_{i=1}^{g1} P{T̃^{(11,11)}_{r1+c1i,r2}(([g1−1 c11 . . . c1(i−1) c1(i+1) . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ̃_{1i}/λ
+ Σ_{j=1}^{g2} P{T̃^{(11,11)}_{r1,r2+c2j}(([g1 c11 . . . c1g1], [g2−1 c21 . . . c2(j−1) c2(j+1) . . . c2g2])) < ∞} λ̃_{2j}/λ, (3.31)

P{T̃^{(01,11)}_{0,r2}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞}
= P{T̃^{(01,11)}_{0,r2−1}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ_{d2}/λ′
+ Σ_{i=1}^{g1} P{T̃^{(01,11)}_{0,r2}(([g1−1 c11 . . . c1(i−1) c1(i+1) . . . c1g1], [g2+1 c21 . . . c2g2 c1i])) < ∞} λ̃_{1i}/λ′
+ Σ_{j=1}^{g2} P{T̃^{(01,11)}_{0,r2+c2j}(([g1 c11 . . . c1g1], [g2−1 c21 . . . c2(j−1) c2(j+1) . . . c2g2])) < ∞} λ̃_{2j}/λ′, (3.32)

P{T̃^{(11,10)}_{r1,0}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞}
= P{T̃^{(11,10)}_{r1−1,0}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ_{d1}/λ′′
+ Σ_{i=1}^{g1} P{T̃^{(11,10)}_{r1+c1i,0}(([g1−1 c11 . . . c1(i−1) c1(i+1) . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ̃_{1i}/λ′′
+ Σ_{j=1}^{g2} P{T̃^{(11,10)}_{r1,0}(([g1+1 c11 . . . c1g1 c2j], [g2−1 c21 . . . c2(j−1) c2(j+1) . . . c2g2])) < ∞} λ̃_{2j}/λ′′, (3.33)

P{T̃^{(01,11)}_{0,r2}(([0],[0])) < ∞} = (λ_{d2}/(λ_{d2} + λ_{f2}))^{r2}, (3.34)

and

P{T̃^{(11,10)}_{r1,0}(([0],[0])) < ∞} = (λ_{d1}/(λ_{d1} + λ_{f1}))^{r1}, (3.35)

where λ̃^{−1}_{ki} = θ cki, λ = Σ_{k=1}^{2} (λ_{dk} + λ_{fk}) + Σ_{i=1}^{g1} λ̃_{1i} + Σ_{j=1}^{g2} λ̃_{2j}, λ′ = λ_{d2} + λ_{f2} + Σ_{i=1}^{g1} λ̃_{1i} + Σ_{j=1}^{g2} λ̃_{2j}, λ′′ = λ_{d1} + λ_{f1} + Σ_{i=1}^{g1} λ̃_{1i} + Σ_{j=1}^{g2} λ̃_{2j}, and rk − 1 is set to 0 when rk = 0.
Proof of Theorem 3:
Consider Equation (3.31) with g1 = 1, c11 = L12, g2 = 1 and c21 = L21. We define the regeneration random variable as ξ = min(W11, W21, Y1, Y2, Z11, Z21). Note that all the random delays characterizing the queuing system at time 0 are assumed to satisfy A1, A2 and C1. It is straightforward to show that ξ is an exponential r.v. with rate λ = Σ_{k=1}^{2} (λ_{dk} + λ_{fk} + λ̃_{k1}). Observe that

P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞}
= ∫_0^∞ [ Σ_{k=1}^{2} P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Wk1} P{ξ = Wk1 | ξ = s}
+ Σ_{k=1}^{2} P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Yk} P{ξ = Yk | ξ = s}
+ Σ_{k=1}^{2} P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Zk1} P{ξ = Zk1 | ξ = s} ] f_ξ(s) ds. (3.36)
We can show (similarly to the proofs of Lemmas 4–7) that

P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = W11} = P{T̃^{(11,11)}_{r1−1,r2}(([1 L12],[1 L21])) < ∞},
P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = W21} = P{T̃^{(11,11)}_{r1,r2−1}(([1 L12],[1 L21])) < ∞},
P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Y1} = P{T̃^{(01,11)}_{0,r2}(([1 L12],[2 L21 r1])) < ∞},
P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Y2} = P{T̃^{(11,10)}_{r1,0}(([2 L12 r2],[1 L21])) < ∞},
P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Z11} = P{T̃^{(11,11)}_{r1+L12,r2}(([0],[1 L21])) < ∞},
P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Z21} = P{T̃^{(11,11)}_{r1,r2+L21}(([1 L12],[0])) < ∞}.

Applying the above identities in (3.36), and using P{ξ = Wk1 | ξ = s} = λ_{dk}/λ, P{ξ = Yk | ξ = s} = λ_{fk}/λ and P{ξ = Zk1 | ξ = s} = λ̃_{k1}/λ, we obtain (3.31), since ∫_0^∞ f_ξ(s) ds = 1.
Similarly, we can prove (3.32)–(3.35). Note that the one-shot LB policy requires that, whenever a group of tasks arrives at a failed node, the backup system of the failed node immediately sends the group of tasks back to the functional node. To this end, the following conditional probabilities hold and are required to prove (3.32) and (3.33):

P{T̃^{(11,10)}_{r1,0}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞ | ξ = s, ξ = Z2j}
= P{T̃^{(11,10)}_{r1,0}(([g1+1 c11 . . . c1g1 c2j], [g2−1 c21 . . . c2(j−1) c2(j+1) . . . c2g2])) < ∞},

P{T̃^{(01,11)}_{0,r2}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞ | ξ = s, ξ = Z1i}
= P{T̃^{(01,11)}_{0,r2}(([g1−1 c11 . . . c1(i−1) c1(i+1) . . . c1g1], [g2+1 c21 . . . c2g2 c1i])) < ∞},

for j ∈ {1, . . . , g2} and i ∈ {1, . . . , g1}. □
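Because every transition in (3.31)–(3.33) either serves a task, merges an in-transit group into a queue, bounces a group toward the functional node, or moves a failed node's queue into transit (and each node can fail at most once), the recursion reaches the base cases (3.34)–(3.35) in finitely many steps. It can therefore be evaluated directly by memoized recursion. The sketch below implements the two-node recursion; the service rates, failure rates, and delay parameter θ are illustrative, and zero-probability service events at an empty queue are simply dropped from the race, which is algebraically equivalent to the rk − 1 → 0 convention.

```python
from functools import lru_cache

lam_d = (1.0, 0.8)     # service rates (illustrative)
lam_f = (0.05, 0.05)   # permanent-failure rates (illustrative)
theta = 0.1            # a transit group of size c has arrival rate 1/(theta*c)

def gtilde(c):
    return 1.0 / (theta * c)

@lru_cache(maxsize=None)
def success(f1, f2, r1, r2, t1, t2):
    # f1, f2: node up-flags; t1, t2: tuples of group sizes in transit to node 1/2
    if r1 == 0 and r2 == 0 and not t1 and not t2:
        return 1.0                    # nothing left to serve
    if not f1 and not f2:
        return 0.0                    # both nodes down with work remaining
    events = []                       # list of (rate, successor-probability thunk)
    if f1 and r1 > 0:                 # service completion at node 1
        events.append((lam_d[0], lambda: success(f1, f2, r1 - 1, r2, t1, t2)))
    if f2 and r2 > 0:
        events.append((lam_d[1], lambda: success(f1, f2, r1, r2 - 1, t1, t2)))
    if f1:                            # failure of node 1: its queue goes into transit
        nt2 = t2 + ((r1,) if r1 else ())
        events.append((lam_f[0], lambda nt2=nt2: success(0, f2, 0, r2, t1, nt2)))
    if f2:
        nt1 = t1 + ((r2,) if r2 else ())
        events.append((lam_f[1], lambda nt1=nt1: success(f1, 0, r1, 0, nt1, t2)))
    for i, c in enumerate(t1):        # a group reaches node 1, or bounces if node 1 is down
        rest = t1[:i] + t1[i + 1:]
        if f1:
            events.append((gtilde(c), lambda c=c, rest=rest: success(f1, f2, r1 + c, r2, rest, t2)))
        else:
            events.append((gtilde(c), lambda c=c, rest=rest: success(f1, f2, r1, r2, rest, t2 + (c,))))
    for j, c in enumerate(t2):
        rest = t2[:j] + t2[j + 1:]
        if f2:
            events.append((gtilde(c), lambda c=c, rest=rest: success(f1, f2, r1, r2 + c, t1, rest)))
        else:
            events.append((gtilde(c), lambda c=c, rest=rest: success(f1, f2, r1, r2, t1 + (c,), rest)))
    total = sum(r for r, _ in events)
    return sum(r / total * p() for r, p in events)

# P{T-tilde < inf} for r1 = 5, r2 = 3 with batches L12 = 2, L21 = 4 in transit
print(success(1, 1, 5, 3, (2,), (4,)))
```

With no transit groups and one node down, the recursion collapses to the geometric form (3.34)–(3.35), which provides a convenient correctness check.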
3.4 System with recoverable node-failures
Distributed systems (like “SETI at Home” [13]) utilize dynamic sets of remote nodes,
where nodes may join and leave the system in a random fashion. In fact, some of
these remote nodes are available only when they are not being used by their owners.
In addition, any node may randomly get disconnected from the Internet. In this
section, any unavailable node is considered to have failed, while a participating node
is considered to be functional. In particular, we consider that any node in the
system randomly fluctuates between a “failed” (or “down”) and “functional” (or
“up”) states.
We present two different LB policies: a proactive LB policy called LBP-1 and a reactive LB policy called LBP-2 [21]. Given an initial load distribution, the policy LBP-1 takes a proactive LB action at time t = 0 by predicting the failure times, recovery times and the random load-transfer times. In the policy LBP-2, the LB action is performed at time t = 0 by considering the random load-transfer times while disregarding the failure and recovery times of the nodes. Nonetheless, at every occurrence of a node failure, a reactive scheduling action is taken to compensate for the failure.
3.4.1 Analysis of proactive LB policy: LBP-1
Consider a two-node distributed system. The policy LBP-1 allows a one-time, one-way load transfer between the nodes by predicting the failure times, recovery times and the random load-transfer times. No other balancing action is taken afterwards. At time t = 0, both nodes are assumed to be functional, and only one of the nodes, say the kth node, transfers Ljk tasks to the jth node, where

Ljk = ⌊K mk⌋, (3.37)

and K ∈ [0, 1] is the LB gain. No other load transfer occurs afterwards, and each node processes its remaining tasks as well as the tasks transferred to it. The optimal policy is to choose the sender node k and the receiver node j, and to calculate the LB gain K, so as to minimize the AOCT. For the rest of this section, without loss of generality, we will suppose that node 1 is the sender node.
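For intuition, the effect of the gain K in (3.37) can be explored by brute force: for each K on a grid, transfer L21 = ⌊K m1⌋ tasks at t = 0 with a random transfer delay and estimate the AOCT by simulation. The sketch below ignores failures and recoveries, and all rates are illustrative, so it is only a rough stand-in for the optimization described above.

```python
import random
from math import floor

random.seed(4)

lam_d1, lam_d2 = 1.0, 1.0   # service rates (illustrative)
lam_t = 0.5                 # transfer-delay rate for the migrated batch
m1, m2 = 24, 4              # node 1 is the sender (m1 > m2)

def aoct(L21, trials=4000):
    total = 0.0
    for _ in range(trials):
        finish1 = sum(random.expovariate(lam_d1) for _ in range(m1 - L21))
        finish2 = sum(random.expovariate(lam_d2) for _ in range(m2))
        if L21 > 0:
            # batch leaves at t = 0 and arrives after an exp(lam_t) delay
            start = max(finish2, random.expovariate(lam_t))
            finish2 = start + sum(random.expovariate(lam_d2) for _ in range(L21))
        total += max(finish1, finish2)
    return total / trials

grid = [k / 10 for k in range(11)]
results = {K: aoct(floor(K * m1)) for K in grid}
best_K = min(results, key=results.get)
print(best_K, results[best_K])
```

On such a lopsided initial load, any reasonable gain beats K = 0; the optimum sits below full transfer because the batch pays a random transfer delay before it can be served.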
Simplification of system-information and system-function states
We will assume that the delays in transferring LIs and failure notices between the
nodes are negligible as compared to the total computing time of the tasks as well
as to the delays in transferring the actual load between the nodes. Thus, nodes can
periodically communicate without incurring significant overheads, whereby each node
gets continually (instantly) informed of the initial load-states as well as the functional
states of other nodes. In other words, at any given time, we assume that every node
in the system has the current load information as well as the current function state
(either functional or faulty) of all other nodes. More precisely, i1 = i2 = . . . =
in = [1, . . . , 1], while f1 = f2 = . . . = fn. Therefore, under this assumption, we can
completely omit the system-information state from our analysis and also represent the
system-function state by an n-bit vector f that is common to all nodes. For example,
in a two-node system, the possible system-function states are: [0, 0], [0, 1], [1, 0] or
[1, 1].
In addition, it should be noted that there is no need to delay the LB instant
beyond time t = 0, as every node already has up-to-date system information at
time t = 0 (due to negligible communication delays). Therefore, we will assume that
all the nodes take a synchronized LB action at tb = 0. These assumptions significantly
simplify our analysis as we will now obtain simple difference equations (instead of
difference-differential equations as in the previous two sections) that characterize the
AOCT.
Expected completion time and cumulative distribution function
Let T fr1,r2
(C) denote the time taken by the system to serve all the tasks in the system
if LB is performed by all functional nodes at time t = 0, and the initial system
condition at t = 0 is as specified by f ,C while rk tasks (k = 1, 2) are in the queue
of the kth node. According to LBP-1, LB is performed at t = 0 based on (3.37).
Therefore, at t = 0, r1 = m1 − L21, r2 = m2, C = ([0], [1 L21]) and f = [1, 1]. Our
objective is to calculate: E[T 1,1m1−L21,m2
([0], [1 L21]). Let µfr1,r2
(C)4= E[T f
r1,r2(C)] and
and pfr1,r2
(t;C)4= P{T f
r1,r2(C) ≤ t}. Clearly, µf
0,0([0], [0]) = 0 and pf0,0(t; [0], [0]) = 1
(for all t ≥ 0) since there is no task at any of the nodes and there is no task in the
network.
Theorem 4: For $n = 2$, $r_1, r_2 \in \mathbb{Z}_+$, $L > 0$ and $t \in [0,\infty)$, the AOCT $\mu^{\mathbf{f}}_{r_1,r_2}(\mathbf{C})$
and the cumulative distribution function $p^{\mathbf{f}}_{r_1,r_2}(t;\mathbf{C})$ satisfy the four matrix relations
shown below:

$$
\begin{bmatrix}
\mu^{0,0}_{r_1,r_2}([0],[1\,L])\\
\mu^{0,1}_{r_1,r_2}([0],[1\,L])\\
\mu^{1,0}_{r_1,r_2}([0],[1\,L])\\
\mu^{1,1}_{r_1,r_2}([0],[1\,L])
\end{bmatrix}
=
\begin{bmatrix}
1 & -\lambda_{s_2}/\lambda_A & -\lambda_{s_1}/\lambda_A & 0\\
-\lambda_{f_2}/\lambda_B & 1 & 0 & -\lambda_{s_1}/\lambda_B\\
-\lambda_{f_1}/\lambda_C & 0 & 1 & -\lambda_{s_2}/\lambda_C\\
0 & -\lambda_{f_1}/\lambda_D & -\lambda_{f_2}/\lambda_D & 1
\end{bmatrix}^{-1}
\times
\begin{bmatrix}
\frac{1}{\lambda_A}+\frac{\tilde\lambda_{21}}{\lambda_A}\mu^{0,0}_{r_1,r_2+L}([0],[0])\\
\frac{1}{\lambda_B}+\frac{\lambda_{d_2}}{\lambda_B}\mu^{0,1}_{r_1,r_2-1}([0],[1\,L])+\frac{\tilde\lambda_{21}}{\lambda_B}\mu^{0,1}_{r_1,r_2+L}([0],[0])\\
\frac{1}{\lambda_C}+\frac{\lambda_{d_1}}{\lambda_C}\mu^{1,0}_{r_1-1,r_2}([0],[1\,L])+\frac{\tilde\lambda_{21}}{\lambda_C}\mu^{1,0}_{r_1,r_2+L}([0],[0])\\
\frac{1}{\lambda_D}+\frac{\lambda_{d_1}}{\lambda_D}\mu^{1,1}_{r_1-1,r_2}([0],[1\,L])+\frac{\lambda_{d_2}}{\lambda_D}\mu^{1,1}_{r_1,r_2-1}([0],[1\,L])+\frac{\tilde\lambda_{21}}{\lambda_D}\mu^{1,1}_{r_1,r_2+L}([0],[0])
\end{bmatrix},
$$

$$
\begin{bmatrix}
\mu^{0,0}_{r_1,r_2}([0],[0])\\
\mu^{0,1}_{r_1,r_2}([0],[0])\\
\mu^{1,0}_{r_1,r_2}([0],[0])\\
\mu^{1,1}_{r_1,r_2}([0],[0])
\end{bmatrix}
=
\begin{bmatrix}
1 & -\lambda_{s_2}/\lambda'_A & -\lambda_{s_1}/\lambda'_A & 0\\
-\lambda_{f_2}/\lambda'_B & 1 & 0 & -\lambda_{s_1}/\lambda'_B\\
-\lambda_{f_1}/\lambda'_C & 0 & 1 & -\lambda_{s_2}/\lambda'_C\\
0 & -\lambda_{f_1}/\lambda'_D & -\lambda_{f_2}/\lambda'_D & 1
\end{bmatrix}^{-1}
\times
\begin{bmatrix}
\frac{1}{\lambda'_A}\\
\frac{1}{\lambda'_B}+\frac{\lambda_{d_2}}{\lambda'_B}\mu^{0,1}_{r_1,r_2-1}([0],[0])\\
\frac{1}{\lambda'_C}+\frac{\lambda_{d_1}}{\lambda'_C}\mu^{1,0}_{r_1-1,r_2}([0],[0])\\
\frac{1}{\lambda'_D}+\frac{\lambda_{d_1}}{\lambda'_D}\mu^{1,1}_{r_1-1,r_2}([0],[0])+\frac{\lambda_{d_2}}{\lambda'_D}\mu^{1,1}_{r_1,r_2-1}([0],[0])
\end{bmatrix},
$$

$$
\frac{d}{dt}
\begin{bmatrix}
p^{1,1}_{r_1,r_2}(t;[0],[1\,L])\\
p^{0,1}_{r_1,r_2}(t;[0],[1\,L])\\
p^{1,0}_{r_1,r_2}(t;[0],[1\,L])\\
p^{0,0}_{r_1,r_2}(t;[0],[1\,L])
\end{bmatrix}
=
\begin{bmatrix}
-\lambda_D & \lambda_{f_1} & \lambda_{f_2} & 0\\
\lambda_{s_1} & -\lambda_B & 0 & \lambda_{f_2}\\
\lambda_{s_2} & 0 & -\lambda_C & \lambda_{f_1}\\
0 & \lambda_{s_2} & \lambda_{s_1} & -\lambda_A
\end{bmatrix}
\begin{bmatrix}
p^{1,1}_{r_1,r_2}(t;[0],[1\,L])\\
p^{0,1}_{r_1,r_2}(t;[0],[1\,L])\\
p^{1,0}_{r_1,r_2}(t;[0],[1\,L])\\
p^{0,0}_{r_1,r_2}(t;[0],[1\,L])
\end{bmatrix}
+
\begin{bmatrix}
\lambda_{d_1}p^{1,1}_{r_1-1,r_2}(t;[0],[1\,L])+\lambda_{d_2}p^{1,1}_{r_1,r_2-1}(t;[0],[1\,L])+\tilde\lambda_{21}p^{1,1}_{r_1,r_2+L}(t;[0],[0])\\
\lambda_{d_2}p^{0,1}_{r_1,r_2-1}(t;[0],[1\,L])+\tilde\lambda_{21}p^{0,1}_{r_1,r_2+L}(t;[0],[0])\\
\lambda_{d_1}p^{1,0}_{r_1-1,r_2}(t;[0],[1\,L])+\tilde\lambda_{21}p^{1,0}_{r_1,r_2+L}(t;[0],[0])\\
\tilde\lambda_{21}p^{0,0}_{r_1,r_2+L}(t;[0],[0])
\end{bmatrix},
$$

$$
\frac{d}{dt}
\begin{bmatrix}
p^{1,1}_{r_1,r_2}(t;[0],[0])\\
p^{0,1}_{r_1,r_2}(t;[0],[0])\\
p^{1,0}_{r_1,r_2}(t;[0],[0])\\
p^{0,0}_{r_1,r_2}(t;[0],[0])
\end{bmatrix}
=
\begin{bmatrix}
-\lambda'_D & \lambda_{f_1} & \lambda_{f_2} & 0\\
\lambda_{s_1} & -\lambda'_B & 0 & \lambda_{f_2}\\
\lambda_{s_2} & 0 & -\lambda'_C & \lambda_{f_1}\\
0 & \lambda_{s_2} & \lambda_{s_1} & -\lambda'_A
\end{bmatrix}
\begin{bmatrix}
p^{1,1}_{r_1,r_2}(t;[0],[0])\\
p^{0,1}_{r_1,r_2}(t;[0],[0])\\
p^{1,0}_{r_1,r_2}(t;[0],[0])\\
p^{0,0}_{r_1,r_2}(t;[0],[0])
\end{bmatrix}
+
\begin{bmatrix}
\lambda_{d_1}p^{1,1}_{r_1-1,r_2}(t;[0],[0])+\lambda_{d_2}p^{1,1}_{r_1,r_2-1}(t;[0],[0])\\
\lambda_{d_2}p^{0,1}_{r_1,r_2-1}(t;[0],[0])\\
\lambda_{d_1}p^{1,0}_{r_1-1,r_2}(t;[0],[0])\\
0
\end{bmatrix},
$$

where $\lambda_A = \lambda_{s_1}+\lambda_{s_2}+\tilde\lambda_{21}$, $\lambda_B = \lambda_{d_2}+\lambda_{s_1}+\lambda_{f_2}+\tilde\lambda_{21}$, $\lambda_C = \lambda_{d_1}+\lambda_{f_1}+\lambda_{s_2}+\tilde\lambda_{21}$,
$\lambda_D = \lambda_{d_1}+\lambda_{d_2}+\lambda_{f_1}+\lambda_{f_2}+\tilde\lambda_{21}$, $\lambda'_A = \lambda_{s_1}+\lambda_{s_2}$, $\lambda'_B = \lambda_{d_2}+\lambda_{s_1}+\lambda_{f_2}$, $\lambda'_C = \lambda_{d_1}+\lambda_{f_1}+\lambda_{s_2}$,
$\lambda'_D = \lambda_{d_1}+\lambda_{d_2}+\lambda_{f_1}+\lambda_{f_2}$ and $\tilde\lambda_{21} = \theta L$, while as per Convention C1, $\lambda_{d_k}$ as well as
$r_k - 1$ are both set to 0 whenever $r_k = 0$.
Proof of Theorem 4:
Consider the case when the system initial condition is given by f = [0, 0] and
C = ([0], [1 L]). Based on Convention C1, the regeneration random variable can
be written as: τ = min(S1, S2, Z21), where Sk is the recovery time of the kth node
and Z21 is the random transfer delay of L tasks on their way to the second node.
Now, we can invoke Assumptions 1 and 2 (listed in Section 3.1) and formulate the
regeneration theory as mentioned in Lemmas 1–7 in order to obtain:
$$
\begin{aligned}
E[T^{0,0}_{r_1,r_2}([0],[1\,L]) \mid \tau = s,\ \tau = S_1] &= s + E[T^{1,0}_{r_1,r_2}([0],[1\,L])],\\
E[T^{0,0}_{r_1,r_2}([0],[1\,L]) \mid \tau = s,\ \tau = S_2] &= s + E[T^{0,1}_{r_1,r_2}([0],[1\,L])],\ \text{and}\\
E[T^{0,0}_{r_1,r_2}([0],[1\,L]) \mid \tau = s,\ \tau = Z_{21}] &= s + E[T^{0,0}_{r_1,r_2+L}([0],[0])],
\end{aligned}
\qquad (3.38)
$$
where s ∈ [0,∞). Next, we use the iterated conditional expectations to write:
$$
E[T^{0,0}_{r_1,r_2}([0],[1\,L])] = \int_0^\infty E[T^{0,0}_{r_1,r_2}([0],[1\,L]) \mid \tau = s]\,f_\tau(s)\,ds, \qquad (3.39)
$$
where $f_\tau(s) = (\lambda_{s_1}+\lambda_{s_2}+\tilde\lambda_{21})\,e^{-(\lambda_{s_1}+\lambda_{s_2}+\tilde\lambda_{21})s}u(s)$ is the pdf of $\tau$. Observe that
$\{\tau = s\} = \bigcup_{k=1}^{2}\{\tau = s,\ \tau = S_k\} \cup \{\tau = s,\ \tau = Z_{21}\}$. Using this in (3.39) and
applying the results from (3.38) and Appendix D, we get:
$$
E[T^{0,0}_{r_1,r_2}([0],[1\,L])] = \frac{1}{\lambda_A}
+ \frac{\lambda_{s_1}}{\lambda_A}E[T^{1,0}_{r_1,r_2}([0],[1\,L])]
+ \frac{\lambda_{s_2}}{\lambda_A}E[T^{0,1}_{r_1,r_2}([0],[1\,L])]
+ \frac{\tilde\lambda_{21}}{\lambda_A}E[T^{0,0}_{r_1,r_2+L}([0],[0])],
$$
where $\lambda_A = \lambda_{s_1}+\lambda_{s_2}+\tilde\lambda_{21}$.
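The coefficients in this equation are the competing-exponential probabilities $P\{\tau = S_1\} = \lambda_{s_1}/\lambda_A$, $P\{\tau = S_2\} = \lambda_{s_2}/\lambda_A$ and $P\{\tau = Z_{21}\} = \tilde\lambda_{21}/\lambda_A$, while the constant term is $E[\tau] = 1/\lambda_A$. A quick Monte Carlo check of both facts (the rates below are arbitrary):

```python
import random

random.seed(7)
ls1, ls2, lt = 0.1, 0.05, 0.4          # rates of S1, S2, Z21 (arbitrary)
lam_A = ls1 + ls2 + lt

n, wins, total = 200_000, 0, 0.0
for _ in range(n):
    s1 = random.expovariate(ls1)        # recovery time of node 1
    s2 = random.expovariate(ls2)        # recovery time of node 2
    z21 = random.expovariate(lt)        # transfer delay of the in-transit tasks
    tau = min(s1, s2, z21)              # regeneration time
    total += tau
    wins += (tau == s1)                 # event {tau = S1}

p_hat, mean_hat = wins / n, total / n   # estimate P{tau = S1} and E[tau]
```

Both estimates agree with $\lambda_{s_1}/\lambda_A$ and $1/\lambda_A$ up to Monte Carlo error.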
Similarly, we can show the following results:
$$
E[T^{0,1}_{r_1,r_2}([0],[1\,L])] = \frac{1}{\lambda_B}
+ \frac{\lambda_{s_1}}{\lambda_B}E[T^{1,1}_{r_1,r_2}([0],[1\,L])]
+ \frac{\lambda_{d_2}}{\lambda_B}E[T^{0,1}_{r_1,r_2-1}([0],[1\,L])]
+ \frac{\lambda_{f_2}}{\lambda_B}E[T^{0,0}_{r_1,r_2}([0],[1\,L])]
+ \frac{\tilde\lambda_{21}}{\lambda_B}E[T^{0,1}_{r_1,r_2+L}([0],[0])],
$$
where $\lambda_B = \lambda_{d_2}+\lambda_{s_1}+\lambda_{f_2}+\tilde\lambda_{21}$,

$$
E[T^{1,0}_{r_1,r_2}([0],[1\,L])] = \frac{1}{\lambda_C}
+ \frac{\lambda_{d_1}}{\lambda_C}E[T^{1,0}_{r_1-1,r_2}([0],[1\,L])]
+ \frac{\lambda_{f_1}}{\lambda_C}E[T^{0,0}_{r_1,r_2}([0],[1\,L])]
+ \frac{\lambda_{s_2}}{\lambda_C}E[T^{1,1}_{r_1,r_2}([0],[1\,L])]
+ \frac{\tilde\lambda_{21}}{\lambda_C}E[T^{1,0}_{r_1,r_2+L}([0],[0])],
$$
where $\lambda_C = \lambda_{d_1}+\lambda_{f_1}+\lambda_{s_2}+\tilde\lambda_{21}$, and

$$
E[T^{1,1}_{r_1,r_2}([0],[1\,L])] = \frac{1}{\lambda_D}
+ \frac{\lambda_{d_1}}{\lambda_D}E[T^{1,1}_{r_1-1,r_2}([0],[1\,L])]
+ \frac{\lambda_{d_2}}{\lambda_D}E[T^{1,1}_{r_1,r_2-1}([0],[1\,L])]
+ \frac{\lambda_{f_1}}{\lambda_D}E[T^{0,1}_{r_1,r_2}([0],[1\,L])]
+ \frac{\lambda_{f_2}}{\lambda_D}E[T^{1,0}_{r_1,r_2}([0],[1\,L])]
+ \frac{\tilde\lambda_{21}}{\lambda_D}E[T^{1,1}_{r_1,r_2+L}([0],[0])],
$$
where $\lambda_D = \lambda_{d_1}+\lambda_{d_2}+\lambda_{f_1}+\lambda_{f_2}+\tilde\lambda_{21}$.
Rearranging the above four equations, one for each $E[T^{\mathbf{f}}_{r_1,r_2}([0],[1\,L])]$, we obtain the
first matrix relation given in Theorem 4. For the case when $\mathbf{C} = ([0],[0])$, we can
repeat the above analysis while setting $Z_{21} = \infty$ a.s., in accordance with Convention
C1. Also, note that the CDF $p^{\mathbf{f}}_{r_1,r_2}(t;\mathbf{C})$ can be written as $p^{\mathbf{f}}_{r_1,r_2}(t;\mathbf{C}) = E\bigl[\mathbf{1}_{\{T^{\mathbf{f}}_{r_1,r_2}(\mathbf{C}) \le t\}}\bigr]$,
where $\mathbf{1}_A$ is the indicator function of the event $A$. This enables us to use the
smoothing property of expectations and exploit the regeneration theory to get the
desired matrix relations. This completes the proof of Theorem 4. $\square$
As a final observation we point out that swapping the roles of the sender and
receiver nodes does not change the nature of our analysis. Therefore, we relax the
assumption that the first node is the sender node. Thus, we can theoretically cal-
culate the LB gain and the sender/receiver pair (i.e., which node should be sending
tasks to the other) that will minimize the AOCT. This allows the optimal implemen-
tation of LBP-1.
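Theorem 4 turns the AOCT computation into two dynamic-programming sweeps: fill the $\mathbf{C} = ([0],[0])$ table first (its right-hand side only involves smaller queue lengths), then the $\mathbf{C} = ([0],[1\,L])$ table, solving one $4 \times 4$ linear system per cell. A sketch of this sweep follows; the rates and loads are illustrative only, and $\tilde\lambda_{21} = \theta L$ follows the theorem's convention:

```python
import numpy as np

def mu_tables(m1, m2, L, ld, lf, ls, theta):
    """Solve the Theorem-4 recursions for a two-node system.

    Function states are ordered (f1, f2) = (0,0), (0,1), (1,0), (1,1).
    Returns mu0[r1, r2] (C = ([0],[0]), nothing in transit) and
    muL[r1, r2] (C = ([0],[1 L]), L tasks in transit to node 2),
    each a 4-vector of AOCTs indexed by the function state.
    """
    lt = theta * L                          # lambda~_21 = theta * L
    ld1, ld2 = ld
    lf1, lf2 = lf
    ls1, ls2 = ls

    def cell(d1, d2, t, extra):
        # d1, d2 are effective service rates (Convention C1: 0 if queue empty).
        lA, lB = ls1 + ls2 + t, d2 + ls1 + lf2 + t
        lC, lD = d1 + lf1 + ls2 + t, d1 + d2 + lf1 + lf2 + t
        A = np.array([[1.0, -ls2 / lA, -ls1 / lA, 0.0],
                      [-lf2 / lB, 1.0, 0.0, -ls1 / lB],
                      [-lf1 / lC, 0.0, 1.0, -ls2 / lC],
                      [0.0, -lf1 / lD, -lf2 / lD, 1.0]])
        b = (1.0 + extra) / np.array([lA, lB, lC, lD])
        return np.linalg.solve(A, b)

    # C = ([0],[0]): base case mu = 0 at (0,0), sweep in increasing (r1, r2).
    mu0 = np.zeros((m1 + 1, m2 + L + 1, 4))
    for r1 in range(m1 + 1):
        for r2 in range(m2 + L + 1):
            if r1 == 0 and r2 == 0:
                continue
            d1 = ld1 if r1 > 0 else 0.0     # a wrapped index below is always
            d2 = ld2 if r2 > 0 else 0.0     # multiplied by a zero rate
            extra = np.array([0.0,
                              d2 * mu0[r1, r2 - 1, 1],
                              d1 * mu0[r1 - 1, r2, 2],
                              d1 * mu0[r1 - 1, r2, 3] + d2 * mu0[r1, r2 - 1, 3]])
            mu0[r1, r2] = cell(d1, d2, 0.0, extra)

    # C = ([0],[1 L]): couples to mu0 at (r1, r2 + L) through the transfer event.
    muL = np.zeros((m1 + 1, m2 + 1, 4))
    for r1 in range(m1 + 1):
        for r2 in range(m2 + 1):
            d1 = ld1 if r1 > 0 else 0.0
            d2 = ld2 if r2 > 0 else 0.0
            extra = np.array([lt * mu0[r1, r2 + L, 0],
                              d2 * muL[r1, r2 - 1, 1] + lt * mu0[r1, r2 + L, 1],
                              d1 * muL[r1 - 1, r2, 2] + lt * mu0[r1, r2 + L, 2],
                              d1 * muL[r1 - 1, r2, 3] + d2 * muL[r1, r2 - 1, 3]
                              + lt * mu0[r1, r2 + L, 3]])
            muL[r1, r2] = cell(d1, d2, lt, extra)
    return mu0, muL

# AOCT of LBP-1 for gain K: node 1 sends L = floor(K * m1) tasks at t = 0.
m1, m2, K = 10, 6, 0.4
L = int(K * m1)
mu0, muL = mu_tables(m1, m2, L, ld=(1.07, 1.85), lf=(0.05, 0.05),
                     ls=(0.10, 0.05), theta=0.2)
aoct = muL[m1 - L, m2, 3]                   # mu^{1,1}_{m1-L, m2}([0],[1 L])
```

Scanning K (and the sender/receiver assignment) over such evaluations is what yields the optimal implementation of LBP-1.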
3.4.2 Analysis of reactive LB policy: LBP-2
In the policy LBP-2, all nodes execute LB together at time t = 0 without con-
sideration of the future possibilities for failures and subsequent recoveries of nodes.
Further, each node is assumed to be equipped with a backup system that can send
or receive tasks. Every time a node fails, the backup system of the failed node gets
activated and transfers loads to all other nodes (functional or failed). Each such
transfer is performed in order to compensate the loss in overall computing time of
the system due to the failure of a node. As in LBP-1, we will assume that the com-
munication delays are negligible, while we take account of the random load transfer
delays.
The initial LB action is taken at t = 0 to achieve an “approximately” uniform
division of the total system-load among all the nodes assuming that all nodes will
remain functional. To this end, we utilize the one-shot LB policy described in Sec-
tion (3.2.1). In particular, based on the hypothesis that nodes will never fail, we
exploit the theory given in Section (3.2.2) to calculate the optimal LB gains that
minimize the AOCT.
Now suppose that the kth node fails at time $t > 0$. The average time that the
kth node will remain in the failed mode is $E[S_k] = \lambda_{s_k}^{-1}$. On the contrary, had the kth
node been functional during these $\lambda_{s_k}^{-1}$ units of time, it would have serviced $\lambda_{d_k}\lambda_{s_k}^{-1}$
tasks (the average recovery time multiplied by the service rate
of the kth node). In other words, the failure of the kth node should result in an
accumulation, on average, of $\lambda_{d_k}/\lambda_{s_k}$ unattended tasks during its recovery period.
Therefore, the system that had previously been balanced at $t = 0$ suddenly becomes
unbalanced again. Consequently, the kth node has to be allowed another balancing
opportunity, where it can transfer $\bigl(\lambda_{d_i}/\sum_{j=1}^{n}\lambda_{d_j}\bigr)\bigl(\lambda_{d_k}/\lambda_{s_k}\bigr)$ tasks to every ith
node, $i \neq k$, in the system. However, the steady-state probability that any ith node
is functional is $\lambda_{s_i}/(\lambda_{f_i}+\lambda_{s_i})$. Thus, at every failure instant of the kth node, the
reactive policy LBP-2 considers the steady-state functional probabilities of all nodes
$i \neq k$, and transfers $L^{\mathrm{FAIL}}_{ik}$ tasks from the kth node to every ith
node in the system, where

$$
L^{\mathrm{FAIL}}_{ik} = \Bigl\lfloor \Bigl(\frac{\lambda_{s_i}}{\lambda_{f_i}+\lambda_{s_i}}\Bigr)\Bigl(\frac{\lambda_{d_i}}{\sum_{j=1}^{n}\lambda_{d_j}}\Bigr)\Bigl(\frac{\lambda_{d_k}}{\lambda_{s_k}}\Bigr)\Bigr\rfloor. \qquad (3.40)
$$
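A direct transcription of (3.40), with hypothetical per-node rates (the dictionaries and the function name are for illustration only):

```python
import math

def l_fail(i, k, ld, lf, ls):
    """Eq. (3.40): tasks the backup of failed node k ships to node i (i != k).

    ld, lf, ls map a node index to its service, failure and recovery rates.
    """
    avail_i = ls[i] / (lf[i] + ls[i])       # steady-state P{node i is up}
    share_i = ld[i] / sum(ld.values())      # node i's share of total service rate
    backlog_k = ld[k] / ls[k]               # mean tasks accumulated while k is down
    return math.floor(avail_i * share_i * backlog_k)

# Hypothetical two-node rates (per second).
ld = {1: 1.07, 2: 1.85}                     # service rates
lf = {1: 0.05, 2: 0.05}                     # failure rates
ls = {1: 0.10, 2: 0.05}                     # recovery rates
tasks_to_2 = l_fail(2, 1, ld, lf, ls)       # node 1 failed: ship to node 2
```

Note that, since the rates are fixed system parameters, each $L^{\mathrm{FAIL}}_{ik}$ is a constant that can be computed once, ahead of time.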
3.5 Conclusion
We have undertaken a novel queuing approach to analyze the stochastic dynamics,
evolving under the one-shot LB policy, of cooperative systems comprising distrib-
uted nodes. In the one-shot LB policy, each functional node first utilizes its local
information about other nodes to calculate the number of tasks to be transferred,
which is instantly followed by a synchronized load balancing action taken together
by all functional nodes. Our model specifically captures the effects of random com-
munication delays and random load-transfer delays in the communication network.
We have introduced three fundamental random vectors to track the underlying point
processes associated with the distributed system. At any given time, these vectors
store information about load distribution among nodes, available functional nodes
and loads in the communication network. In addition, our model assumes that all
the random delays follow exponential distributions. Under this assumption, a re-
generation theory has been formulated yielding coupled renewal equations for three
different types of distributed systems; namely, (1) a system with no node-failure, (2)
a system with random node-failures, and (3) a system with random node-failures
and random node-recoveries.
In particular, we have derived a set of renewal equations that characterize the
expected value of the overall completion time for a certain amount of load initially
given to the system with no node-failure. Similarly, we have obtained a different set
of renewal equations characterizing the probability of successfully serving an initial
amount of load present in the system with random node failures. For the system
with random node-failures and random node-recoveries, we have considered two dif-
ferent LB policies. The first policy, LBP-1, preemptively performs load balancing by
utilizing the statistical information about the failure and recovery processes. By as-
suming that the communication delays are negligible, we have obtained surprisingly
simple recursive relations that characterize the expected value of the overall com-
pletion times as well as their cumulative distribution functions. On the other hand,
in the second policy, LBP-2, the initial LB action is taken without predicting the
node-failure process. Instead, at every node-failure instant, the policy LBP-2 enables
the back-up system of the failed node to take an LB action, thereby distributing the
uncompleted load that has accumulated during the recovery time.
Chapter 4
Experimental, Theoretical and
Simulation Results
We present the experimental, theoretical and MC simulation-based results on the
performance of the one-shot LB policy as applied to a distributed computing application.
The distributed computing application was chosen to be the matrix-multiplication
problem performed over a distributed system comprising two nodes that were con-
nected by (i) the Internet and (ii) the UNM EECE infrastructure-based IEEE 802.11b/g
WLAN. Over the Internet, we used a 650 MHz Intel Pentium III processor-based
computer (the first node) and a 2.66 GHz Intel P4 processor-based computer (the
second node). For the WLAN setup, the first node was replaced by a 1 GHz Trans-
meta Crusoe processor-based computer while the second node was kept the same as
in the Internet-based experiments.
In this Chapter, we first introduce our distributed computing test-bed in Sec-
tion (4.1) followed by the experimentally calculated empirical estimates of the system
parameters in Section (4.2). Next, we present the numerical results for LB policies
for three different distributed systems; namely, (1) system with no node-failures in
Section 4.3, (2) system with recoverable node-failures in Section 4.4, and (3) sys-
tem with permanent node-failures in Section 4.5. The objective of this Chapter is
to compare the theoretical predictions, obtained by solving the equations given in
Chapter (3), to the real-time experimental results obtained over our test-bed.
4.1 Distributed computing system architecture
The LB policy has been implemented on a distributed computing system to experi-
mentally determine its performance. The system consists of nodes that are processing
jobs in a cooperative environment. Each node is equipped with a back-up system
that always saves the context of the application running on the node. The soft-
ware architecture of the distributed system is divided into three layers: application,
load-balancing and communication. In this section we provide just a brief exposition
to each layer; the interested readers are referred to the work of Ghanem [42] for a
detailed description.
The application that is used to evaluate the performance of the LB policy is matrix
multiplication, where the service of one task is defined as the multiplication of one
row by a static matrix duplicated on all nodes. To achieve variability in the
processing speed of the nodes, randomness is introduced in the size of each
task (row) by independently choosing its arithmetic precision according to an exponential
distribution. In addition, the application layer updates the load information that
is being transferred between the nodes. The LB policy is implemented at the load-balancing
layer as a multi-threaded process, written using the POSIX-threads programming
standard. One of the threads schedules and triggers the LB actions at predefined
or calculated instants. At the LB instant,
the load-balancing thread calculates the number of the tasks to be transferred to
other nodes and accesses the shared data to specify the tasks to be sent by the
communication layer. A different thread is coded for the back-up system of each
node in order to tackle the node-failures. If required, the failure-thread can also
compute the number of tasks to be transferred to other nodes. The communication
layer of each node handles the transfer of data from/to that node to/from the other
nodes within the system. Each node uses the UDP transport protocol to transfer its
load-information to the other nodes. The load information consists of the current
queue size, the estimate of current processing rate, and the estimate of network
delays between nodes. On the other hand, the communication layer uses the TCP
transport protocol to transfer the application data (tasks) between the nodes.
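A minimal sketch of the two transports just described — UDP for the small, loss-tolerant load-information packets and TCP for the bulk task data. The ports, field names and JSON framing are hypothetical (the actual platform is coded in ANSI-C):

```python
import json
import socket

def send_load_info(peer, queue_size, proc_rate, delay_est, port=5005):
    # Status datagram: current queue size, processing-rate estimate and
    # network-delay estimate, as described in the text.
    info = json.dumps({"q": queue_size, "rate": proc_rate, "delay": delay_est})
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(info.encode(), (peer, port))   # fire-and-forget; may be lost

def send_tasks(peer, rows, port=5006):
    # Task data must arrive intact and in order, hence TCP.
    payload = json.dumps(rows).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((peer, port))
        s.sendall(len(payload).to_bytes(4, "big") + payload)  # length-prefixed
```

The split mirrors the design choice above: stale or lost load information is tolerable, while transferred tasks are not.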
Identical copies of the above-mentioned software run on each node of the system.
Therefore, once the nodes are synchronized, every node can execute load balancing
autonomously at the prescribed LB instant by utilizing its local information. Finally,
it should be noted that the software platform is coded in ANSI-C over UNIX-based
systems. It has been successfully tested on SPARC processor-based machines running
the Solaris operating system and on IA-32 processor-based computers running both
Linux and Microsoft Windows operating systems.
4.2 Empirical estimation of system parameters
At first, experiments were performed to estimate the system parameters, namely,
the processing rates of the nodes (λdi), the load-information rates (λik), and the
load-transfer rate per task (θ−1). To recall, one task is a row (an array of numbers)
and the processing time of a task is the time required to multiply a static matrix
of fixed size by the row. As mentioned in Section 4.1, the task sizes are generated
randomly and independently, according to an exponential distribution, which will, in
turn, result in independent and identically-distributed processing times of tasks. The
empirically calculated pdfs of the processing time per task for each node are shown in
Fig. 4.1. Clearly, each empirical pdf can be approximated with an exponential pdf
of appropriate rates.
Figure 4.1: Empirically estimated pdfs of the processing time per task for the
Transmeta Crusoe machine (top) and Intel P4 machine (bottom) as well as their
exponential approximations (solid curves).
In Fig. 4.2 we show the empirical pdfs for the load-information delays over the
Internet as well as the WLAN, each of which can be approximated with an exponen-
tial pdf. In the experiments, each load-information packet had a fixed size of 30 Bytes.
From Fig. 4.3 (left), we see that the average transfer delay grows linearly with the
increase in number of tasks. Further, in the same figure (right) the transfer delay per
task can also be approximated as an exponential random variable. Although there
are slight shifts observed in the pdfs of the load-information delays and the transfer
delay per task, in our approximation we have maintained the exponential form of
the pdf and compensated for the shift through the choice of the exponential parameter.
To summarize, empirical results for the pdfs are found to be in good agreement with
Assumption A1 of Section 3.1.
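The estimation steps in this section reduce to two standard computations: the exponential-rate estimate is the reciprocal of the sample mean, and the transfer delay per task follows from the slope of average transfer delay versus number of tasks (Fig. 4.3, left). A sketch on synthetic data — the "true" values 1.85 tasks/s and 0.17 s/task below mimic, but are not, the measured ones:

```python
import random
import statistics

random.seed(1)

# Exponential rate estimate: lambda_hat = 1 / sample mean.
proc_times = [random.expovariate(1.85) for _ in range(5000)]    # synthetic samples
lam_d2_hat = 1.0 / statistics.mean(proc_times)

# Least-squares slope of (number of tasks, average transfer delay) pairs;
# the slope estimates the mean transfer delay per task.
def slope(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

n_tasks = [5, 10, 20, 30, 40, 50]
avg_delay = [0.17 * n + random.gauss(0, 0.2) for n in n_tasks]  # ~0.17 s/task
theta_hat = slope(n_tasks, avg_delay)
```

Both estimators recover the generating parameters up to sampling error.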
Figure 4.2: Empirical pdfs of the load-information delays from the first node to the
second node obtained on the Internet (left) and on the EECE WLAN (right).
Figure 4.3: Left: Average transfer delay as a function of the number of tasks trans-
ferred between nodes. The stars are the actual realizations from the experiments.
Right: Empirical pdf of the transfer delay per task on the Internet under a normal
work-day of operation.
4.3 System without node-failures
In the experiments conducted over the Internet, the first node and the second node
were initially assigned 100 and 60 tasks, respectively, where each task had a mean
size of 120 Bytes. In this context, the processing rates per task of the first node
and the second node were found to be 0.69 and 1.85, respectively. Firstly, fixing the
LB gain at K = 1, we optimized the AOCT by triggering the LB action at different
instants. The analytical and experimental results of this optimization are shown in
Fig. 4.4 (left). The experimental results are plotted by taking the AOCTs obtained
from 20 experiments for each tb. It can be seen that the AOCT becomes small
after tb = 1s. This behavior is attributed to the load-information delay imposed
by the channel. The empirically calculated average load-information delay from the
first node to the second node was 0.7s, and from the second node to the first node
was 0.9s. Therefore, any LB action performed before 0.7s is blind in the sense that
there is no knowledge of the initial load of the other node; both nodes exchange
tasks in this case. This behavior is evident from the experimental results shown in
Fig. 4.4 (right), which depicts the mean number of tasks transferred as a function
of tb. Further, when LB action is taken between 0.7s and 0.9s, the first node will
most likely have knowledge of the initial load of the second node, while the second
node would still be unaware of the initial load of the first node. Consequently,
according to (3.5), the first node sends a smaller portion of its load to the second
node while the second node still sends the same amount of load to the first node.
This means that the slower node (the first node) would eventually execute more tasks
than the faster node (the second node); hence, a larger AOCT is expected. On the
other hand, any LB action taken after 1s is not advantageous because there would
be a low probability for information to arrive. If tb is delayed for too long, the slower
node ends up computing more tasks, resulting in a larger AOCT (not shown in the
figure).
For the experiments over WLAN, the initial load at the first node and the second
node were set to 100 and 60 tasks, respectively, while the processing rates per task
were estimated to be 1.07 and 1.85, respectively. A similar behavior, as in the case of the
Internet, was observed for these WLAN experiments. Since the delay in transmitting
small packets, which is referred to here as load-information delay, fluctuates randomly
in WLAN, more realizations of the experiments are required for each tb to get a
smoother plot of the AOCT.
Figure 4.4: Left: The AOCT as a function of LB instants for the experiments over
the Internet. The LB gain was fixed at 1. Right: Amount of load transferred between
nodes at different LB instants.
Our next goal is to minimize the AOCT over K while keeping tb fixed. The
experiments were performed with the same initial configurations, and the LB was
triggered at 1s using different gains. The results obtained over the Internet and
WLAN are shown in Fig. 4.5. It is seen that the theoretical, MC-simulation, and
experimental results are in good agreement and the optimal K is approximately 1.
This is almost equivalent to the hypothetical case when transfer delay is absent,
in which case, perfect LB is achieved when K = 1 (or when, on average, 55 tasks
are transferred from the first node to the second node, as given by (3.5)). For
experiments over the Internet, the empirically calculated average transfer delay per
task was found to be 0.17s, and the average delay to transfer 55 tasks from the first
node to the second node is therefore 9s, approximately. On the other hand, the
second node does not finish its initial load until 32s, which means that there are no
idle times at the second node before the arrival of the transfer. Therefore, any transfer
incurring a delay less than 32s is effectively equivalent, as far as the second node is
concerned, to an instantaneous transfer. For experiments over WLAN, the initial
load at the first node and the second node were set to 100 and 60 tasks, respectively,
while the processing rates per task were estimated to be 1.07 and 1.85, respectively.
The average delay to transfer 55 tasks was 5.5s, and the optimal performance was
obtained for K = 1 as expected.
Figure 4.5: The AOCT under different LB gains for the Internet (left) and the WLAN
(right). The LB instant was fixed at 2s.
These results motivate us to look further into the effect of K on the AOCT.
Specifically, we consider the types of applications that impose a mean transfer delay
greater than the mean processing time of the initial load at the receiver end,
thereby resulting in an idle time for the receiving node. This kind of situation can
arise in real applications like processing of satellite images where the images are large
in size, and thus the time to transfer them is greater than their processing time [43].
We simulated this type of behavior in our matrix-multiplication setup by increasing
the mean size (in Bytes) of each row while simultaneously reducing the number of
columns to be multiplied in the static matrix. Clearly, a larger row size increases
the mean transfer delay per row (a task) as well as the mean processing time per
task. However, by reducing the number of columns in the static matrix, the mean
processing time per task can be reduced. By using this approach, we were able to
achieve a mean transfer delay per task of 0.72s while keeping the processing rates at
1.06 and 3.78 tasks per second for the first node and the second node, respectively.
The initial loads were still 100 and 60 tasks at the first and second nodes, respectively.
Now, according to (3.5), with K = 1, the load to be transferred from the first node is
64 tasks, producing a delay of 46s. On the other hand, the second node, on average,
finishes its initial load around 16s, and it would therefore have long idle time while
it is awaiting the arrival of load. This discussion is also supported by our theoretical
and experimental results shown in Fig. 4.6 (left), where the AOCT is at minimum
when K = 0.7, which holds for both experimental and theoretical curves. The error
between the theoretical and experimental minima is approximately 12%. Finally,
Fig. 4.6 (right) shows the analytical optimal gain as a function of the mean transfer
delay per task.
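The trade-off driving this result — a larger K shortens the slow node's queue but lengthens the idle wait at the fast node — can be reproduced with a short Monte Carlo sketch. The parameters mirror the large-delay setup above (rates 1.06 and 3.78 tasks/s, 0.72 s mean delay per task, initial loads 100 and 60), but the transfer rule and all details below are a simplification for illustration, not the test-bed code:

```python
import random
import statistics

def completion_time(K, m=(100, 60), rates=(1.06, 3.78), delay_per_task=0.72):
    """One realization: node 1 sends a fraction K of its excess load at t = 0."""
    excess = m[0] - (m[0] + m[1]) * rates[0] / sum(rates)   # ~65 tasks here
    L = int(K * excess)
    t1 = sum(random.expovariate(rates[0]) for _ in range(m[0] - L))
    arrival = sum(random.expovariate(1 / delay_per_task) for _ in range(L))
    t2 = sum(random.expovariate(rates[1]) for _ in range(m[1]))
    # Node 2 starts the transferred batch only after it has both drained its
    # own queue and received the batch (the idle-time effect).
    t2 = max(t2, arrival) + sum(random.expovariate(rates[1]) for _ in range(L))
    return max(t1, t2)

random.seed(3)
aoct = {K: statistics.fmean(completion_time(K) for _ in range(300))
        for K in (0.4, 0.7, 1.0)}           # an interior gain should win
```

Under these assumptions the estimated AOCT is minimized at an interior gain, consistent with the K = 0.7 minimum observed above.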
4.4 System with recoverable node-failures
All the experiments for policies LBP-1 and LBP-2 were conducted over the EECE
infrastructure-based IEEE 802.11b/g network at the University of New Mexico. The
processing rates per task of the first and the second nodes were estimated to be 1.07
and 1.85, respectively. Further, the nodes are assumed to fail and recover indepen-
dently and randomly according to an exponential pdf. In order to achieve this in our
experiments, we have coded a process that dynamically generates failure instants
Figure 4.6: Left: The AOCT as a function of the LB gain in presence of large transfer
delay. The LB instant was fixed at 2s. Right: Theoretical result on the optimal LB
gain for different delays.
and sends signals, at all such failure instants, to the application layer ordering it
to stop executing tasks. Also, at every failure instant, the same process generates
a recovery time and waits for that amount of time before sending a new signal to
the application layer ordering it to resume the execution of tasks. In this section,
the average failure time for both nodes is 20s, while the average recovery times of
the first and second nodes are 10s and 20s, respectively. Clearly, the first node is
expected to be available for more time than the second node.
Initially, experiments were performed to assess the performance of LBP-1. The
first node was assigned 100 tasks, while the second node was assigned 60 tasks. The
LB was performed at time t = 0 according to (3.37) by transferring load from the
first node to the second node using different values for K and the corresponding
AOCT was computed. The theoretical, MC-simulation, and experimental results for
the AOCT are shown in Fig. (4.7). For comparison, the results for the no-failure case
(when the failure rate is set to zero) are also shown. From the theoretical curves,
it can be seen that the minimum AOCT occurs at K = 0.35, while the minimum
occurs at K = 0.45 for the no-failure case. Note that in the former case, the first
node transfers 35 tasks to the second node, while in the latter case it transfers 45
tasks to the second node. In both cases, the AOCT is minimized when the first
node transfers tasks to the second node, which has a higher processing rate. But, in
presence of node failure, the amount of transfer has to be reduced for the optimal
performance because the second node is now less reliable. Intuitively, we can state
that the optimal K in the case of node failure will always be less than the optimal K for
the no-failure case.
Figure 4.7: The average overall completion time as a function of the LB gain K for
the LBP-1.

Figure 4.8: A realization of the queues obtained from the experiments conducted
for LBP-1 and LBP-2.
Next, we conducted experiments for LBP-2. The initial load distribution was 100
and 60 tasks at the first and second nodes, respectively. Note that from (3.40) the
amount of load to be transferred at every failure instant happens to be a constant,
depending on system parameters that are already set to certain values. The optimal
gain K for the initial LB (which does not account for node failure) was found to
be 1. Using this optimal gain, the AOCT was calculated using 60 independent
realizations of the experiments and was found to be 109.17s. We also performed
the MC simulation under the same initial set-up for the LBP-2, and the AOCT
turned out to be 112.43s using 500 realizations. Recall that in the case of LBP-
1 (see Fig. (4.7)), the minimum AOCT is 117s, which is greater than the value
obtained for the LBP-2. This is expected since LBP-1 takes a preemptive action in
the beginning by predicting the failure instants, while LBP-2 avoids the prediction by
taking an action of transferring tasks only when failures occur. In order to compare
the dynamics of each policy, we show in Fig. 4.8 the actual queues of each node, under
one realization of the experiments performed for LBP-1 and LBP-2. The longer flat
portions of the queues correspond to the recovery times of the nodes. Also, the
downward (upward) jumps in the queues under LBP-2 correspond to the action of
transferring (receiving) tasks after every failure instant.
In order to compare the performance of LBP-1 and LBP-2 in the presence of
small network delays, we conducted experiments for each policy using different initial
loads. The average transfer delay per task was estimated to be equal to 0.02s. For
each initial load distribution that is listed in Table 4.1, the theoretical model was
utilized to calculate the optimal LB gains and the sender/receiver pair for LBP-1
that minimizes the AOCT. It was found that if the initial load of the first node is
smaller than the initial load of the second node, then the load transfer has to be
made from the second node to the first node; otherwise, the first node has to be
the sender node. Using the respective optimal LB gains and the sender/receiver
pairs, the actual experiments were conducted and the AOCT was calculated using
20 independent realizations for each initial load distribution. In Table 4.1 we also
list the theoretically calculated AOCT under the no node-failure case. The initial
optimal LB gains used in the experiments under LBP-2 were calculated a priori based
on the theoretical results for the initial condition as given in Section (3.2). In Table
4.2 we have listed the results obtained from our MC simulations and the real-time
experiments. We can see from both Tables that LBP-2 outperforms LBP-1 in all
cases.
Table 4.1: Experimental results for LBP-1 using the theoretically determined optimal
LB gains.

  Initial     Optimal     Average Overall Completion Time (s)
  Load        LB Gain     Node Failure                     Without
  (m1,m2)     Kopt        Theo. Pred.    Exp. Result       Node Failure
  (200,200)   0.15        274.95         264.72            141.94
  (200,100)   0.35        210.13         207.32            106.93
  (100,200)   0.15        210.13         229.19            106.93
  (200,50)    0.5         177.09         172.56            89.32
  (50,200)    0.25        177.09         215.66            89.32
Table 4.2: Experimental and simulation results for LBP-2.

  Initial     Initial Optimal    Average Overall Completion Time (s)
  Load        LB Gain            MC               Exp.
  (m1,m2)     Kopt               Simulation       Result
  (200,200)   1.00               277.9            263.4
  (200,100)   1.00               202.4            188.8
  (100,200)   0.80               203.07           212.9
  (200,50)    1.00               170.81           171.42
  (50,200)    0.95               189.72           177.6
We also studied the performance of LBP-1 and LBP-2 under different amounts
of transfer delay in the channel. The results are shown in Table 4.3, and it can be
seen that when the average transfer delay per task is greater than 1 s, LBP-1 results
in a smaller AOCT than LBP-2. This is attributable to the amount of time needed
in making load transfers at every failure instant in the case of LBP-2, which may
result in frequent idle times at the receiver node while it waits for the load to arrive.
On the other hand, LBP-1 only makes a one-time transfer at the beginning of load
execution. We observed that if the load-transfer time between nodes is on the order
of the average recovery time of the sender node, LBP-1 performs better than LBP-2.
Table 4.3: Performance of LBP-1 and LBP-2 under different network delays.

  Average Delay     Calculated Average Overall
  Per Task (s)      Completion Time (s)
                    LBP-1      LBP-2
  0.01              116.82     112.43
  0.5               117.76     115.94
  1                 120.99     122.25
  2                 127.62     133.02
  3                 131.64     142.86
Finally, using the results of Theorem 4 given in Section (3.4), we computed $p^{1,1}_{r_1,r_2}(t;C)$
corresponding to the LB gain that minimizes the AOCT. The average transfer delay
per task was taken to be 0.02s. As an example, in Fig. 4.9 we present the cumulative
distribution function for the overall completion time for two different initial load
distributions given by (50, 0) and (25, 50).
4.5 System with permanent node-failures
In this section, we present the numerical results, based on the theory given in Sec-
tion (3.3), for the one-shot LB policy applied to a system comprising two distributed
Figure 4.9: The cumulative distribution function of the overall completion time in
LBP-1, with and without node failure. The upper figure shows the case of an initial
workload of (50, 0), while the lower figure is for an initial workload of (25, 50).
nodes, where each node can fail permanently in a random amount of time. The the-
oretical results are compared to the MC simulation results as well. First, we provide
a brief review of the one-shot LB problem that was detailed earlier in
Section (3.2.1).
Recall that the load to be transferred from the jth node to the ith node (for
$i, j \in \{1, 2\}$, $i \neq j$) at the LB instant $t_b$ is given by

$$L_{ij}(t_b) = \left\lfloor K_{ij}\left(Q_j(t_b) - \frac{Q^*_{i(j)}(t_b) + Q_j(t_b)}{2}\right)^{+}\right\rfloor, \qquad (4.1)$$
where $Q_k(0) = m_k$ for $k \in \{1, 2\}$ (which is the initial load of the kth node), and
$Q^*_{i(j)}(t_b)$ is calculated based on the system-information state $I$ at $t_b$. More precisely,
$Q^*_{i(j)}(t_b) = Q_i(t_b)$ if the jth node has received the load information sent from the
ith node by time $t_b$, while $Q^*_{i(j)}(t_b) = 0$ otherwise. Notice that in (4.1), $L_{ij}(t_b)$ is
calculated without considering the processing rates of the nodes, while in (3.1) the
calculation of the excess load of the jth node involves the processing rates of all
the nodes in the system. With the new expression for $L_{ij}(t_b)$, the upper bound for
$K_{ij}$ becomes $2Q_j(t_b)/\big(Q_j(t_b) - Q^*_{i(j)}(t_b)\big)$ instead of the upper bound of 1 in (3.5).
Therefore, in our new approach, it is possible for a node to transfer all its initial
load to another node in the system irrespective of the processing rates. This is quite
effective in situations where the faster nodes are more likely to fail permanently
than the relatively slower nodes, thereby necessitating large load-transfers in the
opposite directions, viz., from the faster to the slower nodes. The objective is to find
the optimal one-shot LB policy, defined by the choice of optimal tb together with
the optimal $K_{ij}$, that maximizes the computing reliability. More precisely, the optimal
one-shot LB policy is defined by $\arg\max_{t_b,K_{12},K_{21}} R^{(10,01),(11,11)}_{m_1,m_2}\big(t_b;[0],[0]\big)$.
Example 1: Consider the case for which $m_1 = 50$ tasks and $m_2 = 25$ tasks. The
service rates (in tasks per second) are $\lambda_{d1} = 0.5$ and $\lambda_{d2} = 0.75$; the mean failure
times are $\lambda_{f1}^{-1} = 80$ s and $\lambda_{f2}^{-1} = 50$ s. The mean arrival times of the load-information
packets and the failure notices are $\lambda_{21}^{-1} = \lambda_{12}^{-1} = \lambda_{F21}^{-1} = \lambda_{F12}^{-1} = 0.12$ s, and the
slope of the mean transfer delay per task is $\theta = 0.4$ s per task. Note that in this
example the first node, with a smaller service rate, is assigned a larger initial load.
Intuitively, a good LB policy would distribute the load considering both the service
rates and the failure rates of the nodes.
Let us first look at the solution to the initial condition corresponding to $t_b = 0$,
when $I = (10, 01)$, $F = (11, 11)$ and $C = ([0], [0])$. Therefore, from (4.1), we obtain
$L_{ij}(0) = \lfloor K_{ij} m_j / 2 \rfloor$ with $0 \le K_{ij} \le 2$. Now, we can precisely solve the difference
equations listed in Theorem 3 of Section (3.3.2) to calculate $R^{(10,01),(11,11)}_{50,25}\big(0;[0],[0]\big)$.
In Fig. 4.10, the success probability under different choices of $K_{12}$ is plotted as a
function of $K_{21}$. A small value of $K_{21}$ implies that the first (slower) node keeps most
of its initial load. Consequently, the load distribution remains unbalanced even after LB is
performed. Therefore, the time required to serve all customers becomes “large” and
the success probability is “small.” On the other hand, when K21 approaches 2, the
first node transfers most of its initial load to the second node. Consequently, almost
all the tasks have to be executed by the second (less reliable) node, thereby reducing
the success probability.
In Fig. 4.11, we fix K21 at 0.1, 1 and 2, while varying K12 between 0 and 2.
Note that when K21 = 2, the success probability can be maximized by choosing
K12 = 1.7. This can be explained by the fact that since the first node sends all 50
tasks to the second node, a large value for K12 ensures that the second node can
also send more tasks to the first node, which will avoid load-accumulation at the
second node. However, for K12 = 2, the success probability decreases, which can
be attributed to the fact that too many tasks get accumulated at the slower first
node. In Figs. 4.10 and 4.11, we have also shown results from the MC simulations
of the one-shot LB policy, where each success probability is calculated by averaging
outcomes (failures or successes) from 5000 independent realizations of the policy.
The MC simulation results are in good agreement with our theoretical results. Finally,
by numerically solving the renewal equations of Theorem 3 of Section (3.3.1), we
have plotted the success probability as a function of tb in Fig. 4.12. In summary,
we obtain $\arg\max_{(t_b,K_{21},K_{12})} R^{(10,01),(11,11)}_{50,25}\big(t_b;[0],[0]\big) = (8.1\ \mathrm{s}, 1.5, 0)$, which gives the
optimal success probability of 0.4233.
4.6 Conclusion
We have verified the validity of our regeneration-theory based model of the one-shot
load balancing policy by showing that the theoretical results are in good agreement
with the real-time experimental results as well as with the MC simulation results. One-
Figure 4.10: Probability of success $R^{(10,01),(11,11)}_{50,25}(0;[0],[0])$ as a function of the LB gain
$K_{21}$ of the first node when LB is performed at time t = 0, for $K_{12} = 0.1$ and
$K_{12} = 1.5$. Stars represent the Monte Carlo simulation.
Figure 4.11: Probability of success $R^{(10,01),(11,11)}_{50,25}(0;[0],[0])$ as a function of the LB gain
$K_{12}$ of the second node when LB is performed at time t = 0, for $K_{21} = 0.1$, $K_{21} = 1$
and $K_{21} = 2$. Stars represent the Monte Carlo simulation.
shot load balancing experiments were performed by multiplying large matrices on
our custom-made distributed system comprising two heterogeneous nodes that were
Figure 4.12: Probability of success $R^{(10,01),(11,11)}_{50,25}(t_b;[0],[0])$ as a function of the LB
instant $t_b$ (s), while the LB gains of both nodes are kept at 0.5.
connected over the Internet and the IEEE 802.11b/g WLAN. Our results for the
distributed system without node-failures showed that for a given initial load and a
given load balancing gain there is an optimal load balancing instant that minimizes
the AOCT. In particular, if load balancing is performed before the optimal instant,
there is a likelihood that at least one of the nodes is not informed about the initial
load of the other node at the time of load balancing. On the other hand, if load
balancing is performed after the optimal instant, there is a likelihood that the faster
node becomes idle while either the load is in transit or the slower node is processing
tasks. We also looked at the interplay between the balancing gain and the size of
the random delay in the channel. The theoretical predictions, MC simulations and
the experimental results all showed that when the average transfer delay per task is
large compared to the average processing time per task, reduced LB gain minimizes
the AOCT.
Next, we presented a comparative analysis of the performance of LBP-1 and
LBP-2 on a distributed system with recoverable node-failures. Under the policy LBP-
1, we saw that the a priori information about statistics of the failure and recovery
processes can be utilized to calculate the optimal LB gain that minimizes the AOCT
corresponding to a given initial load. We noticed that the presence of node-failures
and subsequent recoveries warrants the use of a reduced LB gain as compared to the
no-failure case. In addition, our studies revealed that when the average load-transfer
delays are small compared to the average recovery times, LBP-2 outperforms LBP-1.
In contrast, when the average load-transfer delays are large compared to the average
recovery times, the time wasted in transferring tasks at every failure instant, under
LBP-2, adversely affects the AOCT. Therefore, it is advantageous to use the LBP-1
instead of the LBP-2 in such situations.
Finally, we calculated the optimal one-shot LB parameters for a two-node distributed
system in the context of permanent physical failure. The theoretical predictions
and MC simulations suggest that the probability of successfully serving the initial
number of tasks is a concave function of the load balancing gains. Further, we observed
that the probability of success can be maximized by selecting an appropriate load
balancing instant. In summary, for all three systems we found that the presence of
uncertainty (viz., node failures/recoveries or random delays) calls for an attenuation
in the strength of LB action.
Chapter 5
Dynamic Load Balancing Policy
In this Chapter, we consider that external loads of different sizes (possibly corre-
sponding to different applications) arrive at a distributed-computing system ran-
domly in time and node space. Consequently, LB has to be performed periodically
to maintain load balance in the system. To this end, we propose a sender-initiated
distributed LB policy where each node can autonomously take LB actions repeat-
edly during run-time. Since the LB actions are taken during run-time, the proposed
LB scheme falls under the class of dynamic load balancing (DLB).
In our proposed DLB policy, every time an external load arrives at a node, the
node seeks a locally optimal one-shot-LB action. In particular, the locally optimal
one-shot-LB action aims to minimize the AOCT or to maximize the probability of
success, as appropriate, corresponding to the load present in the system just after the occurrence
of external-arrival. For clarity, we use the term external load to represent the loads
submitted to the system from some external source and not the loads transferred
from other nodes due to LB. Such a local one-shot LB action, which is required for
DLB, is different from the synchronous one-shot LB action described in Chapter (3)
in two ways: (1) the local action adapts to varying system parameters such as load
variability, randomness in the channel delay, and variable run-time processing speed
of the nodes, and (2) the local LB is performed in an asynchronous fashion, that is,
each node selects its own optimal LB instant and LB gain. (Recall that according to
the synchronized one-shot LB policy after the initial load assignment to nodes, all
the nodes execute LB synchronously using a common LB instant and a common LB
gain.)
The rest of this chapter is organized as follows. We present the DLB policy in Sec-
tion (5.1) and describe the corresponding experimental results in Section (5.2). This
is followed by the formulation of a computationally efficient sub-optimal LB policy for
an arbitrary number of nodes in Section (5.3). Finally, our conclusions are given in
Section (5.4).
5.1 Formulation of the DLB Policy
Consider a system of n distributed nodes with a given initial load, and assume that
external loads arrive randomly thereafter. We assume that nodes communicate with
each other at so-called “sync instants” on a regular basis. Upon the arrival of each
batch of external loads, the receiving node, and only the receiving node, prompts
itself to execute an optimal distributed one-shot LB. Namely, it finds the optimal
LB instant and gain and executes an LB action accordingly. Since load balancing is
performed locally at the external-load receiving node, say the jth node, the policy
depends only on its knowledge-state vector $\mathbf{i}_j$, rather than the system knowledge state
$I$. Consequently, the number of possible knowledge states becomes $2^{n-1}$. Further,
considering the periodic sync exchanges between nodes, each node in the system is
assumed to be continually informed of the states of the other nodes. Hence, the only
possible choice for the knowledge-state vector of each jth node is $\mathbf{i}_j = (1 \cdots 1) \equiv \mathbf{1}$,
leading to a simpler optimization problem than the one detailed earlier.
Suppose that an external arrival occurs at the jth node at time t = ta. We need
to compute optimal LB gain and optimal LB instant for the jth node based on the
knowledge-state vector 1. Clearly, according to the knowledge of the jth node at
time ta, the effective queue length of the kth node is mk(j)(ta). To recall, mk(j)(ta) =
Qk(ta−ηjk∗), where ηjk∗ refers to the delay in the most recent communication received
by the jth node from the kth node. The goal is to minimize µ1m1(j),...,mn(j)
(ta + tb),
where tb is the LB instant of the jth node measured from the time of arrival ta.
By setting ta = 0, the system of queues, in the context of the jth node, at time ta
becomes statistically equivalent to the system of queues at time 0 with initial load
distribution $m_{k(j)}$ for all $k \in \{1, \ldots, n\}$. Therefore, we can utilize the regeneration
theory to obtain the following difference-differential equation that can be solved to
calculate the optimal LB instant and the optimal LB gain.
$$\frac{d\mu^{\mathbf{1}}_{m_1(j),\ldots,m_n(j)}(t_b)}{dt_b} = \sum_{k=1}^{n} \lambda_{dk}\, \mu^{\mathbf{1}}_{m_1(j)-\delta_{1,k},\ldots,m_n(j)-\delta_{n,k}}(t_b) - \lambda\, \mu^{\mathbf{1}}_{m_1(j),\ldots,m_n(j)}(t_b) + 1, \qquad (5.1)$$

where $\lambda = \sum_{k=1}^{n} \lambda_{dk}$. In addition, the optimization over $t_b$ becomes unnecessary
since the jth node is already in the informed knowledge state $\mathbf{1}$. This claim has
been justified from the theoretical and experimental results of Chapter 4, where
we have shown that a node should perform LB immediately once it gets informed.
Therefore, our analysis becomes simpler as we can now use $t_b = 0$ and calculate the
corresponding LB gains that minimize $\mu^{\mathbf{1}}_{m_1(j),\ldots,m_n(j)}(0)$ using difference equations. In
practice, the optimal LB gains are calculated on-line by the jth receiver-node and
the LB is performed instantly at time t = ta.
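As an illustration of how (5.1) might be integrated numerically, the sketch below performs a forward-Euler pass over the coupled equations. It is a simplified stand-in, not the authors' solver: the initial values `mu0` are assumed to be supplied by the difference equations, `mu0` must contain every state reachable by removing tasks, and empty queues are excluded from the departure sum as a boundary assumption.

```python
def integrate_mu(mu0, lam_d, t_end, dt=0.01):
    """Euler integration of (5.1): mu0 maps each state (m_1, ..., m_n)
    to mu(0); lam_d holds the processing rates lambda_dk."""
    mu = dict(mu0)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        new = {}
        for m, val in mu.items():
            active = [k for k in range(len(m)) if m[k] > 0]
            if not active:  # all queues empty: completion already reached
                new[m] = 0.0
                continue
            # right-hand side of (5.1), restricted to non-empty queues
            rhs = 1.0 - sum(lam_d[k] for k in active) * val
            for k in active:
                lower = tuple(mj - (1 if j == k else 0)
                              for j, mj in enumerate(m))
                rhs += lam_d[k] * mu[lower]
            new[m] = val + dt * rhs
        mu = new
    return mu
```

As a sanity check, for a single node with rate $\lambda_d$ and no further arrivals, $\mu_m = m/\lambda_d$ for every $t_b$, and this profile is a fixed point of the iteration.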
The initial condition $\mu^{\mathbf{1}}_{m_1(j),\ldots,m_n(j)}(0)$ is solved based on techniques similar to the
ones that are used to solve (3.14). But one notable difference here is that the local
LB action taken by the jth node at time 0 (measured from ta) does not consider
future load arrivals at the jth node due to past or future LB actions of other nodes.
More precisely, $L_{kj}(0)$, for all $k \neq j$, are calculated based on (3.1), (3.4) and (3.5),
while setting $L_{jk}(0) = 0$ for all k. Therefore, we would expect to obtain a different
solution for locally optimal K than the one provided by (3.6).
The system parameter, namely the average processing time per task $\lambda_{di}^{-1}$, is updated
locally by each ith node. At every sync instant, the node broadcasts its current
processing rate and the current queue-size. Since the sync periods are adjusted ac-
cording to the arrival rates, the added overhead in transferring and processing the
knowledge state information grows in proportion to the arrival rates. The second
adaptive parameter is the mean transfer delay per task $\theta_{ji}$, which is updated by

$$\theta^{(k)}_{ji} = \alpha\left(\frac{Z_{ji,k}}{L_{ji,k}}\right) + (1-\alpha)\,\theta^{(k-1)}_{ji}, \qquad (5.2)$$

where $Z_{ji,k}$ is the actual delay incurred in sending $L_{ji,k}$ tasks to the jth node at the
kth successful transmission of the ith node, and $\alpha \in [0,1]$ is the so-called "forgetting
factor" of the previous estimation [44]. Also, $\theta^{(0)}_{ji}$ is calculated empirically from many
experimental realizations of delays in transferring tasks from the ith node to the jth
node. The forgetting factor can be adjusted dynamically in order to accommodate
drastic changes in transfer delay per task. Steps for the DLB policy are described
next.
Detailed Algorithm for Dynamic Load Balancing
For an n-node distributed system, we specify the “sync” periods for each node by
δj, j = 1, . . . , n. These are the periods, for each node, at which each node broadcasts
its queue length and processing speed to other nodes. (In our experiments, we used
a common sync period of 1s.)
Algorithm:
∀t ≥ 0, at every jth node, the DLB algorithm is:
if mod (t, δj) = 0 then
Broadcast current queue size and current processing rate
end if
if “sync” is received then
Update queue size and processing rate of the corresponding node
end if
if external-load is received, say at time t = ta then
Calculate local excess load using (3.1), initial partitions using (3.2) or (3.4), and
optimal Kij using (5.1)
Perform LB only by the jth node in accordance to (3.5)
Update $\theta^{(k)}_{ij}$ using (5.2) to include the delay in completion of the kth load transmission
end if
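The update of θ in the final step is a plain exponentially weighted moving average; a minimal helper (our own sketch, with the experimentally reported α = 0.05 as the default) is:

```python
def update_theta(theta_prev, Z, L, alpha=0.05):
    # Eq. (5.2): blend the newest per-task delay sample Z/L with the
    # previous estimate; alpha is the "forgetting factor".
    return alpha * (Z / L) + (1.0 - alpha) * theta_prev
```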
5.2 Results
In this section we present the results on DLB policy for the experiments conducted
over the Internet, where external loads of random sizes arrive randomly in time at
any node in the distributed system. To recall, each time an external load arrives
at a node, the receiving node (and only the receiving node) takes a local, optimal
one-shot LB action to minimize the AOCT of the total load in the system at that
instant. As external tasks arrive at a certain rate, the total load and the overall
completion time of the total load in the system change with time. Therefore, the
performance of DLB policy is now evaluated in terms of the average completion time
per task (ACTT) corresponding to all tasks that are executed within a specified
time-window. Here the completion time of each task is defined as the sum of the
processing time, the queuing time and the transfer time of the task.
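As a small illustration (with hypothetical per-task records, not the experimental data), the ACTT over a time-window is simply the mean of these three-part sums:

```python
def actt(records):
    # records: (processing, queuing, transfer) times, in seconds, for
    # every task completed within the time-window
    return sum(p + q + tr for p, q, tr in records) / len(records)
```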
For all the experiments, the tasks are generated independently according to a com-
pound (or generalized) Poisson process with Poisson-distributed marks [45]. More
precisely, the external loads arrive according to a Poisson process, and the numbers
of tasks at the load-arrival instants constitute a sequence of independent and iden-
tically distributed Poisson random variables. (Recall that the task size, in terms of
Bytes per task, is also random according to a geometric distribution.) Note that
since the proposed DLB policy is triggered by the arrival of tasks and is based on
the actual realization of the task number in each arrival, it is independent of the
statistics of the number of tasks per arrival as well as the statistics of the underlying
task-arrival process.
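An arrival stream of this kind can be generated as follows (an illustrative sketch; the function names are ours, and Knuth's multiplication method is used for the Poisson-distributed marks):

```python
import math
import random

def poisson_sample(rng, mean):
    # Knuth's method: count uniforms until their product drops below e^{-mean}
    L = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def compound_poisson_arrivals(rate, mean_tasks, t_window, seed=None):
    # Exponential inter-arrival times (a Poisson process in time) with a
    # Poisson-distributed number of tasks at each arrival instant.
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > t_window:
            return events
        events.append((t, poisson_sample(rng, mean_tasks)))
```

With rate = 1/40 and mean_tasks = 55 this reproduces the arrival pattern of Experiment-1 described below.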
The experiments were conducted for three different cases: Experiment-1: The
first node receives, on average, 55 external tasks at each arrival, and the average
inter-arrival time is set to 40 s, while no external tasks are generated at the second
node. Experiment-2: The second node receives, on average, 22 external tasks at each
arrival and the average inter-arrival time is 9s while no external tasks are generated at
the first node. Experiment-3: The first and the second nodes independently receive,
on average, 16 and 40 external tasks, respectively, at each arrival and the average
inter-arrival times are 20s and 18s, for the first and the second nodes, respectively.
The empirical estimates of the processing rates of the first and the second nodes were
found to be 1.06 and 3.78 tasks per second, respectively. The estimate of the average
transfer delay per task, $\theta^{(k)}_{ji}$, is updated after every transfer of tasks according to
(5.2), with $\theta^{(0)}_{ji} = 0.85$ s and $\alpha = 0.05$. In Fig. 5.1, we show estimates of $\theta^{(k)}_{ji}$, as a function
of time, obtained from one of our DLB experiments.
Each experiment was conducted for a period (time-window) of 1 hour and the
ACTT corresponding to each case is listed in Table 5.1. We also show the ACTT
obtained using static policies that perform LB with fixed gains of K = 0.1 and
K = 1 at all arrival instants. It is clear from Table 5.1 that the ACTT is minimum
for the DLB policy for all three experiments. Considering Experiment-1, note that
the average rate of arrival at the first node is 1.37 tasks per second, since the inter-
Figure 5.1: Adaptive estimation of the average transfer delay per task (s) as a
function of time (s).
arrival times are independent of arrival sizes. Therefore, the average arrival rate
of the first node is greater than its processing rate (1.06 tasks per second), but is
smaller than the combined processing rates of the nodes. With LB, some portion of
the arriving tasks is diverted to the second node, which reduces the effective arrival
rate at the first node and thus avoids load accumulation. In the static LB policy with
K=0.1, the first node keeps 90% of its excess load, and hence, the effective arrival
rate at the first node remains larger than its processing rate. Therefore, the queue-
length accumulates with every arrival, which results in a greater queuing delay, and
thus, excess ACTT. In contrast, in the static policy with K=1, the first node sends
all of its excess load to the second node at every LB instant. However, each batch
of transferred load undergoes a large delay, resulting in an increase in ACTT.
In case of Experiment-2, the average rate of arrival at the second node is 2.44
tasks per second, which is smaller than the processing speed of the second node. As
a result, the static LB with K = 1 gives a reduced ACTT compared to K = 0.1,
meaning that the increase in ACTT due to queuing delay at the second node for
K = 0.1 is greater than the increase in ACTT caused by the transfer delay when
Table 5.1: Experimental results for dynamic and static LB policies.

  Experiment   Average Completion Time per Task (s)   System Processing Rate (tasks/s)
  Number       DLB      K=0.1    K=1                  DLB     K=0.1   K=1
  1            22.55    73.87    49.76                1.75    1.69    1.11
  2            8.61     15.82    11.67                3.3     3.06    2.92
  3            9        10.56    10.77                3.29    3.73    2.84
K = 1. However, the DLB outperforms the static case of K = 1 due to the excessive
load-transfer delay associated with this static LB case. For Experiment-3, the ACTTs
are evidently similar under both K = 0.1 and K = 1 static LB policies. This is
because ACTT is dominated by queuing delay in the K = 0.1 (at the slower first
node) case while it is dominated by transfer delay in the K = 1 case. On the other
hand, the DLB policy effectively uses the system resources, viz., the nodes and the
channel, to avoid excessive queuing delay as well as the transfer delay.
We now look at the effect of LB policies on the system processing rate (SPR),
which is calculated as the total number of tasks executed by the system in a certain
time-window divided by the active time of the system. The active time of the system
within a time-window is defined as the aggregate of all times for which there is at
least one task in the system that is either being processed or being transferred. The
SPR achieved under different LB policies are listed in Table 5.1. It is interesting to
note that in the case of Experiment-1, better SPR is achieved with K = 0.1 than with
K = 1 despite the fact that the latter performs better in terms of ACTT. To explain
this behavior, we first need to look at one extreme case when no LB is performed. In
this case, the SPR is always equal to $\lambda_{d1}$, independently of the size of the time-window.
However, as we increase the time-window, the ACTT diverges to infinity since the
average rate of arrival is bigger than the average processing rate of the first node.
The performance for the case of a weak LB action with K = 0.1 is found to be
similar to the extreme case of no LB. In the second case when LB is performed with
K = 1, the active time of the system gets dominated by times when there are tasks
in transfer while both nodes are idle. Consequently, the number of tasks processed
by the system is smaller while the active time of the system may increase, resulting in a
reduced SPR. However, the LB action taken by the first node reduces the effective
arrival rate at the first node below its processing rate. As a result, the ACTT of the
system is bounded.
In the case of DLB policy, LB gains are chosen small enough to avoid large
transfer delays but large enough to lower the effective arrival rate at the first node.
Therefore, for Experiment-1, the DLB policy achieves the maximum SPR and the
minimum ACTT. The fact that nodes have large idle times while there are tasks in
transfer for the case of K=1 is depicted in Fig. 5.2. Observe that when there is an
arrival of 70 tasks at the first node around 2250s, 55 tasks are transferred to the
second node. On the other hand, the second node has an empty queue at the arrival
instant of the first node, and due to the transfer delay, it must wait another 50s to
receive the tasks. Further, the first node finishes the remaining 15 tasks and becomes
idle by the time the second node gets the transferred load. This behavior is repeated
at all arrival instants, which are marked by arrows in Fig. 5.2 (left). In contrast,
from Fig. 5.2 (right), it can be seen that the transfer delay mostly overlaps with the
working times of the sender node, which results in smaller idle times on both nodes.
Similar results are observed for Experiment-2.
In the case of Experiment-3, the first node and the second node receive external
loads at a rate of 0.8 and 2.2 tasks per second, respectively. This means that even
if no LB is performed, both nodes process their own tasks without being idle for a
long time. Therefore, the SPR is expected to be close to the sum of the processing
rates of the nodes. However, when LB is performed, nodes may become idle due
to the transfer delay, resulting in smaller SPR. This is evident from our results of
Experiment-3 where the static LB policy with K = 0.1 achieves maximum SPR.
On the other hand, the DLB policy transfers the right amount of tasks at every LB
instant, so that the transfer delays plus the queuing delays at the receiving node are
smaller than the queuing delays for those tasks at the sender node. This reduces the
ACTT but may or may not increase SPR depending on the resulting active time.
Figure 5.2: One realization of the queues (queue size versus time, s; Node 1, Node 2,
and external-task arrivals) under a static LB policy using a fixed gain K = 1 (left)
and the DLB policy (right).
5.2.1 Comparison to other DLB policies
Next, we will compare the performance of our DLB policy to versions of two existing
LB policies (discussed in Section (2.2)) for heterogeneous and dynamic computing,
namely, the shortest-expected-delay (SED) policy and the never-queue (NQ) policy,
which we have adapted to our distributed-computing setting. Suppose that an external
arrival of x tasks occurs at the ith node at time t. Let $m_{j(i)}(t)$ be the queue length
of the jth node as per the knowledge of the ith node at time t. Let $l_{j(i)}(t)$ be the
ACTT for the batch of x external tasks if all the external tasks join the queue of the
jth node. The average completion time per task (per batch of x arriving tasks) can
now be expressed as

$$l_{j(i)}(t) = \frac{1}{x}\sum_{r=1}^{x}\left(\frac{m_{j(i)}(t) + r}{\lambda_{dj}} + \theta^{(k)}_{ji}x\right) = \frac{m_{j(i)}(t)}{\lambda_{dj}} + \frac{x+1}{2\lambda_{dj}} + \theta^{(k)}_{ji}x, \qquad (5.3)$$

where $\theta^{(k)}_{ji}$ is the kth update of the average transfer delay per task sent from the ith
node to the jth node (with $\theta^{(k)}_{ii} = 0$). In the SED policy, the batch of x tasks is
assigned to the node that achieves the minimum ACTT. Therefore, the receiver node
is identified as $\arg\min_j l_{j(i)}(t)$. On the other hand, in the NQ policy, all external
loads are assigned to a node that has an empty queue. If more than one node has
an empty queue, the SED policy is invoked among the nodes with the empty queues
to choose a receiver node. Similarly, if none of the queues is empty, the SED policy
is invoked again to choose the receiver node among all the nodes.
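Under these definitions, the two receiver-selection rules can be sketched as follows (our own illustrative code, with (5.3) supplying the expected per-task completion time):

```python
def expected_actt(m_j, lam_dj, theta_ji, x):
    # Eq. (5.3): mean completion time per task if all x tasks join node j
    return m_j / lam_dj + (x + 1) / (2.0 * lam_dj) + theta_ji * x

def sed_receiver(queues, rates, thetas, x):
    # SED: route the batch to the node with the smallest expected ACTT
    n = len(queues)
    return min(range(n),
               key=lambda j: expected_actt(queues[j], rates[j], thetas[j], x))

def nq_receiver(queues, rates, thetas, x):
    # NQ: prefer nodes with empty queues, falling back to SED among the
    # empty ones (or among all nodes when no queue is empty)
    empty = [j for j, q in enumerate(queues) if q == 0]
    cand = empty if empty else range(len(queues))
    return min(cand,
               key=lambda j: expected_actt(queues[j], rates[j], thetas[j], x))
```

For example, with the test-bed rates (1.06 and 3.78 tasks per second), a batch of 20 tasks arriving at a busy first node (queue length 10, θ = 0.85 s per task toward the second node) is kept locally by SED but routed to the idle second node by NQ.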
We implemented the SED and NQ policies to perform distributed-computing
experiments on our test-bed. The experiments were conducted between two
nodes connected over the Internet (keeping the same processing speeds per task).
We performed three types of experiments for each policy: (i) node 1 receiving on
average 20 tasks per arrival, with the average inter-arrival time set to 12 s, while
no external tasks were generated at node 2; (ii) node 2 receiving on average 25 tasks
per arrival, with the average inter-arrival time set to 8 s; and (iii) nodes 1 and 2
independently receiving on average 10 and 15 external tasks per arrival, with average
inter-arrival times of 8 s and 7 s, respectively. Each experiment was conducted over
a 2-hour period. The results, shown in Table 5.2, suggest that the ACTT achieved
by the DLB policy is approximately half that achieved by either the SED or the
NQ policy.
Table 5.2: Experimental ACTT results under the proposed DLB, SED, and NQ policies

Experiment   Average completion time per task (s)
number       Proposed DLB      SED      NQ
i            7.61              15.19    15.55
ii           6.09              13.68    13.77
iii          4.71              6.92     7.44
5.3 Sub-optimal LB policy for an n-node system
The optimal LB gains for a multi-node system can be obtained by explicitly solving
(5.1) (for a system without node failures) or by explicitly solving equations that
are structurally similar to those given by Theorems 3 and 4 (for a system with
permanent or recoverable node failures). However, the complexity of any such
equation grows exponentially with the number of nodes, and the problem soon
becomes intractable. In particular, when the delays imposed by the channel differ
across the paths between nodes, the LB gains K_ij can no longer be parameterized
by a single value K. In such cases, it is not computationally efficient to perform the
online optimization required by the DLB policy. Therefore, in this section we take a
sub-optimal yet effective approach that computes the LB gains efficiently by invoking
a two-node solution every time a load is exchanged between a pair of nodes.
Suppose that all the nodes are initially informed of the states of the other nodes
and that the optimal one-shot LB instant is t_b = 0. Then, according to (3.5) given
in Chapter 3, the adjusted load to be transferred from the jth node to the ith node is
L_ij(0) = ⌊K_ij p_ij L_j^ex(0)⌋. We now present our approach to calculating the sub-optimal
LB gains of the jth node.
At first, we arbitrarily choose a recipient node, say the ith node, that belongs to
U_j. For the remaining recipient nodes, we assume that the jth node sends the full
partitions of its excess load; that is, we set K_kj = 1 for all k ∈ U_j \ {i}, while the
objective is to calculate K_ij. We can now think of a two-node system comprising
the ith and the jth nodes, where upon the execution of LB at t_b = 0, the ith and the
jth nodes have loads Q_i(0) and Q_j(0) − Σ_{k∈U_j} ⌊K_kj p_kj L_j^ex(0)⌋, respectively, while
⌊K_ij p_ij L_j^ex(0)⌋ tasks are in transit from the jth node to the ith node. The
theorems of Chapter 3 are then invoked to compute the sub-optimal K_ij for this
two-node system.
In the next step, we arbitrarily choose another recipient node, say the lth node,
from U_j \ {i} and assume that K_kj = 1 for all k ∈ U_j \ {i, l}. We also
utilize the sub-optimal K_ij obtained in the previous step. We then calculate the
sub-optimal K_lj by solving a two-node system in which, upon the execution of LB at
t_b = 0, the lth and the jth nodes have loads Q_l(0) and Q_j(0) − Σ_{k∈U_j} ⌊K_kj p_kj L_j^ex(0)⌋,
respectively, while ⌊K_lj p_lj L_j^ex(0)⌋ tasks are in transit from the jth node to the lth
node. These steps are repeated until the sub-optimal K_ij have been calculated for all i ∈ U_j.
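The sequential procedure can be sketched as follows (a Python sketch under our own naming; `solve_two_node` stands in for the two-node optimization of Chapter 3, which we do not reproduce here):

```python
def suboptimal_gains(j, recipients, Q, p, L_ex, solve_two_node):
    """Sequentially compute a sub-optimal gain K_ij for each recipient
    i of sender j.  Gains not yet computed are held at 1 (full
    partitions); each step solves a two-node problem between the
    current recipient and the sender."""
    K = {i: 1.0 for i in recipients}          # K_kj = 1 initially
    for i in recipients:
        # sender's residual load after all currently planned transfers
        sent = sum(int(K[k] * p[k] * L_ex) for k in recipients)
        in_transit = int(K[i] * p[i] * L_ex)  # batch headed to node i
        K[i] = solve_two_node(Q[i], Q[j] - sent, in_transit)
    return K
```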
Example: We now consider a 5-node system for which λ_{d1}, λ_{d2}, λ_{d3}, λ_{d4}, and
λ_{d5} (in units of tasks per second) are 0.25, 0.5, 0.75, 1 and 1.25, respectively, while
λ_{f1}^{-1}, λ_{f2}^{-1}, λ_{f3}^{-1}, λ_{f4}^{-1} and λ_{f5}^{-1} are 250 s, 200 s, 150 s, 100 s and 10 s, respectively. The
load-information arrival times and the failure-notice arrival times between nodes are
all equal to 0.12 s, while the slope of the mean transfer delay per task is θ = 0.4 s per
task. The probability of success corresponding to different initial loads is listed in
Table 5.3. When the initial load is (m_1, m_2, m_3, m_4, m_5) = (125, 100, 75, 50, 25), only
the 4th and the 5th nodes are recipient nodes (belonging to U). The sub-optimal
LB gains were computed to be K_41 = 0.95, K_42 = 0.9, K_51 = 0.73 and K_52 = 0.96,
which together give a sub-optimal success probability of 0.39. In Table 5.3, we also
show the probability of success obtained under (i) Full-LB policy, where Kij = 1 for
all i ∈ Uj, and (ii) Null-LB policy, where Kij = 0 for all i ∈ Uj. Clearly, for all three
initial load distributions, the proposed suboptimal LB policy outperforms both the
Full-LB and the Null-LB policies.
Table 5.3: Probability of success achieved under different policies

Initial load (m1, m2, m3, m4, m5)   Sub-optimal LB   Full-LB   Null-LB
(125, 100, 75, 50, 25)              0.39             0.33      0.2
(500, 0, 0, 0, 0)                   0.15             0.1       0.015
(100, 100, 100, 100, 100)           0.25             0.22      0.16
5.4 Conclusion
The optimal one-shot load-balancing approach has been adapted to develop a
distributed and dynamic load-balancing policy in which, at every external load arrival,
the receiver node executes load balancing autonomously. Further, the optimal gains are
calculated on the fly, based on system parameters that are adaptively updated.
Thus, the dynamic load-balancing policy can adapt to changing traffic conditions
in the channel as well as to changes in the task-processing rates induced by the type
of application. We have shown experimentally that the proposed dynamic load-
balancing policy minimizes the average completion time per task while improving
the system processing rate. The interplay between the queuing delays and the transfer
delays, as well as their effects on the average completion time per task and the system
processing rate, was investigated. In particular, the average completion time per
task achieved under the proposed dynamic load-balancing policy is significantly less
than that achieved by the commonly used SED and NQ policies. This is attributable
to the fact that the dynamic load-balancing policy is more successful than the SED
and NQ policies at reducing the likelihood of nodes being idle
while there are tasks in the system, comprising tasks in the queues as well as those
in transit. Finally, the two-node model has been utilized to formulate a sub-optimal
yet effective and computationally efficient LB policy for a multi-node system.
Chapter 6
Application to Wireless Sensor
Networks
A multidimensional queuing framework for distributed systems has been used to
analyze the performance of distributed sensor networks, routing in wireless networks,
telecommunications, and other resource-allocation problems in computer science and
operations research [46–49]. Therefore, the theoretical approach given in Chapter 3
can be useful in solving complex queuing problems that arise in such dynamical
systems. For illustration, we apply the theory to develop an optimal LB policy for
energy-limited distributed sensor networks.
6.1 Description of wireless sensor networks
Wireless sensor networks typically consist of small battery-powered processors
deployed over a region. These sensors communicate with each other over radio links.
In some situations, a few sensors might be overloaded by collecting data at a high
rate while others remain idle. This could lead to a situation where the network
loses some sensing coverage as the overloaded sensors are rapidly depleted of
battery power. In addition, due to the computational limitations of a sensor, it may not
be possible for the sensor to process its data in a timely fashion. Thus, by allowing
the sensors to process the raw data cooperatively, we may not only extend the
lifetime of some batteries but also enhance the computing efficiency of the sensor
network. However, transferring data between sensors in turn requires energy, and
these transfers are also accompanied by random delays. There is therefore a
fundamental tradeoff between the savings in queuing time per task that result from
utilizing the processing power of the distributed sensors cooperatively, and the
combined delay and energy overhead resulting from the very collaborative nature of the sensors [20].
6.2 Queuing equation
Consider the queuing model of Chapter 3 and suppose that LB is performed at
time t = 0. Let T_{r_1,...,r_n}(C) denote the total service time for all tasks in the system if
the network state at t = 0 (just after LB is performed) is as specified by C, while r_k
tasks (k = 1, ..., n) remain at the kth sensor at t = 0. Now, based on Assumptions
A1 and A2 and Convention C1 given in Chapter 3, we can prove that for
n ∈ ℕ, r_k ∈ ℤ_+, g_k ≥ 0 and c_ki > 0, the expected value of T_{r_1,...,r_n}(C) satisfies the
following relation:
$$
\begin{aligned}
E\Bigl[T_{r_1,\ldots,r_n}&\bigl([g_1\; c_{11}\,\ldots\, c_{1g_1}],\ \ldots,\ [g_n\; c_{n1}\,\ldots\, c_{ng_n}]\bigr)\Bigr] \\
&= \frac{1}{\lambda} + \sum_{k=1}^{n}\frac{\lambda_{d_k}}{\lambda}\, E\Bigl[T_{r_1-\delta_{1,k},\ldots,r_n-\delta_{n,k}}\bigl([g_1\; c_{11}\,\ldots\, c_{1g_1}],\ \ldots,\ [g_n\; c_{n1}\,\ldots\, c_{ng_n}]\bigr)\Bigr] \\
&\quad + \sum_{k=1}^{n}\sum_{i=1}^{g_k}\frac{\tilde\lambda_{ki}}{\lambda}\, E\Bigl[T_{r_1+\delta_{1,k}c_{ki},\ldots,r_n+\delta_{n,k}c_{ki}}\bigl([g_1\; c_{11}\,\ldots\, c_{1g_1}],\ \ldots,\ [g_k-1\; c_{k1}\,\ldots\, c_{k(i-1)}\; c_{k(i+1)}\,\ldots\, c_{kg_k}],\ \ldots,\ [g_n\; c_{n1}\,\ldots\, c_{ng_n}]\bigr)\Bigr],
\end{aligned}
\qquad (6.1)
$$
where $\lambda = \sum_{k=1}^{n}\bigl(\lambda_{d_k} + \sum_{i=1}^{g_k}\tilde{\lambda}_{ki}\bigr)$, δ_{j,k} is the Kronecker delta, and r_k − 1 is set to 0 when
r_k = 0. Note that E[T_{0,0,...,0}([0], [0], ..., [0])] = 0 since T_{0,0,...,0}([0], [0], ..., [0]) = 0 a.s.,
there being no tasks to process in the system. We omit the proof of (6.1), as it is
similar to the proofs in Chapter 3.
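Recursion (6.1) lends itself to a memoized numerical solution. The sketch below is our own simplified encoding, in which each in-transit batch is a triple of destination node, batch size, and arrival rate; the self-loop terms that arise when a queue is empty are folded into the left-hand side algebraically:

```python
from functools import lru_cache

def expected_service_time(r, batches, lam_d):
    """Memoized evaluation of recursion (6.1).
    r       : tuple of queue lengths r_k (0-based node indices)
    batches : tuple of (dest_node, size, rate) for in-transit batches
    lam_d   : tuple of service rates lambda_dk
    A service event at an empty queue leaves the state unchanged
    (r_k - 1 floored at 0), so those terms are solved for in closed
    form instead of recursing."""
    @lru_cache(maxsize=None)
    def E(r, batches):
        if all(x == 0 for x in r) and not batches:
            return 0.0                        # base case: empty system
        lam = sum(lam_d) + sum(b[2] for b in batches)
        total, self_p = 1.0 / lam, 0.0
        for k, rate in enumerate(lam_d):      # service completions
            if r[k] > 0:
                rp = r[:k] + (r[k] - 1,) + r[k + 1:]
                total += (rate / lam) * E(rp, batches)
            else:
                self_p += rate / lam          # empty queue: state unchanged
        for i, (k, c, rate) in enumerate(batches):   # batch arrivals
            rp = r[:k] + (r[k] + c,) + r[k + 1:]
            bp = tuple(sorted(batches[:i] + batches[i + 1:]))
            total += (rate / lam) * E(rp, bp)
        return total / (1.0 - self_p)
    return E(tuple(r), tuple(sorted(batches)))
```

For a single node holding two tasks at unit rate this gives 2 s, and for an empty two-node system awaiting one batch of two tasks (batch arrival rate 0.5, destination rate 1) it gives 2 + 2 = 4 s, matching hand calculation.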
6.3 LB policies for two cooperating sensors
Consider a two-node sensor network. At time t = 0, the first sensor (overloaded)
transfers L_21 tasks to the second sensor (idle) using the following policy:

$$ L_{21} = \lfloor K m_1 \rfloor, \qquad (6.2) $$

where K ∈ [0, 1] is the gain of the LB policy and ⌊x⌋ is the greatest integer less
than or equal to x. Suppose that for any particular value of K, the average energy
consumed for processing, transferring and receiving tasks by the jth sensor is ε_j. The
minimum-service-time (MST) policy is the LB policy that minimizes the expected
value of the total service time; more precisely, the MST policy is defined by the
optimal K that minimizes E[T_{r_1,r_2}([0], [1 L_21])] for r_1 = m_1 − L_21 and r_2 = m_2. The
fair-energy (FE) policy is the LB policy that ensures equal energy consumption at
each sensor; more precisely, the FE policy is defined by the fair K that achieves
ε_fair ≜ ε_1 = ε_2.
Example 1: Suppose that at time t = 0 the first node has sensed data equivalent
to 100 tasks and the second node is idle, i.e., m_1 = 100 and m_2 = 0. (A task is defined
as the amount of data required by a preprocessing algorithm in order to compute
one value of a desired quantity; processing one task is one execution of the
preprocessing algorithm.) Suppose that the service rates (in tasks per second) are
λ_{d1} = 1 and λ_{d2} = 0.5. Further, let the channel delay be θ = 0.2 s per task.
Moreover, by adopting the radio-energy model for an actual sensor [50], we set the
energy dissipation rate for each sensor to 1 mJ per task for transmission and 0.5
mJ per task for reception. Finally, the energy dissipation rates for processing tasks
at the first and second sensors are 4 mJ per task and 2 mJ per task, respectively.
In Fig. 6.1 (left), we see that the expected value of the total service time,
calculated according to (6.1), attains its minimum value of 74 s at K = 0.28. If we
assumed an infinite transfer rate, the fair number of tasks to transfer to the second
node would depend only on the processing rates of the sensors and would be given by
$\frac{\lambda_{d_2}}{\lambda_{d_1}+\lambda_{d_2}} \times 100$, which is approximately 33 tasks. Instead, with a transfer rate
of 5 tasks per second, the MST policy (corresponding to K = 0.28) transfers only 28
tasks to the second node in order to avoid excess delay in the channel. Note also that
for any particular value of K, and according to our assumptions on energy
consumption, ε_1 = L_21 + 4 r_1 and ε_2 = 0.5 L_21 + 2 (r_2 + L_21). Next, we
observe from Fig. 6.1 (right) that at K = 0.28 the energy consumption per sensor
becomes unfair, as the first sensor consumes about 4.5 times the energy consumed
by the second sensor. The FE policy, corresponding to K = 0.73, results in
ε_fair = 182 mJ of energy consumption at each sensor; at K = 0.73, however, the
expected value of the total service time is about 161 s.
In order to jointly investigate the effect of K on the interplay between the total
service time and the energy consumption of each sensor, we define the following
normalized quantities, which respectively measure the policy's deviation from the
points of minimum service time and fair energy consumption:

$$ \sigma_{T_{21}} = \frac{E[T_{r_1,r_2}(C)] - T_{\min}}{T_{\max}}, \qquad \sigma_{\varepsilon_{21}} = \frac{\sqrt{\sum_{l=1,2}(\varepsilon_l - \varepsilon_{\mathrm{fair}})^2}}{2\,\varepsilon_{\max}}, $$

where $T_{\min} \triangleq \inf\bigl\{E[T_{r_1,r_2}(C)],\ K \in [0,1]\bigr\}$, $T_{\max} \triangleq \sup\bigl\{E[T_{r_1,r_2}(C)],\ K \in [0,1]\bigr\}$,
and $\varepsilon_{\max} = \max_K\bigl(\max(\varepsilon_1, \varepsilon_2)\bigr)$. In order to achieve a fair tradeoff between the
deviation in service time and the deviation in energy consumption per sensor, we introduce
a scheduling policy called the fair-tradeoff (FT) policy, given by the K yielding
σ_{T_21} = σ_{ε_21}. In the case of Example 1, the FT policy is given by K = 0.5, for
which ε_1 = 250 mJ, ε_2 = 125 mJ and the expected value of the total service time is
approximately 110 s.
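Given sampled curves of the expected total service time and the per-sensor energies over a grid of gains (as in Fig. 6.1), the three operating points can be located by a simple grid search. The following Python sketch uses our own names and a grid search, whereas the dissertation obtains these points from the analytical curves:

```python
import math

def pick_policies(K, ET, e1, e2):
    """Grid-search sketch of the MST, FE and FT gains from sampled
    curves E[T](K) and per-sensor energies eps1(K), eps2(K)."""
    n = len(K)
    k_mst = K[min(range(n), key=lambda i: ET[i])]       # min service time
    i_fe = min(range(n), key=lambda i: abs(e1[i] - e2[i]))  # eps1 = eps2
    e_fair = 0.5 * (e1[i_fe] + e2[i_fe])
    T_min, T_max = min(ET), max(ET)
    e_max = max(max(e1), max(e2))
    def dev(i):
        # |sigma_T - sigma_eps| at grid point i
        s_T = (ET[i] - T_min) / T_max
        s_e = math.hypot(e1[i] - e_fair, e2[i] - e_fair) / (2 * e_max)
        return abs(s_T - s_e)
    k_ft = K[min(range(n), key=dev)]                    # fair tradeoff
    return k_mst, K[i_fe], k_ft
```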
[Figure 6.1 here: the left panel plots E[T_{r1,r2}([0],[1 L21])] (s) versus K; the right panel plots the average energy consumption (mJ) versus K, with curves for ε1, ε2, and ε1 + ε2.]
Figure 6.1: Expected value of the total service time (left) and the battery-energy
consumed by the sensors (right) under different LB gains.
6.3.1 Extension to n cooperating sensors
In this section, the sub-optimal one-shot LB policy detailed in Section 5.3 is utilized
to calculate the LB gain K_ij that dictates the load transfer from the jth node to the
ith node. At each step, the sub-optimal LB gain can be selected based on the MST,
the FE or the FT policy, as required.
Example 2: Consider a five-node sensor network for which λ_{d1}, λ_{d2}, λ_{d3}, λ_{d4},
and λ_{d5} (in units of tasks per second) are 0.25, 0.5, 0.75, 1 and 1.25, respectively,
while the energy dissipation rates for processing (in units of mJ per task) at the
first to fifth sensors are 0.5, 1, 1.5, 2 and 2.5, respectively. The load-transfer rate as
well as the energy dissipation rates for transmission and reception are set as given
in Example 1. The initial task distribution is m_1 = 500, m_2 = ... = m_5 = 0. The
average service time (AST) and the average energy consumption (AEC) per sensor
under the MST, FE and FT policies are listed in Table 6.1. The MST policy yields a
small AST compared with the FE and FT policies, but the aggregate AEC of the five
sensors is much larger. Similarly, the AST under the FE policy is much larger than
under the other two. In summary, the FT policy offers a well-balanced performance
in terms of reducing the AST and the AEC together.
Table 6.1: The AST and the AEC under the MST, FE and FT policies.

            MST policy      FE policy       FT policy
AST (s)     188.5           1091.6          616.0
AEC (mJ)    ε1 = 481        ε1 = 363        ε1 = 423
            ε2 = 99         ε2 = 75         ε2 = 86
            ε3 = 200        ε3 = 100        ε3 = 154
            ε4 = 330        ε4 = 145        ε4 = 243
            ε5 = 459        ε5 = 207        ε5 = 345
6.4 Conclusions
We have applied the theory to design novel LB policies for a distributed sensor-network
application. The interplay between the expected value of the total service
time and the energy consumption of each sensor has been highlighted. To this end, we
have considered three scheduling policies: (1) the minimum-service-time policy,
which minimizes the expected value of the total service time; (2) the fair-energy
policy, which ensures a fair consumption of energy by each sensor; and (3) the
fair-tradeoff policy, which jointly considers the deviation in service time and the
deviation in energy consumption per sensor. Our preliminary results for two-node
and five-node sensor networks indicate that the fair-tradeoff policy achieves a
well-balanced performance, as compared with the minimum-service-time and fair-energy
policies, in terms of reducing the average service time and the average energy
consumption.
Chapter 7
Future work
The regeneration-theory-based multidimensional queuing model presented in this
dissertation can be adapted to the design of resilient communication networks and
wireless sensor networks. In this chapter, we present an overview of future research
in these areas.
7.1 Resilient distributed networks
Random occurrences of external attacks on the communication links and/or nodes of
distributed networks have to be detected dynamically. Because the load completion
times depend strongly on the network's functionality and connectedness, measuring
them enables us to make dynamic decisions on whether or not the network is under
external attack at any given time. The idea is to periodically submit loads to the
network and form a sequence of random variables corresponding to the load
completion times for each submission. Such a sequence implicitly bears information
about the network's state. A composite binary hypothesis-testing problem for the
"network condition" can then be formulated as follows: under the null hypothesis,
the node/link failure and recovery rates belong to the "non-attack" mode and
the nodes and links are all functional; under the alternative hypothesis, the
aforementioned rates belong to the "attack" mode and there is at least one failed
node or link in the network. The data (or test statistic) for this decision problem is
the sequence of load completion times. As loads are submitted periodically by the
network evaluator and their completion times observed, the network condition is
announced through the Neyman-Pearson decision rule, whose performance (detection
probability and false-alarm rate) can be fully characterized from the pdf of the load
completion times, which can be calculated as described in Chapter 3.
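A sketch of the envisioned detector follows; the densities `pdf0` and `pdf1` stand in for the completion-time pdfs obtained via Chapter 3, and the exponential forms in the test case are purely illustrative:

```python
import math

def attack_decision(completion_times, pdf0, pdf1, threshold):
    """Neyman-Pearson style test: declare 'attack' when the
    log-likelihood ratio of the observed load completion times
    exceeds a threshold chosen for a target false-alarm rate."""
    llr = sum(math.log(pdf1(t) / pdf0(t)) for t in completion_times)
    return llr > threshold
```

Under an attack (failed nodes or links), completion times tend to lengthen, so a density with heavier mass at large times under the alternative hypothesis drives the log-likelihood ratio upward.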
7.2 Wireless sensor networks
A wireless sensor network consists of a large number of sensor nodes distributed over
a sensing region. The network is partitioned into clusters, and all the sensors
belonging to a given cluster transmit their sensed data to a common sink (or fusion)
node. The design of a cooperative, energy-efficient data-transfer strategy between
sensors within a cluster is one of the challenging problems in this area. For example,
once a sensor collects an amount of data, it can transmit those data to the corresponding
sink node either by establishing direct communication with the sink node or by
means of a multi-hop cooperative strategy using other sensor nodes as relays. Next,
we outline an energy-efficient cooperative data-transfer strategy in which each sensor
must use, more or less, the same amount of energy to uplink its data to the sink
layer.
Each sensor can be assigned an integer quantity, called the maximum-packets-number
(MPN), which represents the maximum number of data packets that can be
transmitted from the sensor to the sink node before the sensor runs out of energy.
Clearly, the MPN depends on time, and one way to calculate it is to normalize the
energy reserve of a sensor at any given time by the energy required to transmit one
packet of data from the sensor to its corresponding sink node. We can then think of
the MPN as analogous to the number of tasks awaiting service at a node in a
distributed computing system. Therefore, the theory of Chapter 3 can be utilized to
calculate corresponding MPN-partitions for each sensor; for example, the current
MPN of the ith sensor is compared with the current average MPN of the cluster to
calculate the MPN-partition of the ith sensor. Next, when the jth sensor collects a
certain amount of data, it transmits the ith MPN-partition of its total data packets
to the sink node using the ith node as a relay. Finally, the MPNs can be thought of
as multidimensional queues and analyzed within a regeneration-theory framework.
The idea is to compute quantities analogous to the LB gains that will readjust the
initial MPN-partitions in order to obtain an optimal data-transfer strategy, resulting
in uniform energy usage across all sensors.
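The MPN bookkeeping described above can be sketched as follows (a hypothetical illustration; the partition rule simply shares out each sensor's excess over the cluster-average MPN, by analogy with the excess-load partitions of Chapter 3):

```python
def mpn(energy_reserve_mJ, energy_per_packet_mJ):
    """Maximum-packets-number: packets the sensor can still transmit
    to its sink node before its battery is exhausted."""
    return int(energy_reserve_mJ // energy_per_packet_mJ)

def mpn_partitions(mpns):
    """Share of relay traffic assigned to each sensor, proportional
    to its MPN excess over the current cluster average."""
    avg = sum(mpns) / len(mpns)
    excess = [max(m - avg, 0.0) for m in mpns]
    total = sum(excess)
    return [e / total if total else 0.0 for e in excess]
```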
Appendices
A Optimality of partitions in the ideal case
B Proof of Equation (3.24)
C Proof of conditional independence of the random delays W′_21 and Y′_1
D Special property of minimum of exponential random variables
Appendix A
Optimality of partitions in the
ideal case
By the ideal case we mean that there are no delays, the queues are deterministic, and
the tasks are arbitrarily divisible. This effectively means that each node in the system
knows the exact queue sizes of the other nodes. Consequently, it follows that m_{i(j)}(t) = Q_i(t),
I_j = I and p_ij ≡ p_i, independently of j. Assume further that the LB actions are executed
simultaneously at time t at all the nodes that do not belong to I. Let Q_i^final(t) be the total
load at node i ∈ I after the execution of LB. Then,
load at node i ∈ I after the execution of LB. Then,
Qfinali (t) = Qi(t) + pi
∑j∈Ic
Lexj (t) = Qi(t) +
Lexi (t)∑
j∈I Lexj (t)
∑j∈Ic
Lexj (t). (A.1)
Since∑n
j=1 Lexj (t) = 0, we have
∑j∈I Lex
j (t) = −∑j∈Ic Lex
j (t). Therefore,
Qfinali (t) = Qi(t)− Lex
i (t) = λdi
∑nl=1 Ql(t)∑n
l=1 λdl
. (A.2)
Clearly, the overall completion time isPn
l=1 Ql(t)Pnl=1 λdl
for all the nodes.
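The ideal-case result can be checked numerically; in the sketch below (our own helper name), every node's final load divided by its service rate equals the common completion time:

```python
def ideal_final_loads(Q, lam_d):
    """Final loads after delay-free LB with arbitrarily divisible
    tasks, Eq. (A.2): each node's share of the total load is
    proportional to its service rate, so all nodes finish together."""
    total = sum(Q)
    lam_sum = sum(lam_d)
    return [ld * total / lam_sum for ld in lam_d]
```

For Q = (90, 30) and rates (2, 1), the final loads are (80, 40) and both nodes finish at time 120/3 = 40.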
Appendix B
Proof of Equation (3.24)
Let us look at the conditional distribution of W′_21:

$$ P\{W'_{21} \le t \mid \tau = s,\ \tau = W_{11}\} = P\{W_{21} - \tau \le t \mid \tau = s,\ \tau = W_{11}\} = P\{W_{21} \le t + s \mid \tau = s,\ \tau = W_{11}\}. \qquad (B.1) $$

Note that $\{\tau = s,\ \tau = W_{11}\} = \{W_{11} = s,\ W_{21} > s,\ Y_1 > s,\ Y_2 > s,\ X_{12} > s,\ X_{21} > s\}$. Therefore, (B.1) becomes

$$ P\{W'_{21} \le t \mid \tau = s,\ \tau = W_{11}\} = P\{W_{21} \le t + s \mid W_{11} = s,\ W_{21} > s,\ Y_1 > s,\ Y_2 > s,\ X_{12} > s,\ X_{21} > s\}. $$

Exploiting mutual independence as per Assumption A2, we obtain

$$ P\{W'_{21} \le t \mid \tau = s,\ \tau = W_{11}\} = P\{W_{21} \le t + s \mid W_{21} > s\} = \bigl(1 - e^{-\lambda_{d_2} t}\bigr)u(t). \quad \Box $$
Appendix C
Proof of conditional independence
of the random delays W′_21 and Y′_1
$$
\begin{aligned}
P\{W'_{21} \le t_1,\ Y'_1 \le t_2 \mid \tau = s,\ \tau = W_{11}\}
&= P\{W_{21} - \tau \le t_1,\ Y_1 - \tau \le t_2 \mid \tau = s,\ \tau = W_{11}\} \\
&= P\{W_{21} \le t_1 + s,\ Y_1 \le t_2 + s \mid \tau = s,\ \tau = W_{11}\} \\
&= P\{W_{21} \le t_1 + s,\ Y_1 \le t_2 + s \mid W_{11} = s,\ W_{21} > s,\ Y_1 > s,\ Y_2 > s,\ X_{12} > s,\ X_{21} > s\} \\
&= P\{W_{21} \le t_1 + s,\ Y_1 \le t_2 + s \mid W_{21} > s,\ Y_1 > s\} \\
&= \frac{P\{W_{21} \le t_1 + s,\ Y_1 \le t_2 + s,\ W_{21} > s,\ Y_1 > s\}}{P\{W_{21} > s,\ Y_1 > s\}} \\
&= \frac{P\{s < W_{21} \le t_1 + s\}}{P\{W_{21} > s\}} \cdot \frac{P\{s < Y_1 \le t_2 + s\}}{P\{Y_1 > s\}},
\end{aligned}
$$
where the last equality follows from the mutual independence of W_21 and Y_1
according to Assumption A2. Therefore, we get

$$ P\{W'_{21} \le t_1,\ Y'_1 \le t_2 \mid \tau = s,\ \tau = W_{11}\} = P\{W'_{21} \le t_1 \mid W_{21} > s\} \cdot P\{Y'_1 \le t_2 \mid Y_1 > s\}, $$

which concludes the proof after noting that $P\{W'_{21} \le t_1 \mid \tau = s,\ \tau = W_{11}\} = P\{W'_{21} \le t_1 \mid W_{21} > s\}$ and $P\{Y'_1 \le t_2 \mid \tau = s,\ \tau = W_{11}\} = P\{Y'_1 \le t_2 \mid Y_1 > s\}$. □
Appendix D
Special property of minimum of
exponential random variables
Property: $P\{\tau = W_{11} \mid \tau = s\} = \frac{\lambda_{d_1}}{\lambda}$.

Proof: As per Convention C1, $X^F_{kj} = \infty$ when F = (11, 11), and $Z_{ki} = \infty$ a.s.
when C = ([0], [0]). Therefore, $\tau = \min(W_{11}, W_{21}, Y_1, Y_2, X_{12}, X_{21})$. Next, for any
s ≥ 0, we can write

$$ P\{\tau = W_{11} \mid \tau \le s\} = \frac{P\{\tau = W_{11},\ \tau \le s\}}{P\{\tau \le s\}}. \qquad (D.1) $$

Let $\beta := \min(W_{21}, Y_1, Y_2, X_{12}, X_{21})$. In accordance with A1 and A2, it is straightforward
to show that β is an exponential random variable with rate $\theta_t = \lambda_{d_2} + \lambda_{f_1} + \lambda_{f_2} + \lambda_{12} + \lambda_{21}$,
and that $W_{11}$ and β are independent. Observe that $\{\tau = W_{11}\} \cap \{\tau \le s\} = \{W_{11} \le \beta\} \cap \{W_{11} \le s\} = \{W_{11} \le \min(\beta, s)\}$. Then
$$
\begin{aligned}
P\{W_{11} \le \min(\beta, s)\} &= \int_0^\infty P\{W_{11} \le \min(\beta, s) \mid \beta = b\}\, f_\beta(b)\, db \\
&= \int_0^\infty P\{W_{11} \le \min(b, s) \mid \beta = b\}\, f_\beta(b)\, db \\
&= \int_0^s P\{W_{11} \le b \mid \beta = b\}\, f_\beta(b)\, db + \int_s^\infty P\{W_{11} \le s \mid \beta = b\}\, f_\beta(b)\, db. \qquad (D.2)
\end{aligned}
$$

By the independence of $W_{11}$ and β, $P\{W_{11} \le b \mid \beta = b\} = P\{W_{11} \le b\}$ and $P\{W_{11} \le s \mid \beta = b\} = P\{W_{11} \le s\}$. Therefore, (D.2) becomes

$$ P\{W_{11} \le \min(\beta, s)\} = \int_0^s \bigl(1 - e^{-\lambda_{d_1} b}\bigr)\theta_t e^{-\theta_t b}\, db + \int_s^\infty \bigl(1 - e^{-\lambda_{d_1} s}\bigr)\theta_t e^{-\theta_t b}\, db = \frac{\lambda_{d_1}}{\lambda_{d_1} + \theta_t}\Bigl[1 - e^{-(\lambda_{d_1} + \theta_t)s}\Bigr]. \qquad (D.3) $$
With $P\{\tau \le s\} = 1 - e^{-(\lambda_{d_1} + \theta_t)s}$, and using (D.1) and (D.3), we get

$$ P\{\tau = W_{11} \mid \tau \le s\} = \frac{\lambda_{d_1}}{\lambda}, $$

where $\lambda = \lambda_{d_1} + \lambda_{d_2} + \lambda_{f_1} + \lambda_{f_2} + \lambda_{12} + \lambda_{21}$. But it can also be shown that
$P\{\tau = W_{11}\} = \frac{\lambda_{d_1}}{\lambda}$. Therefore, the probability that the minimum is attained by
$W_{11}$ is independent of the value of τ, which leads to the conclusion that
$P\{\tau = W_{11} \mid \tau = s\} = \frac{\lambda_{d_1}}{\lambda}$. □
References
[1] H. M. Lee, S. H. Chin, J. H. Lee, D. W. Lee, K. S. Chung, S. Y. Jung and H. C.
Yu, “A resource manager for optimal resource selection and fault tolerance service in
grids”, Proc. 4th IEEE International Symposium on Cluster Computing and the Grid,
Chicago, Illinois, USA 2004.
[2] A. Brandt and M. Brandt, “On a two-queue priority system with impatience and
its application to a call center”, Methodology and Computing in Applied Probability,
1:191-210, 1999.
[3] www.planetlab.org
[4] Z. Lan, V. E. Taylor, and G. Bryan, “Dynamic load balancing for adaptive mesh
refinement application”, Proc. ICPP’2001, Valencia, Spain, 2001.
[5] T. L. Casavant and J. G. Kuhl, “A taxonomy of scheduling in general-purpose distributed
computing systems”, IEEE Trans. Software Eng., vol. 14, pp. 141–154, Feb. 1988.
[6] G. Cybenko, “Dynamic load balancing for distributed memory multiprocessors”,
J. Parallel and Distributed Computing, vol. 7, pp. 279–301, Oct. 1989.
[7] Chi-Chung Hui and Samuel T. Chanson, “Hydrodynamic load balancing”, IEEE
Trans. Parallel and Distributed Systems, vol. 10, No. 11, pp. 1118–1137, Nov. 1999.
[8] B.W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs”,
The Bell System Technical Journal, Vol. 49, pp. 291–307, Feb 1970.
[9] S. Dhakal, “Load balancing in delay-limited distributed systems”, Masters of Science
Thesis, Electrical and Computer Engineering Department, University of New Mexico,
Dec 2003.
[10] M. M. Hayat, S. Dhakal, C. T. Abdallah, J. Chiasson, and J. D. Birdwell, “Dynamic
time delay models for load balancing. Part II: Stochastic analysis of the effect of delay
uncertainty”, Advances in Time Delay Systems, Springer Series on Lecture Notes
in Computational Science and Engineering, (Keqin Gu and Silviu-Iulian Niculescu,
Editors), vol. 38, pp. 371–385, Springer: Berlin, 2004.
[11] S. Dhakal, B. S. Paskaleva, M. M. Hayat, E. Schamiloglu, and C. T. Abdallah,
“Dynamical discrete-time load balancing in distributed systems in the presence of
time delays”, Proc. 42nd IEEE Conference on Decision and Control, Maui, Hawaii,
pp. 5128–5134, Dec. 2003.
[12] J. Ghanem, C. T. Abdallah, M. M. Hayat, S. Dhakal, J. D. Birdwell, J. Chiasson, and
Z. Tang, “Implementation of the load balancing algorithms over a local area network
and the Internet”, Proc. 43rd IEEE Conference on Decision and Control, Bahamas, 2004.
[13] http://setiathome.ss.berkeley.edu/
[14] M. Litzkow, M. Livny and M. Mutka, “Condor - A hunter of idle Workstations”, Proc.
8th International Conference of Distributed Computing Systems, pp. 104–111, June
1988.
[15] E. Gelenbe, D. Finkel, and S. K. Tripathi, “On the availability of a distributed com-
puter system with failing components”, ACM SIGMETRICS Performance Evaluation
Review, vol. 13, Issue 2, pp. 6–13, 1985.
[16] R. Sheahan, L. Lipsky, and P. Fiorini, “The Effect of Different Failure Recovery
Procedures on the Distribution of Task Completion Times”, Proc. IEEE DPDNS05,
Denver CO, April 2005.
[17] S. Dhakal, M.M. Hayat, and J.E. Pezoa, “Reliability in distributed queuing systems
in the presence of random delays”, IEEE Trans. Inf. Theory, under review, 2006.
[18] S. Dhakal, M. M. Hayat, J. E. Pezoa, C. Yang, and D. A. Bader, “Dynamic load
balancing in distributed systems in the presence of delays: A regeneration-theory
approach”, IEEE Trans. Parallel and Distributed Systems, to appear, 2006.
[19] S. Dhakal, M. M. Hayat, J. Ghanem, C. T. Abdallah, H. Jerez, J. Chiasson, and J.
D. Birdwell, “On the optimization of load balancing in distributed networks in the
presence of delay”, Advances in Communication Control Networks, Springer Series
Lecture Notes in Control and Information Sciences, (S. Tarbouriech, C. T. Abdallah,
and J. Chiasson, Editors), LNCIS vol. 308, pp. 223–244, Springer-Verlag, 2004.
[20] S. Dhakal, J.E. Pezoa, and M.M. Hayat, “A regeneration-based approach for resource
allocation in cooperative distributed systems”, Submitted IEEE 32nd International
Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007) , Hawaii,
USA.
[21] S. Dhakal, M. M. Hayat, J. E. Pezoa, C. T. Abdallah, J. D. Birdwell, and J. Chiasson,
“Load Balancing in the presence of random node failure and recovery”, Proc. IEEE
International Parallel and Distributed Processing Symposium (IPDPS ’06), Rhodes,
Greece, April 2006.
[22] S. Dhakal, M. M. Hayat, J. Ghanem, and C. T. Abdallah, “Load Balancing in
Distributed Computing Over Wireless LAN: Effects of Network Delay”, Proc. IEEE Wireless
Communication & Networking Conference (WCNC-2005), New Orleans, LA, vol. 3,
pp. 1755–1760, March 13–17, 2005.
[23] J. Ghanem, S. Dhakal, C. T. Abdallah, M. M. Hayat, and H. Jerez “Load balanc-
ing in distributed systems with large time delays: Theory and experiment”, Proc.
IEEE/CSS 12th Mediterranean Conference on Control and Automation (MED ’04),
Aydin, Turkey, June 2004.
[24] M. Trehel, C. Balayer, and A. Alloui, “Modeling load balancing inside groups using
queuing theory”, Proc. 10th International Conference on Parallel and Distributed
Computing Systems, New Orleans, Louisiana, Oct. 1–3, 1997.
http://lifc.univ-fcomte.fr/ trehel/PDCS97.ps
[25] A.Cortes, A. Ripoll, M.A. Senar and E. Luque, “Performance comparison of dynamic
load-balancing strategies for distributed computing”, Proc. IEEE 32nd Hawaii Con-
ference on System Sciences, vol.8, p. 8041, 1999.
[26] J. M. Bahi, C. Vivier, R. Couturier, “Dynamic load balancing and efficient load esti-
mators for asynchronous iterative algorithms”, IEEE Trans. Parallel and Distributed
Systems, Vol. 16, No. 4, Apr. 2005.
[27] D. L. Eager, E. D. Lazowska, and J. Zahorjan, “Adaptive load sharing in homogeneous
distributed systems”, IEEE Trans. Software Engineering, vol. 12, no. 5, pp. 662–675,
May 1986.
[28] J. Liu and V. A. Saletore, “Self-scheduling on distributed-memory machines”, Proc.
ACM Int’l Conf. on Supercomputing, pp. 814–823, Nov. 1993.
[29] K.M. Dragon and J.L. Gustafson, “A low-cost hypercube load balance algorithm”,
Proc. Fourth Conf. Hypercube Concurrent Computers and Applications, pp. 583–590,
1989.
[30] T.H. Tzen and L.M. Ni, “Dynamic loop scheduling for shared memory multiproces-
sors”, Int’l Conf. Parallel Processing, vol. 2, pp. 247–250, 1991.
[31] S. Zhou, “A trace-driven simulation study of dynamic load balancing”, IEEE Trans.
Software Eng., vol. 14, no. 9, pp. 1327–1341, Sept. 1988.
[32] H.C. Lin and C.S. Raghavendra, “A dynamic load-balancing policy with a central job
dispatcher (LBC)”, IEEE Trans. Software Eng., vol.18, no.2, pp. 148–158, Feb.1992.
[33] H.G. Rotithor and S.S. Pyo, “Decentralized decision making in adaptive task sharing”,
Proc. IEEE International Parallel and Distributed Processing Symposium, Dec. 1990.
[34] P. Krueger and M. Livny, “The Diverse Objectives of Distributed Scheduling Policies”,
Proc. Seventh Int’l Conf. Distributed Computing Systems, pp. 242–249, 1987.
[35] S. Shenker and A. Weinrib, “The optimal control of heterogeneous queuing systems: A
paradigm for load sharing and routing”, IEEE Trans. Computers, vol. 38, pp. 1724–
1735, Dec. 1989.
[36] K. Kabalan, W. Smari, and J. Hakimian, “Adaptive load sharing in heterogeneous systems:
policies, modifications, and simulation”, Int’l Journal of Simulation Systems Science
and Tech., vol. 3, no. 1–2, pp. 89–100, Jun. 2002.
[37] V. Subramani, R. Kettimuthu, S. Srinivasan, and P. Sadayappan, “Distributed Job
Scheduling on Computational Grids Using Multiple Simultaneous Requests”, Proc.
11th IEEE International Symposium on High Performance Distributed Computing
(HPDC-11), Edinburgh, Scotland, July 24–26, 2002, pp. 359–368.
[38] S. Choi, M. Baik, and C. S. Hwang, “Volunteer Availability based Fault Tolerant
Scheduling Mechanism in Desktop Grid Computing Environment”, Proc. 3rd IEEE
International Symposium on Network Computing and Applications, Boston, Massachusetts,
Aug. 30–Sept. 1, 2004, pp. 366–371.
[39] C. Knessl and C. Tier, “Two tandem queues with general renewal input I: Diffusion
approximation and integral representations”, SIAM J. Appl. Math., vol. 59, pp. 1917–
1959, 1999.
[40] F. Baccelli and P. Bremaud, Elements of Queueing Theory: Palm-Martingale Calculus
and Stochastic Recurrences. New York: Springer-Verlag, 1994.
[41] D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes.
Springer-Verlag, 1988.
[42] J. Ghanem, “Implementation of load balancing policies in distributed systems”, M.S.
thesis, Electrical and Computer Engineering Department, University of New Mexico,
June 2004.
[43] G. Petrie, G. Fann, E. Jurrus, B. Moon, K. Perrine, C. Dippold, and D. Jones, “A
distributed computing approach for remote sensing data”, Proc. 34th Symposium on
the Interface, pp. 477–489, 2002.
[44] V. Jacobson, “Congestion avoidance and control”, Proc. ACM SIGCOMM ’88, Stanford,
CA, Aug. 1988.
[45] D. L. Snyder and M. I. Miller, Random Point Processes in Time and Space. New York:
Springer-Verlag, 1991.
[46] L. Tassiulas and A. Ephremides, “Stability properties of constrained queuing systems
and scheduling policies for maximum throughput in multihop radio networks”, IEEE
Trans. Automatic Control, vol. 37, no. 12, pp. 1936–1948, Dec. 1992.
[47] M. J. Neely, E. Modiano, and C. E. Rohrs, “Dynamic power allocation and routing
for time varying wireless networks”, Proc. of IEEE INFOCOM, San Francisco, April
2003.
[48] D. Y. Burman and D. R. Smith, “A light traffic theorem for multi-server queues”,
Mathematics of Operations Research, vol. 8, pp. 15–25, 1983.
[49] G. Koole, P. Sparaggis, and D. Towsley, “Minimizing response times and queue lengths
in systems of parallel queues”, Journal of Applied Probability, vol. 36, pp. 1185–1193, 1999.
[50] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister, “System architecture
directions for networked sensors”, Proc. 9th Int’l Conf. on Architectural Support for
Programming Languages and Operating Systems (ASPLOS-IX), Cambridge, MA, 2000.