Load Balancing in Communication-Constrained Distributed Systems: A
Probabilistic Approach
by
Sagar Dhakal
B.E., Electrical and Electronics Engineering, Birla Institute of Technology, 2001
M.S., Electrical Engineering, University of New Mexico, 2003
DISSERTATION
Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
Engineering
The University of New Mexico
Albuquerque, New Mexico
December, 2006
© 2006, Sagar Dhakal
Dedication
To my parents and my wife
Acknowledgments
I would like to thank my advisor Prof. Majeed M. Hayat for his continual guidance
and encouragement in my research work. It has been a privilege working with him
for the past four and a half years. I will always admire his analytical approach to
attacking hard problems and his ability to solve them rigorously. In addition, I have enjoyed
and learned immensely from his classroom lectures, which are always well-prepared
and theoretically deep. I also appreciate the financial support that he has provided
during my graduate study. Further, I thank him for giving me the opportunities to
attend and present my research findings at international conferences.
I thank Prof. James Ellison for agreeing to serve on my dissertation committee and
for sharing his expertise in the fields of probability theory and differential equations.
His deep academic grasp, excellent teaching skills and friendly nature have always
been a great inspiration to me. I have learned a lot from his probability theory and
measure theory lectures and homework assignments. I would also like to thank Prof. Balu
Santhanam, Prof. Chaouki T. Abdallah, Prof. Sudharman K. Jayaweera and Prof.
Yasamin Mostofi for serving on my committee, reading my dissertation and providing
useful suggestions during and after the defense exam.
I take this opportunity to thank Mr. Jorge E. Pezoa, Mr. Cundong Yang and Mr.
Mohamed Elyas, who are my colleagues from the Load Balancing Group at UNM,
for their helpful suggestions. In particular, I am grateful to Mr. Pezoa for helping
me with this work. Working with him has always been enjoyable and at times both
of us have found ourselves to be more creative and efficient while collaborating with
each other.
I would also like to thank my parents, whose blessings and encouragement have
led to the successful completion of this work. Finally, I would like to extend my
warmest gratitude to my wife for being nice, supportive and patient during hard and
busy times. She also deserves a special mention for proof-reading the manuscript.
This work has been supported by Prof. Majeed M. Hayat’s National Science
Foundation Information Technology Research (ITR) Grants No. ANI-0312611 and
ANI-0312182.
Load Balancing in Communication-Constrained Distributed Systems: A
Probabilistic Approach
by
Sagar Dhakal
ABSTRACT OF DISSERTATION
Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
Engineering
The University of New Mexico
Albuquerque, New Mexico
December, 2006
Load Balancing in Communication-Constrained Distributed Systems: A
Probabilistic Approach
by
Sagar Dhakal
B.E., Electrical and Electronics Engineering, Birla Institute of
Technology, 2001
M.S., Electrical Engineering, University of New Mexico, 2003
PhD, Engineering, University of New Mexico, 2006
Abstract
The effectiveness of cooperative computing in a distributed, reconfigurable environ-
ment depends upon appropriate utilization of the available computing and commu-
nication resources. Any cooperative strategy for load distribution among nodes is
called load balancing. On one hand, the overall computing power of the system
may be improved by distributing workloads to all constituent nodes in proportion
to their loads and processing rates. On the other hand, however, it may seem more
prudent to assign most of the incoming loads to the reliable nodes in order to im-
prove the robustness of the system. At the same time, since nodes are connected by
means of a shared communication medium, such as the Internet or a wireless local area
network (WLAN), load-transfer activities are not instantaneous but entail finite
delays. Moreover, delays incurred in bandwidth-limited media, such as wireless
infrastructure-based networks and wireless ad-hoc networks, are random, making
their accurate prediction impossible. The presence of such random delays in inter-
node communication can work against the benefits of load balancing in two principal
ways: (1) the load-balancing decision will rely upon dated information about the
load-state of the systems, and (2) any load being transferred will remain in transit in
the network for a random amount of time, thereby postponing the intended benefi-
cial effect of load balancing. These two factors may create system instabilities where
loads are transferred unduly between nodes, thereby increasing the service time. In
addition, due to the possibility of spontaneous or attack-induced node failures, the
number of functional nodes is also dynamically changing in a random fashion. This
together with the occurrences of random communication delays will introduce un-
certainties in the node-fault detection and correction mechanisms. In summary, a
cooperative distributed computing system is an unpredictable, reconfigurable envi-
ronment, whose performance must be optimized in a stochastic framework. To this
end, designing an effective load-balancing policy for such systems is a constrained
optimization problem that aims to maximize the usage of computing resources and
the overall reliability of the system while minimizing communication overhead.
In this dissertation, a novel queuing approach, based on stochastic regeneration,
is formulated to analyze the joint evolution of the distributed queues corresponding
to a multidimensional queuing model of a general cooperative distributed system.
The model specifically considers the randomness and heterogeneity in processing
times of the nodes, randomness in delays in the communication network, and un-
certainty in the number of functional nodes. Coupled renewal equations are derived
for certain classes of static and dynamic load-balancing policies. In order to reduce
the computational overhead associated with the scalability of optimal load-balancing
policy to large-scale systems, a sub-optimal approach for dynamic load balancing is
also proposed and tested. Our approach is general and can be adapted to resilient
communication networks, routing in wireless networks, and wireless sensor networks.
The performance of the proposed load-balancing policies is evaluated using analytical,
experimental and Monte-Carlo simulation methods. In particular, the interplay
between the optimal amount of load-transfers between nodes, node-failure/recovery
rates, and the average load-transfer delays is rigorously investigated. The performance
of the proposed dynamic load-balancing policy is compared to that of existing
static and dynamic load-balancing policies. Additionally, the theory is applied to a
distributed wireless-sensor network and the interplay between the total service time
and the energy consumption of each sensor is shown.
Contents
List of Figures xv
List of Tables xvii
Glossary xviii
1 Introduction 1
1.1 Dissertation overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Overview of Load-balancing Policies 8
2.1 Classification of load-balancing policies . . . . . . . . . . . . . . . . . 8
2.1.1 Local versus global load balancing . . . . . . . . . . . . . . . . 9
2.1.2 Static versus dynamic load balancing . . . . . . . . . . . . . . 9
2.1.3 Centralized versus distributed load balancing . . . . . . . . . . 9
2.1.4 Sender-initiated versus receiver-initiated load balancing . . . . 10
2.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Balancing scheme for SAMR applications . . . . . . . . . . . . 10
2.2.2 Hydrodynamic algorithm . . . . . . . . . . . . . . . . . . . . . 11
2.2.3 Graph partitioning method . . . . . . . . . . . . . . . . . . . 11
2.2.4 Load balancing using queuing theory . . . . . . . . . . . . . . 12
2.2.5 Shortest-expected-delay (SED) and never-queue (NQ) policy . 12
2.2.6 Fault-tolerant schemes . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Prior work by the LB-research group at UNM . . . . . . . . . . . . . 13
3 One-shot Load Balancing Policy: A Regeneration-theory Approach 15
3.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.1 Definition of system-information, system-function and network
states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.2 Regeneration time . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 System without node-failures . . . . . . . . . . . . . . . . . . . . . . 21
3.2.1 Description of the one-shot load balancing policy . . . . . . . 22
3.2.2 Renewal equations . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 System with permanent node-failures . . . . . . . . . . . . . . . . . . 32
3.3.1 Renewal equations . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.2 Calculation of the initial condition . . . . . . . . . . . . . . . 41
3.4 System with recoverable node-failures . . . . . . . . . . . . . . . . . . 45
3.4.1 Analysis of proactive LB policy: LBP-1 . . . . . . . . . . . . . 46
3.4.2 Analysis of reactive LB policy: LBP-2 . . . . . . . . . . . . . 52
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4 Experimental, Theoretical and Simulation Results 55
4.1 Distributed computing system architecture . . . . . . . . . . . . . . . 56
4.2 Empirical estimation of system parameters . . . . . . . . . . . . . . . 57
4.3 System without node-failures . . . . . . . . . . . . . . . . . . . . . . 60
4.4 System with recoverable node-failures . . . . . . . . . . . . . . . . . . 63
4.5 System with permanent node-failures . . . . . . . . . . . . . . . . . . 68
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5 Dynamic Load Balancing Policy 75
5.1 Formulation of the DLB Policy . . . . . . . . . . . . . . . . . . . . . 76
5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.1 Comparison to other DLB policies . . . . . . . . . . . . . . . . 84
5.3 Sub-optimal LB policy for an n-node system . . . . . . . . . . . . . . 86
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6 Application to Wireless Sensor Networks 90
6.1 Description of wireless sensor networks . . . . . . . . . . . . . . . . . 90
6.2 Queuing equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.3 LB policies for two cooperating sensors . . . . . . . . . . . . . . . . . 92
6.3.1 Extension to n cooperating sensors . . . . . . . . . . . . . . . 94
6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7 Future work 97
7.1 Resilient distributed networks . . . . . . . . . . . . . . . . . . . . . . 97
7.2 Wireless sensor networks . . . . . . . . . . . . . . . . . . . . . . . . . 98
Appendices 100
A Optimality of partitions in the ideal case 101
B Proof of Equation (3.24) 102
C Proof of conditional independence of random delays W′21 and Y′1 103
D Special property of minimum of exponential random variables 105
References 107
List of Figures
2.1 Average completion time as a function of LB instant. . . . . . . . . 14
2.2 Average overall completion time as a function of LB gain. . . . . . . 14
4.1 Empirically estimated pdfs of the processing time per task for the
Transmeta Crusoe machine (top) and Intel P4 machine (bottom) as
well as their exponential approximations (solid curves). . . . . . . . 58
4.2 Empirical pdfs of the load-information delays from the first node to
the second node obtained on the Internet (left) and on the EECE
WLAN (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Left: Average transfer delay as a function of the number of tasks
transferred between nodes. The stars are the actual realizations from
the experiments. Right: Empirical pdf of the transfer delay per task
on the Internet under a normal work-day of operation. . . . . . . . 59
4.4 Left: The AOCT as a function of LB instants for the experiments
over the Internet. The LB gain was fixed at 1. Right: Amount of
load transferred between nodes at different LB instants. . . . . . . . 61
4.5 The AOCT under different LB gains for the Internet (left) and the
WLAN (right). The LB instant was fixed at 2s. . . . . . . . . . . . . 62
4.6 Left: The AOCT as a function of the LB gain in presence of large
transfer delay. The LB instant was fixed at 2s. Right: Theoretical
result on the optimal LB gain for different delays. . . . . . . . . . . 64
4.7 The average overall completion time as a function of the LB gain K
for the LBP-1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.8 A realization of the queues obtained from the experiments conducted
for LBP-1 and LBP-2. . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.9 The cumulative distribution function of the overall completion time
in LBP-1. The upper figure shows the case of an initial workload of
(50, 0), while the lower figure is for an initial workload of (25, 50). . . 69
4.10 Probability of success as a function of LB gain of the first node when
LB is performed at time t = 0. Stars represent the Monte-Carlo
simulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.11 Probability of success as a function of LB gain of the second node
when LB is performed at time t = 0. Stars represent the Monte-Carlo
simulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.12 Probability of success as a function of LB instant tb, while LB gains
of both nodes are kept at 0.5. . . . . . . . . . . . . . . . . . . . . . . 73
5.1 Adaptive estimation of the average transfer delay per task. . . . . . 81
5.2 One realization of the queues under a static LB policy using a fixed
gain K = 1 (left) and DLB policy (right). . . . . . . . . . . . . . . . 84
6.1 Expected value of the total service time (left) and the battery-energy
consumed by the sensors (right) under different LB gains. . . . . . . 94
List of Tables
4.1 Experimental results for LBP-1 using the theoretically determined
optimal LB gains. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2 Experimental and simulation results for LBP-2. . . . . . . . . . . . . 67
4.3 Performance of the LBP-1 and the LBP-2 under different network
delays. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.1 Experimental results for dynamic and static LB policies . . . . . . . 82
5.2 Experimental results of the ACTT for DLB policies . . . . . . . . . 86
5.3 Probability of success achieved under different policies . . . . . . . . 88
6.1 The AST and the AEC under the MST, FE and FT policies. . . . . 95
Glossary
ACTT Average completion time per task.
AEC Average energy consumption.
AOCT Average overall completion time.
AST Average service time.
cdf Cumulative distribution function.
DLB Dynamic load balancing.
FE Fair-energy.
FT Fair-tradeoff.
LB Load balancing.
MC Monte-Carlo.
MPN Minimum-packets-number.
MST Minimum-service-time.
NQ Never queue.
pdf Probability density function.
POSIX Portable operating system interface for UNIX.
SAMR Structured adaptive mesh refinement.
SED Shortest expected delay.
SPR System processing rate.
TCP Transmission control protocol.
UDP User datagram protocol.
UNM University of New Mexico.
WLAN Wireless local-area network.
Chapter 1
Introduction
Distributed systems typically comprise geographically dispersed processors or nodes
that can communicate over a shared network of arbitrary topology. Some examples
of distributed systems are grid-computing systems, distributed telecommunication
systems, wireless sensor networks and embedded systems. Quality of service in a
distributed system can be improved by allowing its constituent nodes to work co-
operatively. For example, in distributed grid-computing systems, nodes can jointly
process computational loads in order to decrease the overall processing time of the
load [1]. Similarly, in distributed telecommunication systems, the switching stations
(or call centers) can cooperatively handle the incoming traffic in order to minimize
the rejection of new calls while keeping acceptable response times for the admitted
calls [2].
In distributed systems, loads of different sizes (possibly corresponding to different
applications) arrive randomly in time and node space. Clearly, severe load imbalance
may occur if nodes work independently without sharing loads. However, this situa-
tion can be avoided if the overloaded nodes can transfer loads to the under-loaded
nodes for cooperative processing. Any strategy for load transfer or distribution
among nodes is called load balancing (LB). Ideally (if we ignore reliability of nodes
and communication constraints), an effective LB policy ensures an optimal use of
the distributed resources whereby no node remains in an idle state while any other
node is being utilized. But, there are several challenges arising from the distributed
systems that have to be addressed carefully in order to design an effective LB policy.
In many of today’s distributed-computing environments, the nodes are linked
by a delay- and bandwidth-limited communication medium that inherently inflicts
tangible delays on inter-node communications and load transfers. Examples include
distributed-systems over WLANs as well as clusters of geographically distant nodes
connected over the Internet such as PlanetLab [3]. Although the majority of LB poli-
cies developed heretofore take account of such time delays [4–8], they are predicated
on the assumption that delays are deterministic. In actuality, delays are random
in such communication media. In particular, WLANs inevitably introduce random
delays and packet losses due to scarce radio spectrum, random power fluctuations,
time-varying channel gains, and interference. Our earlier work has shown that LB
policies that do not account for the delay randomness may perform poorly in prac-
tical distributed settings where random delays are present [9–12]. For example, if
nodes have dated, inaccurate information about the state of other nodes, due to
random communication delays between nodes, then this could result in unnecessary,
periodic exchange of loads among them. Consequently, certain nodes may become
idle while loads are in transit, a condition that would result in prolonging the total
completion time of a load.
Generally, the performance of LB in delay-infested environments depends upon
the selection of LB instants as well as the amount of load-transfers allowed between
nodes. For example, if the network delay is negligible within the context of a certain
application, the best performance is achieved by allowing every node to send all its
excess load (relative to the average load per node in the system) to less occupied
nodes. On the other hand, in the extreme case for which the network delays are
excessively large, it would be more prudent to reduce the amount of load-transfers so
as to avoid time wasted while loads are in transit. Clearly, in a practical delay-limited
distributed-computing setting, the amount of load to be exchanged lies between these
two extremes and the amount of load-transfer has to be carefully chosen.
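This tradeoff can be illustrated with a minimal Monte-Carlo sketch (not from the dissertation; the two-node setup, the function and variable names, and the parameter values below are all hypothetical). A single transfer of a fraction K of node 1's excess load is made at time zero, the transfer suffers an exponentially distributed delay, and each task takes an exponential amount of processing time:

```python
import random

def completion_time(q1, q2, rate1, rate2, K, mean_delay, rng):
    """One-shot LB at t = 0: node 1 sends a fraction K of its excess
    (relative to the average load) to node 2; the transfer arrives
    after an exponentially distributed random delay."""
    avg = (q1 + q2) / 2.0
    sent = int(round(K * max(q1 - avg, 0.0)))
    q1 -= sent
    delay = rng.expovariate(1.0 / mean_delay) if sent else 0.0
    # Exponential per-task processing times at each node.
    t1 = sum(rng.expovariate(rate1) for _ in range(q1))
    t2 = sum(rng.expovariate(rate2) for _ in range(q2))
    # Node 2 starts on the transferred tasks only after they arrive.
    t2 = max(t2, delay) + sum(rng.expovariate(rate2) for _ in range(sent))
    return max(t1, t2)

rng = random.Random(1)
for K in (0.0, 0.5, 1.0):
    runs = [completion_time(100, 10, 1.0, 1.0, K, 60.0, rng) for _ in range(500)]
    print("K = %.1f  avg completion time = %.1f" % (K, sum(runs) / len(runs)))
```

Sweeping K for different values of mean_delay reproduces the qualitative behavior described above: when the delay is negligible, the full transfer (K = 1) wins, while for large delays an intermediate gain can outperform both extremes.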
Another important aspect in distributed systems is the issue of fault-tolerance
and reliability of service. In general, distributed systems may utilize dynamic sets
of nodes that may join and leave the system in a random fashion. An example of
such systems is “SETI at Home” [13]. Such systems typically use dedicated as well
as dynamic nodes comprising a collection of desk-tops or portable computing devices
that are online and can be used remotely upon availability. However, these nodes can
go off-line anytime, regardless of the portion of load assigned to them. Furthermore,
the participation of any node may be interrupted by the local usage of the node by
its owner. Such scenarios induce an uncertainty in the availability of the number
of functional nodes, whereby any node may randomly fluctuate between “failure”
(or “down”) and “working” (or “up”) states. In addition, each component of the
system, either a node or a communication link, can undergo permanent physical
failure. Due to the heterogeneous nature of a distributed system (where components
may be provided by different manufacturers), system warranty is not provided and
the issue of fault tolerance is largely left in the hands of the developers.
Generally, when a fault occurs at one point, it has to be communicated to other
components and the fault is detected upon the reception of such communications
after a random amount of time. Therefore, one should expect uncertainty in the
information about the functional components of the system. Similarly, due to the
random load-transfer delays, the mechanisms that are responsible for retrieving loads
from the queues of a faulty node must also be analyzed in a statistical framework.
Most of the existing literature in distributed systems that offers analytical treatment
of reliability disregards uncertainties associated with the fault-detection and load-
retrieval procedures [1, 14–16]. But, these uncertainties degrade the performance of
a LB policy that does not account for them.
Finally, we should note that most distributed systems are composed of heterogeneous
nodes, with each node possibly having a different processing rate. Also, due
to the unpredictable characteristics of the incoming load (or application), each node
exhibits fluctuations in run-time processing rates. Furthermore, in energy-limited
applications like wireless sensor networks, the amount of energy spent on transfer-
ring loads between nodes should also be considered. Necessary care must be exercised
so that no excessive amount of time and energy is unduly wasted in performing LB
while the collective processing power of the distributed system is utilized maximally.
In summary, to design an effective LB policy, the following inherent factors of the
distributed systems should be captured: (i) the amount of load at each node (i.e., the
queue size), (ii) the heterogeneity in the processing rates of the nodes and the run-
time variation in processing times, (iii) the randomness in delays and the bandwidth
constraints in the communication network components involved in the transfer of
loads, (iv) the uncertainty in the number of functional nodes and associated fault-
detection and load-retrieval process, and (v) the energy overhead resulting from the
collaborative nature of the nodes. To the best of our knowledge, there are no LB
policies designed with due consideration to the above-mentioned factors [5].
1.1 Dissertation overview
Chapter 2 presents a brief survey of existing LB policies and also discusses our ear-
lier work in LB. The one-shot LB policy is detailed in Chapter 3 for three types
of distributed systems; namely, (1) system without node-failures, (2) system with
recoverable node-failures, and (3) system with permanent node-failures. In the one-
shot LB policy, once nodes are initially assigned a certain amount of load, all nodes
would together execute LB only at one prescribed instant. A novel queuing approach,
based on regeneration in stochastic processes, is formulated to analyze the dynam-
ics of the distributed system evolving under a one-shot LB action. Our approach
specifically considers the heterogeneity and run-time fluctuation in processing rates
of the nodes, randomness in delays in the communication network and the failure
and recovery probabilities of each node. Coupled renewal equations characterizing
both the expected value of the overall completion time for a given initial load as well
as the probability of success in completing a given initial load have been derived for
different types of distributed systems.
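As an illustrative sketch of such a one-shot action (the function and variable names below are hypothetical, and the renewal-equation analysis itself is not reproduced here), each node can compare its queue to a rate-proportional share of the total load and send a gain-scaled fraction of its excess to the deficient nodes:

```python
def one_shot_transfers(queues, rates, K):
    """Sketch of a one-shot LB action. Each node's fair share is
    proportional to its processing rate; overloaded nodes send a
    fraction K of their excess, split among the under-loaded nodes
    in proportion to their deficits."""
    total = sum(queues)
    shares = [total * r / sum(rates) for r in rates]
    excess = [max(q - s, 0.0) for q, s in zip(queues, shares)]
    deficit = [max(s - q, 0.0) for q, s in zip(queues, shares)]
    d_tot = sum(deficit) or 1.0  # avoid dividing by zero when balanced
    transfers = {}
    for i, e in enumerate(excess):
        for j, d in enumerate(deficit):
            tasks = int(K * e * d / d_tot)  # whole tasks from node i to node j
            if tasks > 0:
                transfers[(i, j)] = tasks
    return transfers
```

For instance, with queues (100, 0), equal rates, and K = 1, node 0 sends 50 tasks to node 1; with K = 0.5 it sends only 25, reflecting the idea that the LB gain moderates the amount transferred.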
Chapter 4 presents the experimental, theoretical and Monte-Carlo (MC) sim-
ulation results on the performance of one-shot LB policy in distributed systems
connected over the Internet and the WLAN. The results show that for an arbitrary
initial load, there exist an optimal amount of load-transfer and an optimal LB instant
associated with the one-shot LB policy, which together maximize the performance
of the system.
In Chapter 5, a sender-initiated distributed dynamic load balancing (DLB) policy
is presented where each node autonomously executes LB at every external load (load
originating outside the system) arrival at that node. In particular, every time an
external load arrives at a node, only the receiver node executes a locally optimal
one-shot LB action. While calculating the amount of load-transfers, the proposed
DLB policy effectively trades off the queuing delays against the transfer delays
in order to maximize the system performance. A comparative study shows that
the proposed DLB policy outperforms other commonly used dynamic load balancing
policies. The chapter also presents a sub-optimal yet effective and computationally
efficient DLB policy for a multi-node system.
In Chapter 6, our theory is applied to develop an optimal LB policy for energy-
limited distributed sensor networks. The results show that there is a fundamental
tradeoff between savings in completion time, resulting from utilizing the processing
power of a distributed system cooperatively, and the combined delay and energy
overhead resulting from the very collaborative nature of the servers. Finally, Chap-
ter 7 presents the potential application of our theoretical approach in solving complex
queuing problems associated with the resilient communication networks and wireless
sensor networks.
1.2 Contributions
Based on multi-server queues, this dissertation presents the first analytical model
for a distributed system that specifically considers the randomness and heterogeneity
in processing times, randomness in delays in the communication network, the ran-
domness in the number of functional nodes, and the energy overhead resulting from
collaboration of nodes. Three fundamental vectors have been introduced to track the
complicated point processes associated with the joint evolution of the multiple queues.
Then, with an exponential-delays assumption, a theory of stochastic regeneration for
distributed systems is detailed. Coupled renewal equations are derived for the aver-
age overall completion time and the probability of success corresponding to different
versions of one-shot LB policy, and a novel approach to dynamic load balancing is
presented subsequently.
Based on the results obtained from real-time LB experiments performed over the
WLAN and the Internet, Chapter 4 demonstrates the practical applicability and
effectiveness of the theoretical models of Chapter 3. It also analytically establishes our
earlier simulation-based notion that the one-shot LB policy can be optimized over
the amount of load-transfers between nodes and the selection of the LB instant.
Moreover, the chapter offers the following two valuable insights: (i) if the average
transfer delay per task is large compared to the average processing time per task,
the amount of load transfers has to be reduced appropriately in order to improve
the system performance, and (ii) the proactive LB policy outperforms the reactive
LB policy when the network delays are large compared to the average recovery times
of nodes and vice versa.
Chapter 5 presents a sender-initiated DLB policy that can efficiently handle ran-
domly arriving external loads. The proposed DLB policy adapts to the dynamic
environment of the distributed system and does not require synchronization among
nodes. The experimental results show that the proposed DLB policy outperforms
other commonly used DLB policies in delay-infested random environments. A
sub-optimal DLB policy is also given. Finally, in Chapter 6 the applicability of our
theoretical approach in other dynamical systems is exhibited by designing a novel
energy-aware LB policy for distributed wireless sensor network applications.
To date, our work has resulted in two journal papers [17, 18], two book chapters
[10,19] and five conference papers [11,20–23].
Chapter 2
Overview of Load-balancing
Policies
This chapter begins with a brief description of different types of LB policies, which
is followed by a survey of related work in this area. Finally, prior work in LB
performed by the LB-research group at the University of New Mexico (UNM) is
summarized.
2.1 Classification of load-balancing policies
LB policies can be broadly categorized as local versus global, static versus dynamic,
centralized versus distributed, and sender-initiated versus receiver-initiated [5].
Next, we provide a brief exposition of each category.
2.1.1 Local versus global load balancing
In a local LB policy [24, 25], each node can transfer load only to a group of neigh-
boring processors, thereby minimizing remote communications. But, in a global bal-
ancing policy, a certain amount of global information is used to initiate LB among
all participant nodes. In this scheme, the load distribution cost can outweigh the
computational gain for a sufficiently large system.
2.1.2 Static versus dynamic load balancing
Static load distribution assigns load to nodes probabilistically or deterministically (as
in a round-robin fashion), without consideration of runtime events. This scheme has
a limited application in realistic distributed systems since it is generally impossible
to make predictions of arrival times of loads and processing times required for future
loads. On the other hand, in a DLB policy [26], the load distribution is made during
run-time based on current processing rates and network condition. A DLB policy can
use either local or global information. A global DLB policy is proposed in Chapter 5
of this dissertation.
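The distinction can be sketched in a few lines (illustrative only; the names and the join-shortest-queue rule below are generic examples, not the DLB policy of Chapter 5):

```python
import itertools

def static_round_robin(num_tasks, num_nodes):
    # Static: the assignment is fixed in advance and ignores runtime state.
    cycle = itertools.cycle(range(num_nodes))
    return [next(cycle) for _ in range(num_tasks)]

def dynamic_join_shortest_queue(num_tasks, queues):
    # Dynamic: each arriving task joins whichever queue is currently shortest.
    queues = list(queues)
    placement = []
    for _ in range(num_tasks):
        j = queues.index(min(queues))
        placement.append(j)
        queues[j] += 1
    return placement
```

The static rule produces the same placement regardless of what the nodes are doing, whereas the dynamic rule reacts to the current queue lengths, which is why it copes better with unpredictable arrivals and processing times.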
2.1.3 Centralized versus distributed load balancing
Centralized schemes [27,28] store global information at a designated node. All sender
nodes (or receiver nodes) access the designated node to calculate the amount of load-
transfers as well as to identify where tasks are to be sent to (or received from). The
drawback of this scheme is that the LB is paralyzed if the particular node that con-
trols LB fails. Such centralized schemes also require synchronization among nodes.
In contrast, in a distributed LB policy, every node executes balancing autonomously.
In some cases [6, 29, 30], the idle nodes can fetch load during runtime from a shared
global queue.
2.1.4 Sender-initiated versus receiver-initiated load balanc-
ing
In sender-initiated LB policy [31], the overloaded nodes transfer one or more of their
tasks to the under-loaded nodes, while in receiver-initiated LB policy [32], the under-
loaded nodes request loads from the overloaded nodes. In some cases [33, 34], both
the under-loaded as well as the overloaded nodes can initiate load transfers.
2.2 Related work
In this section, we describe some LB policies that are commonly referenced in the
current literature.
2.2.1 Balancing scheme for SAMR applications
The authors propose a DLB algorithm for structured adaptive mesh refinement
(SAMR) applications on distributed systems [4]. The balancing scheme is divided
into two phases: (i) global LB phase and (ii) local LB phase. Both the local and
the global balancing phase occur periodically, but the period for global LB is much
longer than the period for local LB. First, the load redistribution cost, which includes
both the communication and the computation overhead, for global LB is heuristi-
cally evaluated. Then, the global LB is invoked only if the computational gain is
bigger than the redistribution cost by some pre-defined factor. At the local level, LB
is performed based only on the processing rates of the nodes, while the presence of
stochastic delays is ignored.
2.2.2 Hydrodynamic algorithm
In the approach given in [7], each node is viewed as a liquid cylinder, where the cross-
sectional area corresponds to the buffer capacity of the node, the communication
links are modeled as liquid channels between the cylinders, the load is represented
by liquid, and the LB algorithm manages the flow of the liquid. The objective is
to reach the equilibrium state where the heights of the liquid columns are the same in
all the cylinders. Load redistribution is performed so that each node obtains its
share in proportion to its capacity. A potential energy function is introduced, whose
minimum value corresponds to the state of equilibrium. It is assumed that the
communication channels induce fixed delays and LB activity is completed within a
fixed interval. Further, every LB step involves state information exchanges and load
migration among the neighboring nodes. The authors show that the global potential
energy converges geometrically to its equilibrium value.
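A diffusion-style balancing step in the spirit of the hydrodynamic model can be sketched as follows. This is an illustrative sketch only, not the exact algorithm of [7]: the flow rule (move load in proportion to the height difference, damped by a gain `alpha`) is an assumption.

```python
def diffusion_step(load, cap, edges, alpha=0.25):
    """One hypothetical diffusion-style step toward equal liquid heights.
    load[i] is the liquid (tasks) in cylinder i, cap[i] its cross-sectional
    area, and edges lists the undirected liquid channels; heights are
    load / cap, and flow is damped by the gain alpha (an assumed rule)."""
    h = [l / c for l, c in zip(load, cap)]
    new = list(load)
    for i, j in edges:
        # liquid flows from the higher column to the lower one
        flow = alpha * (h[i] - h[j]) * min(cap[i], cap[j])
        new[i] -= flow
        new[j] += flow
    return new
```

Repeated application drives the heights together while conserving the total load, mirroring the geometric convergence of the potential energy described above.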
2.2.3 Graph partitioning method
In [8], the authors consider a graph with an arbitrary number of nodes and weighted
costs on the edges. In load balancing, the weight of a node represents the size of the load
at that node and the cost of an edge represents the amount of data transfer between
the nodes connected over that edge. The objective is to partition the nodes into
subsets of given weights in order to minimize the sum of the costs on all edge cuts.
It is shown that finding an optimal solution using a strictly exhaustive procedure
requires an inordinate amount of computation, and therefore, solving the problem
heuristically is a quick approach to produce sub-optimal solutions. This model is
predicated on the assumption that the computation times and the communication
delays are deterministic.
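The edge-cut objective of [8] can be made concrete with a small helper (illustrative only; the partitioning heuristics of [8] are not reproduced here):

```python
def cut_cost(edges, side):
    """Sum of the costs of edges crossing the partition.
    edges: list of (u, v, cost) tuples; side[v] in {0, 1} gives the subset
    to which node v is assigned.  Minimizing this sum over feasible
    assignments is the graph-partitioning objective described above."""
    return sum(cost for u, v, cost in edges if side[u] != side[v])
```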
2.2.4 Load balancing using queuing theory
A DLB policy based on a queuing-theory approach is discussed in [24]. The authors
propose an algorithm that compares the local load of every node to the load of all
other nodes and migrates a task whenever the load difference between nodes is more
than one task. Every time a new task is created, the proposed policy gets triggered.
If the communication cost is too high, the migration is avoided even if the imbalance
exists. The authors have developed their analytical model for two groups each with
two processors and the LB is performed in two phases: intra-group and inter-group.
The task arrival rate and the service rate at all the nodes are assumed to be the same.
They also assume fixed communication delays. Clearly, this model is not suitable for
delay-limited distributed systems.
2.2.5 Shortest-expected-delay (SED) and never-queue (NQ) policy
The SED policy [35] is based on the multiple-queues model, and can be performed in
a centralized as well as distributed fashion. Whenever a new task arrives at a node,
the algorithm calculates the expected service time offered by each node. Then, the
task joins the queue of the node that gives the shortest expected service time. On
the other hand, in the NQ policy [36], all the incoming tasks are assigned to the node
that has an empty queue. If more than one node has an empty queue, the SED
policy is invoked only among the nodes with empty queues. Similarly, if none of
the queues is empty, the SED policy is invoked among all nodes.
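The NQ-then-SED selection rule described above can be sketched in Python. This is a rough illustration rather than the algorithms of [35, 36]; in particular, the expected-service-time estimate (queue length plus one, divided by the service rate) is an assumption:

```python
def nq_then_sed(queue_lengths, service_rates):
    """Assign a new task: NQ first (restrict to empty queues, if any exist),
    then SED among the candidates.  (q + 1) / r is an assumed estimate of
    the expected service time a new task would see at a node."""
    empty = [i for i, q in enumerate(queue_lengths) if q == 0]
    candidates = empty if empty else list(range(len(queue_lengths)))
    return min(candidates,
               key=lambda i: (queue_lengths[i] + 1) / service_rates[i])
```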
2.2.6 Fault-tolerant schemes
Checkpoint-resume or terminate-restart mechanisms are used to detect failures and
recover unprocessed tasks at the failed nodes [1, 14]. Node failure has also been
tackled by keeping multiple copies of the load on different nodes [37]. However, the
work of Choi et al. [38] addresses the performance of the system based only on
unreliable resources, without considering the cooperation between dedicated and
non-dedicated nodes.
2.3 Prior work by the LB-research group at UNM
We begin with a descriptive exposition of the LB model [9]. An arbitrary number of
distributed nodes having different queue sizes is considered. Initially, each node
broadcasts its queue-size information, which is delayed by some time (referred to as
the communication delay) while reaching the other nodes. All the nodes execute LB
together at common balancing instants (called LB instants). At the LB instant, each
node calculates its excess load by comparing its load to the total load of the system
and partitions its positive excess load among the other nodes. Then, each partition is
scaled by a common balancing gain (called the LB gain, K ∈ [0, 1]) before being
transferred to the appropriate receiver nodes. We used custom-made MC simulation
software [10] to evaluate the performance of this continuous
balancing policy. In our results on continuous LB policies, nodes were found to
unnecessarily exchange tasks back and forth, thereby prolonging the total completion
time. We also observed that a large LB gain (closer to one) produces a high degree
of fluctuations in the tail of the queues. Our preliminary studies revealed that for
distributed systems with realistic random communication delays, limiting the number
of balancing instants and optimizing the performance over the choice of the balancing
times as well as the LB gain at each balancing instant can result in significant
improvement in computing efficiency.
The degraded performance of the continuous LB policy in the presence of random
delays motivated us to look into the one-shot LB policy [11]. In particular, once nodes
are initially assigned a certain number of tasks, all nodes together execute LB only at
one prescribed instant using a common LB gain K. The MC simulation results showed
that for a given initial load and average processing rates, there exist an
optimal LB gain and an optimal balancing instant associated with the one-shot LB
policy, which together minimize the average overall completion time. This becomes
evident from the simulation results shown in Fig. 2.1 and Fig. 2.2. Similar results
have been obtained from the real-time experiments conducted over WLAN [23].
Figure 2.1: Average completion time as a function of the LB instant.

Figure 2.2: Average overall completion time as a function of the LB gain (small-delay case).
However, these preliminary results obtained from MC simulations were not verified
theoretically in our earlier work. Further, the performance was evaluated only on
the two-node and three-node distributed systems without considering the possibility
of node failures/recoveries as well as the energy constraints of the nodes. In addi-
tion, our prior work focused on handling only an initial load without considering
subsequent arrivals of loads.
Chapter 3
One-shot Load Balancing Policy:
A Regeneration-theory Approach
In a one-shot LB policy, a cluster of distributed nodes takes a single synchronized
scheduling action in order to fairly distribute the load in the system. In particular,
once the nodes are initially assigned a certain load, each node broadcasts the information
about its local load, while it receives load-information from all other nodes. Based
on the load-information, all nodes would together execute LB at one prescribed in-
stant, called the LB instant. Each load-information packet takes a random amount
of time to reach the other nodes, thereby introducing randomness in the shared
information among the nodes. This behavior is especially prominent in the case of
distributed systems over wireless infrastructure-based as well as ad-hoc
communication networks. Consequently, a node may or may not have received the
load information from other nodes by the time LB is performed. In addition, once
LB is performed, loads are exchanged between the nodes over the communication
network, where each load transfer takes a random amount of time to reach the destination
node. Therefore, the dynamics of the distributed system, evolving under a one-shot
LB policy, should be represented as a stochastic system involving a set of distributed
queues [10]. In our earlier work [11], MC studies and real-time experiments
conducted over WLAN confirmed that for every initial load distribution, there exist
an optimal LB instant and an optimal amount of load exchanges associated with the
one-shot LB policy, which together minimize the average overall load completion
time.
The goal of this Chapter is to develop a novel theoretical approach that can
analytically characterize the performance of a one-shot LB policy (in terms of the
LB instant and the total load exchanges) in a distributed system. The concept of
regeneration in stochastic processes [39–41] is utilized to address the joint evolution
of distributed queues, which eventually leads to the formulation of coupled renewal
equations characterizing the distributed system. This system of renewal equations
are precisely solved to calculate the optimal LB instant and the optimal amount of
load-exchanges required to maximize the performance of the system. The theoretical
approach presented in this Chapter is useful in solving complex queuing problems
that arise in other important areas such as distributed telecommunication networks,
routing in wireless networks and wireless sensor networks. It should be noted that
the numerical results pertaining to the theoretical work of this Chapter are detailed
in Chapter 4.
This Chapter is organized as follows: In Section 3.1, we formally introduce the
concept of regeneration for a general distributed system with exponentially distributed
delays. Next, we present three different models of distributed systems;
namely, (1) system with no node failure in Section 3.2, (2) system with permanent
node failure in Section 3.3, and (3) system with recoverable node failure in Section 3.4.
Renewal equations are derived for each model. Finally, our conclusions are given in
Section 3.5.
3.1 Theory
Consider a system of n distributed nodes connected over a network of arbitrary
topology. Let mk ≥ 0 be an integer representing the initial queue length (number
of tasks) of the kth node at time t = 0. (Throughout the dissertation, a task is the
smallest and indivisible unit of load, while load is a collection of tasks.) The service
time for each task as well as the failure times and the recovery times, if any, of all
functional nodes are assumed to be random. We assume for the moment that there
is no future arrival of external tasks (i.e., tasks that are not present in the system at
t = 0) to the distributed system for t > 0. In addition, all the nodes are assumed
to be functional at t = 0 when the load-information is also broadcasted by each
node to all other nodes. Each load-information takes a random amount of time to
reach the destination server. In order to divide the total tasks of the system among
all functional nodes, LB is performed at time tb ≥ 0 so that each functional node,
the kth node, say, transfers an amount Ljk(tb) ≥ 0 of tasks to the jth node (j ≠ k)
that is functional according to the knowledge of the kth node. Naturally, these task
exchanges that occur over the shared communication network take random transfer
times.
In addition, each node is equipped with a backup system that performs the fol-
lowing duties only in the event of a permanent node failure: (i) If a node becomes
faulty, its backup node immediately sends a failure-notice to each functional node.
We assume that it takes a random time for the failure-notice to reach the destination.
(ii) If a node becomes faulty, its backup system immediately distributes all unserved
tasks of the node evenly among remaining functional nodes. (iii) Upon reception of
future tasks by a faulty node (due to LB performed by other nodes at time tb and
before their knowledge of the failed state of the faulty node), the backup system of
the faulty node evenly distributes those tasks back to the remaining functional nodes.
Note that a functional node detects the failure of any remote node by either receiving
a failure-notice sent by the backup system of the remote node or by receiving the
unserved tasks of the remote node, whichever happens first.
The purpose of this Chapter is to develop a LB policy, viz., the choice of the
balancing instant tb together with the amounts Ljk(tb) of load exchanges, that max-
imizes the joint performance of n nodes.
3.1.1 Definition of system-information, system-function and network states
Any intelligent LB action performed by any functional node at time tb should utilize
its knowledge about: (1) the initial load of other nodes; (2) the number of functional
nodes in the system; (3) loads that are in transit over the communication network.
Note that due to the random arrival times of the load-information packets (LIs), each
functional node, the kth node, say, may or may not have received the aforementioned
LIs about other nodes by the time it performs LB at tb. Therefore, for each kth node we assign an n-tuple
binary vector, ik, that describes its information state about the queue-lengths of all
nodes. More precisely, a “1” entry for the jth component of ik (j ≠ k) indicates that
the kth node has received the load-information from the jth node. (By definition,
the kth component of ik is always “1.”) Clearly, at t = 0 when the LIs are just
broadcasted, all the entries of ik (except for the kth component) are set to 0. We
define the system-information state as the concatenated vector I ≜ (i1, . . . , in). For
example, in a system with two nodes (n = 2), the state I = (10, 11) at time t
corresponds to the configuration for which the first node has not received the load-
information from the second node (i1 = (10)) while the second node has received the
load-information from the first node (i2 = (11)) by the time t.
Similarly, for each kth node, we assign a binary vector fk of size n, where a “1”
(“0”) entry in the jth component at time t indicates that the jth node is functional
18
Chapter 3. One-shot Load Balancing Policy: A Regeneration-theory Approach
(faulty) as perceived by the kth node at time t. We define the concatenated vector
F ≜ (f1, . . . , fn) as the system-function state. Note that since all nodes are assumed
to be functional at t = 0, F has all its entries set to “1” at t = 0.
In addition, due to random transfer-delays in the shared communication network,
each group of tasks being transferred over the network has a random arrival time.
Let gk ∈ {0, 1, 2, 3, ...} represent the number of different groups of tasks that are
simultaneously in transit to the kth node at time t. We can assign a vector ck of
size gk + 1 such that each component of ck (except the first component) represents
the number of tasks in a particular group in transit, while the first component of
ck is always set to gk. We now define the network state as the concatenated vector
C ≜ (c1, . . . , cn). For example, in a 3-node system, C = ([2 10 1], [1 5], [0]) at time
t corresponds to the network state for which two different groups of tasks (10 tasks
in the first group and 1 task in the second group) are being transferred to the first
node (c1 = [2 10 1]), one group of 5 tasks is being transferred to the second node
(c2 = [1 5]), while there is no transfer being made to the third node (c3 = [0]).
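As a concrete (hypothetical) encoding, the three state vectors of the 3-node example above can be written as nested tuples:

```python
# Uninformed system-information state: each node knows only its own queue.
I = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
# System-function state: all nodes functional at t = 0.
F = ((1, 1, 1), (1, 1, 1), (1, 1, 1))
# Network state from the example: two groups (10 and 1 tasks) headed to
# node 1, one group of 5 tasks headed to node 2, nothing headed to node 3.
# The first component of each c_k is the number of groups in transit, g_k.
C = ((2, 10, 1), (1, 5), (0,))
```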
Finally, it should be noted that the implicit dependence of I, F and C on time t
becomes evident as we progress deeper into the stochastic analysis of the underlying
queuing process.
In order to achieve an analytically tractable solution, the following assumptions
are imposed on the random times that characterize the queues generated at t = 0.
Assumption A1 (Exponential distribution of delays): The following random
variables are exponentially distributed: (i) Wki: the service time for the ith customer
of the kth functional node (with rate λdk); (ii) Yk: the failure time of the kth func-
tional node (with rate λfk); (iii) Sk: the recovery time of the kth functional node
(with rate λsk); for any j ≠ k, (iv) Xkj: the arrival time of the load-information sent
from the jth node to the kth node (with rate λkj); (v) $X^{F}_{kj}$: the arrival time of the
failure-notice sent from the jth node to the kth node (with rate $\lambda^{F}_{kj}$); and (vi) Zki:
the arrival time of the ith group of tasks sent to the kth node (with rate λ̃ki).
Assumption A2 (Independence of delays): All the random variables listed in
Assumption A1 are mutually independent.
The above assumptions are well approximated by the empirical data obtained from
actual LB experiments (which will be shown in Chapter 4) conducted on a distributed
system over a WLAN in the context of distributed computing [19, 22]. In addition, we
will assume that the mean transfer delay for the ith group of tasks in transit to the
kth node is $\tilde{\lambda}_{ki}^{-1} = \theta q$, where θ is an experimentally calculated channel
constant (in seconds per task), and q is the number of tasks in the ith group.
Convention C1 (Degenerate cases): The following time delays are set to ∞ almost
surely (a.s.): (i) the service time at any node with no customer, (ii) the service
time at a faulty node, (iii) the failure time of a faulty node, (iv) the recovery time
of a functional node, (v) the load-information arrival time when there is no load-
information in transit, (vi) the failure-notice arrival time when there is no failure-
notice in transit, and (vii) the load arrival time when there is no group of tasks in
transit.
3.1.2 Regeneration time
The key idea is to introduce a special random variable, called the regeneration time,
τ , defined as the minimum of the following six random variables: the time to the first
task service by any node, the time to the first occurrence of failure at any node, the
time to the first occurrence of recovery at any node, the time to the first arrival of an
load-information at any node, the time to the first arrival of a failure-notice at any
node, or the time to the first arrival of a group of tasks at any node. More precisely,
\[
\tau \triangleq \min\Bigl( \min_k (W_{k1}),\; \min_k (Y_k),\; \min_k (S_k),\; \min_{j \neq k} (X_{kj}),\; \min_{j \neq k} (X^{F}_{kj}),\; \min_{k,i} (Z_{ki}) \Bigr).
\]
Note
that in light of Assumptions A1, A2, and Convention C1, it is straightforward to see
that τ is an exponentially distributed random variable.
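This fact (the minimum of independent exponential random variables is exponential with the summed rate) can be checked numerically; the rates below are arbitrary illustrative values:

```python
import random

def sample_tau(rates, rng):
    # One realization of the regeneration time: the minimum of independent
    # exponential delays, one per rate.
    return min(rng.expovariate(r) for r in rates)

rng = random.Random(0)
rates = [1.0, 2.0, 0.5, 1.5]   # arbitrary illustrative rates, sum = 5.0
n = 200_000
mean_tau = sum(sample_tau(rates, rng) for _ in range(n)) / n
# Theory: tau ~ Exp(sum(rates)), so E[tau] = 1 / 5.0 = 0.2.
```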
The key property of the regeneration time τ is that upon the occurrence of the
regeneration event {τ = s}, new queues will emerge at time s (to be proved later)
that have similar statistical properties and dynamics as their predecessors but with
new initial system condition. More precisely, the occurrence of the regeneration event
{τ = s} gives birth to a new distributed queuing system at t = s whose random times
satisfy Assumptions A1 and A2 while having its own initial system condition. The
new initial system condition can be a new initial load distribution if the regeneration
event is a service to a task, a new F and a new C if the regeneration event is a
permanent node failure or a node recovery, a new I if the regeneration event is an
arrival of a load-information, a new F if the regeneration event is an arrival of a
failure-notice, or a new load distribution and a new C if the regeneration event is an
arrival of a load.
For the queuing system that emerges at the regeneration time τ, let the random
times (all measured from τ) $W'_{ki}$, $Y'_k$, $S'_k$, $X'_{kj}$, $X^{F\prime}_{kj}$ and $Z'_{ki}$, respectively, be the service
time for the ith task at the kth node, the failure-time of the kth node, the recovery-
time of the kth node, the arrival time of the load-information sent from the jth node
to the kth node, the arrival time of the failure-notice sent from the jth node to the
kth node, and the arrival time of the ith group of tasks sent to the kth node. To
prove that the queues are regenerated upon the occurrence of {τ = s}, it suffices to
show that the conditional distributions of $W'_{ki}$, $Y'_k$, $S'_k$, $X'_{kj}$, $X^{F\prime}_{kj}$ and $Z'_{ki}$, given that
the event {τ = s} has occurred, satisfy Assumptions A1 and A2.
3.2 System without node-failures
In this Section we consider a distributed system where nodes are always functional
[18]. Clearly, the failure time of any functional node can now be set to infinity
almost surely. First, we present a detailed exposition of the one-shot LB policy in
Section 3.2.1, and then derive the renewal equations for the distributed system in
Section 3.2.2.
3.2.1 Description of the one-shot load balancing policy
Let Qj(t) be the number of tasks in the queue of the jth node at time t. Due to the
random delay Xjl of the load-information sent from the lth node to the jth node, the
queue length of the lth node as perceived by the jth node at time t is delayed and is
given by Ql(t−Xjl). We assume that Ql(t−Xjl) = 0 a.s. for all Xjl > t, implying
that the jth node assumes that the lth node has zero queue size if it does not receive
the load-information from the lth node by the time t. At the LB instant t = tb,
every jth node (j = 1, . . . , n) computes its excess load by comparing its local load
to the average overall load of the system. More precisely, the excess load, Lexj (tb), is
random and is given by
\[
L^{ex}_j(t_b) = \Bigl( Q_j(t_b) - \frac{\lambda_{dj}}{\sum_{k=1}^{n} \lambda_{dk}} \sum_{l=1}^{n} Q_l(t_b - X_{jl}) \Bigr)^{+}, \qquad (3.1)
\]
where $(x)^{+} \triangleq \max(x, 0)$.
in (3.1) is simply the fair share of node j from the totality of the loads in the system.
This is a more plausible way to calculate the excess load of a node in a heterogeneous
computing environment as compared to our earlier methods that did not consider the
processing speeds of the nodes [10, 11]. With the inclusion of the processing speed
of the nodes in (3.1), a slower node would have a larger excess load than that of a
faster node.
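A minimal Python sketch of the excess-load computation (3.1); the queue functions, delays, and rates below are illustrative stand-ins, and a queue whose information has not yet arrived ($X_{jl} > t_b$) is perceived as zero, as assumed above:

```python
def excess_load(j, t_b, Q, X, lam_d):
    """Excess load of node j at the LB instant t_b, per (3.1).
    Q[l]: function of time giving node l's queue length;
    X[j][l]: realized load-information delay from node l to node j
             (X[j][j] = 0, since a node knows its own queue);
    lam_d[k]: processing rate of node k."""
    n = len(lam_d)
    perceived = [Q[l](t_b - X[j][l]) if X[j][l] <= t_b else 0
                 for l in range(n)]
    fair_share = lam_d[j] / sum(lam_d) * sum(perceived)
    return max(Q[j](t_b) - fair_share, 0.0)
```

With equal rates, two constant queues of 10 and 2 tasks give node 0 an excess of 4 tasks; if node 1's information has not yet arrived, node 0 perceives it as empty and computes a larger excess.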
Moreover, the excess load has to be partitioned among the n−1 nodes by assigning
a larger portion to a node with smaller relative load. To this end, we introduce two
different approaches to calculate the partitions, denoted by pij, which represent the
fraction of the excess load of the jth node to be sent to the ith node. For the
conservation of total load, any such partition should satisfy $\sum_{l=1}^{n} p_{lj} = 1$, where
$p_{jj} = 0$ by definition.
In our first approach, the fractions $p_{ij}$, for i ≠ j, are chosen as:
\[
p_{ij} =
\begin{cases}
\dfrac{1}{n-2} \left( 1 - \dfrac{\lambda_{di}^{-1} Q_i(t_b - X_{ji})}{\sum_{l \neq j} \lambda_{dl}^{-1} Q_l(t_b - X_{jl})} \right), & \sum_{l \neq j} Q_l(t_b - X_{jl}) > 0, \\[2ex]
\lambda_{di} \Big/ \sum_{k \neq j} \lambda_{dk}, & \text{otherwise},
\end{cases}
\qquad (3.2)
\]
where n ≥ 3. Clearly, a node assigns a larger partition of its excess load to a node
with a small load relative to all other candidate recipient nodes. Indeed, it is easy
to check that $\sum_{l=1}^{n} p_{lj} = 1$. For the special case when n = 2, $p_{ij} = 1$ whenever i ≠ j.
But observe that $p_{ij} \le \frac{1}{n-2}$
for any ith node. This means that the maximum size of
the partition decreases as the number of nodes in the system increases, irrespective
of the processing rates of the nodes. Therefore, this partition may not be effective in
a scenario where some nodes may have very high processing rates, as compared to
most of the nodes in the system. This observation prompted us to consider a second
partition, which is described below.
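The first-approach partition (3.2) can be sketched as follows (an illustrative helper; `Q_perceived` stands for the delayed queue lengths $Q_l(t_b - X_{jl})$, with zeros for information not yet received):

```python
def first_partition(j, Q_perceived, lam_d):
    """Fractions p_ij of node j's excess load, per (3.2), for n >= 3 nodes."""
    n = len(lam_d)
    others = [l for l in range(n) if l != j]
    p = [0.0] * n
    if sum(Q_perceived[l] for l in others) > 0:
        denom = sum(Q_perceived[l] / lam_d[l] for l in others)
        for i in others:
            # a lightly loaded node gets a larger fraction
            p[i] = (1.0 - (Q_perceived[i] / lam_d[i]) / denom) / (n - 2)
    else:
        # all other queues are perceived empty: split by processing rates
        rate_sum = sum(lam_d[l] for l in others)
        for i in others:
            p[i] = lam_d[i] / rate_sum
    return p
```

As noted above, the fractions always sum to one, and each fraction is capped at 1/(n − 2).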
In the second approach, the sender node locally calculates the excess load for
each node in the system, and calculates the portions to be transferred accordingly.
For convenience, define $Q_{i(j)}(t) \triangleq Q_i(t - X_{ji})$ and let $L^{ex}_{i(j)}(t_b)$ be the excess load at
the ith node, as calculated by the jth node at the LB instant tb. Then, by using a
similar rationale to that used in (3.1), we obtain the locally computed excess load
\[
L^{ex}_{i(j)}(t_b) \triangleq Q_{i(j)}(t_b) - \frac{\lambda_{di}}{\sum_{k=1}^{n} \lambda_{dk}} \sum_{l=1}^{n} Q_{l(j)}(t_b). \qquad (3.3)
\]
It is straightforward to verify that $\sum_{i=1}^{n} L^{ex}_{i(j)}(t_b) = 0$ almost surely. Let $V_j \triangleq \{j :
L^{ex}_{j(j)}(t_b) > 0\}$ be the collection of candidate sender nodes as perceived by the jth
node. Now for any $j \in V_j$, we partition the excess load of the jth node among only
those nodes that are perceived by the jth node to be below the average system load.
Let $U_j \triangleq \{i : L^{ex}_{i(j)}(t_b) < 0\}$ be the collection of such candidate recipient nodes. Now,
the partition $p_{ij}$ can be defined as
\[
p_{ij} =
\begin{cases}
L^{ex}_{i(j)}(t_b) \Big/ \sum_{l \in U_j} L^{ex}_{l(j)}(t_b), & i \in U_j, \\[1ex]
0, & \text{otherwise}.
\end{cases}
\qquad (3.4)
\]
The above partition is most effective when delays are negligible, Qi(j)(t) are deter-
ministic, and tasks are arbitrarily divisible. In this case, if LB is executed together by
all the nodes that do not belong to Uj, all nodes finish their tasks at the same time,
thereby minimizing the overall completion time. The proof of optimality of this partition is
shown in Appendix A.
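A sketch of the second-approach partition (3.4), taking as input the locally computed excess loads of (3.3):

```python
def second_partition(Lex_local):
    """Fractions p_i of the sender's excess load, per (3.4).
    Lex_local[i] is node i's excess load as computed locally by the sender
    via (3.3); recipients are the nodes below the average system load."""
    U = [i for i, e in enumerate(Lex_local) if e < 0]   # candidate recipients
    deficit = sum(Lex_local[i] for i in U)              # negative total
    p = [0.0] * len(Lex_local)
    for i in U:
        p[i] = Lex_local[i] / deficit                   # negative/negative > 0
    return p
```

Each recipient's fraction is proportional to its shortfall below the average, and the fractions over the recipient set sum to one.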
When delays are present, the partitions defined by (3.2) or (3.4) may not be
effective in general, and the proportions pij must be adjusted. To incorporate this
adjustment, the adjusted load to be transferred to the ith node must be defined as
\[
L_{ij}(t_b) = \bigl\lfloor K_{ij}\, p_{ij}\, L^{ex}_j(t_b) \bigr\rfloor, \qquad (3.5)
\]
where $\lfloor x \rfloor$ is the greatest integer less than or equal to x, and the parameters
$K_{ij} \in [0, 1]$ constitute the user-specified LB gains. The one-shot LB policy can be
summarized as follows: at the LB instant tb, the jth node compares its local load to
the average overall load of the system; it then partitions its excess load among the n − 1
available nodes using the fractions $K_{ij} p_{ij}$, and dispatches the integral parts of the
adjusted excess loads to the other nodes. The objective is to calculate the optimal tb
and the optimal LB gains $K_{ij}$ that will together minimize the total time to serve the
$\sum_{i=1}^{n} Q_i(0)$ tasks by the totality of n nodes.
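Finally, the adjusted, integral transfer amounts of (3.5) (an illustrative sketch; `p` and `K` come from a partition rule such as (3.2) or (3.4) and from user-chosen gains):

```python
from math import floor

def transfer_amounts(Lex_j, p, K):
    """L_ij = floor(K_ij * p_ij * Lex_j): the integral task counts that
    node j dispatches to each node i, per (3.5)."""
    return [floor(K[i] * p[i] * Lex_j) for i in range(len(p))]
```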
3.2.2 Renewal equations
In this section, we derive renewal equations that characterize the average overall
completion time (AOCT) for a given initial load distribution under the one-shot LB
policy. The overall completion time is defined as the maximum over completion times
for all nodes. For the moment, we will assume that all nodes execute LB using a
common LB gain Kij = K. This assumption will be relaxed in Chapter 5 to a setting
where nodes execute LB asynchronously using different LB gains.
Let $T^{I}_{m_1,\ldots,m_n}(t_b)$ be the overall completion time given that the LB is executed
at time tb, where the kth node has $m_k \ge 0$ tasks at time t = 0, and the system-
information state is I at time t = 0. Let the AOCT be given as $\mu^{I}_{m_1,\ldots,m_n}(t_b) :=
\mathrm{E}[T^{I}_{m_1,\ldots,m_n}(t_b)]$. The goal is to calculate the AOCT when the system starts in an
uninformed state, viz., I = (10...0, 01...0, . . . , 00...1). However, to exploit the regen-
eration theory, we need to calculate the AOCT for an arbitrary system-information
state I.
Theorem 1: For $n \in \mathbb{N}$, $m_1, \ldots, m_n \in \mathbb{Z}^{+}$, and $t_b \ge 0$, the AOCT $\mu^{I}_{m_1,\ldots,m_n}(t_b)$
satisfies the following set of $2^{n(n-1)}$ (one for each initial system-information state I)
difference-differential equations shown in (3.6):
\[
\frac{d \mu^{I}_{m_1,\ldots,m_n}(t_b)}{d t_b}
= \sum_{k=1}^{n} \lambda_{dk}\, \mu^{I}_{m_1-\delta_{1,k},\ldots,m_n-\delta_{n,k}}(t_b)
+ \sum_{k=1}^{n} \sum_{j \neq k} \lambda_{kj}\, \mu^{I_{kj}}_{m_1,\ldots,m_n}(t_b)
- \lambda\, \mu^{I}_{m_1,\ldots,m_n}(t_b) + 1, \qquad (3.6)
\]
where $\delta_{j,k}$ is the Kronecker delta, $\lambda = \sum_{k=1}^{n} \bigl( \lambda_{dk} + \sum_{j \neq k} \lambda_{kj} \bigr)$, $I_{kj}$ is identical to I
with the exception that the jth component of ik is 1, and $m_k - 1$ is set to 0 whenever
$m_k = 0$.
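As a hedged sanity check (not the general solution of (3.6)): if no LB action is ever taken, T no longer depends on tb, the derivative vanishes, the information-arrival terms cancel (µ is then independent of I), and (3.6) collapses to a fixed-point recursion over the load vector, with the service rate of an empty queue excluded per Convention C1. For two nodes:

```python
def aoct_no_lb(m1, m2, lam1, lam2, memo=None):
    """AOCT of two independent exponential servers: the no-LB special case
    in which (3.6) reduces to
        mu[m1, m2] = (1 + lam1*mu[m1-1, m2] + lam2*mu[m1, m2-1]) / lam,
    where lam sums only the rates of non-empty queues (Convention C1)."""
    if memo is None:
        memo = {}
    if (m1, m2) == (0, 0):
        return 0.0
    if (m1, m2) not in memo:
        lam = (lam1 if m1 > 0 else 0.0) + (lam2 if m2 > 0 else 0.0)
        acc = 1.0
        if m1 > 0:
            acc += lam1 * aoct_no_lb(m1 - 1, m2, lam1, lam2, memo)
        if m2 > 0:
            acc += lam2 * aoct_no_lb(m1, m2 - 1, lam1, lam2, memo)
        memo[(m1, m2)] = acc / lam
    return memo[(m1, m2)]
```

For unit rates and one task per node this gives 1.5, matching E[max of two independent unit-mean exponentials] = 1 + 1/2.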
Before proving Theorem 1, we first present and prove Lemmas 1–3. As there
is no node failure, according to Convention C1 the regeneration time becomes
$\tau = \min\bigl( \min_k (W_{k1}),\ \min_{j \neq k} (X_{kj}),\ \min_{k,i} (Z_{ki}) \bigr)$. On {τ ≤ tb}, we define $T'^{I}_{m_1,\ldots,m_n}(t_b)$ as
the time taken by the new queueing system emerging at τ to serve all the customers
in the system if LB is performed jointly by all nodes at time tb, provided that the
system-information state at t = τ is specified by I while $m_k \ge 0$ tasks (k = 1, . . . , n)
are in the queue of the kth node at t = τ.
Lemma 1: For s ≤ tb, $\mathrm{E}[T^{I}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}] = s + \mathrm{E}[T^{I}_{m_1-1,\ldots,m_n}(t_b - s)]$.
Proof: The regeneration event {τ = s, τ = W11} implies that the first activity
to occur in the queuing system is precisely the completion of the first task at the
first server. Since in this case LB is not performed until τ, load transfer does not
occur in the queuing system that emerges prior to time t = s. Therefore, according
to Convention C1, $\mathrm{P}\{Z'_{ki} = \infty \mid \tau = s, \tau = W_{11}\} = 1$. Also, observe that upon the
occurrence of {τ = s, τ = W11} the system-information state I remains unchanged,
while the system load distribution at time τ becomes m1 − 1 tasks in the queue of
the first node, and m2, . . . , mn tasks in the queues of the remaining nodes. Therefore,
by construction, $\mathrm{E}[T^{I}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}] = \mathrm{E}[\tau + T'^{I}_{m_1-1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}] = s + \mathrm{E}[T'^{I}_{m_1-1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}]$.
The proof of Lemma 1 is complete once we show that $\mathrm{E}[T'^{I}_{m_1-1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}] = \mathrm{E}[T^{I}_{m_1-1,\ldots,m_n}(t_b - s)]$. Observe that by definition, $W'_{k1} = W_{k1} - \tau$ and
$X'_{jk} = X_{jk} - \tau$ for $k, j \in \{1, \ldots, n\}$, $j \neq k$. Moreover, it is elementary to show (see
Appendix B for a proof in a more general setting) that $\mathrm{P}\{W'_{21} \le t \mid \tau = s, \tau = W_{11}\} = (1 - e^{-\lambda_{d2} t}) u(t)$ and $\mathrm{P}\{X'_{jk} \le t \mid \tau = s, \tau = W_{11}\} = (1 - e^{-\lambda_{jk} t}) u(t)$, and for
all $j \ge 2$ and $k \in \{1, \ldots, n\}$, $W'_{kj}$ have identical distributions. In other words,
conditional upon the occurrence of {τ = s, τ = W11}, all random times of the newly
emerging queuing system satisfy Assumption A1. Similarly, it can also be shown
that conditional upon the occurrence of {τ = s, τ = W11}, all random times of the
emerging queuing system, viz., $W'_{ki}$ and $X'_{jk}$, are mutually independent, thereby
satisfying Assumption A2. The generalized proof of conditional independence of the
random times is provided in Appendix C.
In summary, we have shown that conditional on the occurrence of {τ = s, τ =
W11}, the random times characterizing the queuing system at time s satisfy Assump-
tions A1 and A2. Therefore, by shifting the time origin from t = 0 to t = s, we can
look at the emergent queuing system as the original queuing system with m1 − 1
tasks at the first node and m2, . . . , mn tasks in the queues of the remaining nodes, while
the system-information state I is unchanged. Nonetheless, due to the shift of origin,
the scheduling instant is now at tb − s units of time from the new origin. Therefore,
we conclude that $\mathrm{E}[T'^{I}_{m_1-1,\ldots,m_n}(t_b) \mid \tau = s, \tau = W_{11}] = \mathrm{E}[T^{I}_{m_1-1,\ldots,m_n}(t_b - s)]$, which
completes the proof of Lemma 1. □
Lemma 2: For s ≤ tb, $\mathrm{E}[T^{I}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = X_{kj}] = s + \mathrm{E}[T^{I_{kj}}_{m_1,\ldots,m_n}(t_b - s)]$.

Proof: Here, the regeneration event is the arrival of the load-information sent
from the jth node to the kth node. Therefore, upon the occurrence of {τ = s, τ =
Xkj}, in accordance with Section 3.1.1, the jth component of ik becomes 1, while all
other il (l = 1, . . . , n and l ≠ k) as well as the load distribution remain unchanged.
Therefore, $\mathrm{E}[T^{I}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = X_{kj}] = \mathrm{E}[\tau + T'^{I_{kj}}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = X_{kj}] = s + \mathrm{E}[T'^{I_{kj}}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = X_{kj}]$, where $I_{kj}$ is identical to I except that the jth
component of ik is 1. Next, based on a similar analysis to that given in Lemma 1, we
can show that all the random times of the queuing system that emerges upon the
occurrence of {τ = s, τ = Xkj} satisfy Assumptions A1 and A2. Therefore, by
shifting the origin from t = 0 to t = s, we obtain $\mathrm{E}[T'^{I_{kj}}_{m_1,\ldots,m_n}(t_b) \mid \tau = s, \tau = X_{kj}] = \mathrm{E}[T^{I_{kj}}_{m_1,\ldots,m_n}(t_b - s)]$. □
Lemma 3: E[T^I_{m1,...,mn}(tb) | τ > tb] = tb + E[T^I_{m1,...,mn}(0)].
Proof: In this case the regeneration event occurs after the LB instant. In other words, nothing has occurred in the system until tb; therefore, the load distribution as well as the system-information state I of the queuing system at tb is exactly the same as that of the original queues. Next, we analyze the random times of the queuing system that emerges at time tb. Let T′′^I_{m1,...,mn}(tb) be the time taken by the new queuing system emerging at tb to serve all tasks if LB is performed jointly by all nodes at time tb, provided that the system-information state at t = tb is specified by I while mk ≥ 0 tasks (k = 1, . . . , n) are in the queue of the kth node at t = tb. Therefore, by construction, E[T^I_{m1,...,mn}(tb) | τ > tb] = E[tb + T′′^I_{m1,...,mn}(tb) | τ > tb]. Let the random times characterizing the queuing system emerging at tb be W′′_{ki}, X′′_{kj} and Z′′_{ki}, all measured from tb. Note that we have maintained the same symbols (with the addition of double primes) to denote the same types of random delays. Since no LB is done prior to tb, there is no load in transfer between nodes. Thus, based on Convention C1, we obtain P{Z′′_{ki} = ∞ | τ > tb} = 1. Also, for the ith customer of the kth server, W′′_{ki} = W_{ki} − tb, and for k, j ∈ {1, . . . , n}, j ≠ k, X′′_{jk} = X_{jk} − tb. Now, it is straightforward to show that the random times W′′_{ki} and X′′_{kj} satisfy Assumptions A1 and A2 (refer to Lemma 7 in Section 3.3 for a detailed proof). Consequently, neither the initial condition nor the statistics of the queues have changed while tb units of time have elapsed. Therefore, in the new queuing system, we can shift the origin to the right by tb units of time, which places the LB instant at t = 0. Thus, E[T′′^I_{m1,...,mn}(tb) | τ > tb] = E[T^I_{m1,...,mn}(0)]. □
Proof of Theorem 1:
Exploiting the properties of conditional expectation, we can write the AOCT as

E[T^I_{m1,...,mn}(tb)] = E[ E[T^I_{m1,...,mn}(tb) | τ] ] = ∫_0^∞ E[T^I_{m1,...,mn}(tb) | τ = s] f_τ(s) ds,

where f_τ(t) is the probability density function (pdf) of τ. Splitting the above integral, we get
E[T^I_{m1,...,mn}(tb)] = ∫_0^{tb} E[T^I_{m1,...,mn}(tb) | τ = s] f_τ(s) ds + ∫_{tb}^∞ E[T^I_{m1,...,mn}(tb) | τ = s] f_τ(s) ds. (3.7)
Now, for s ≤ tb, we can write
E[T^I_{m1,...,mn}(tb) | τ = s] = Σ_{k=1}^{n} Σ_{j≠k} E[T^I_{m1,...,mn}(tb) | τ = s, τ = Xkj] P{τ = Xkj | τ = s}
  + Σ_{k=1}^{n} E[T^I_{m1,...,mn}(tb) | τ = s, τ = Wk1] P{τ = Wk1 | τ = s}. (3.8)
Also, note that

∫_{tb}^∞ E[T^I_{m1,...,mn}(tb) | τ = s] f_τ(s) ds = E[T^I_{m1,...,mn}(tb) | τ > tb] P{τ > tb}. (3.9)
We now apply Lemmas 1 and 2 to (3.8), Lemma 3 to (3.9), and substitute in (3.7)
to obtain
μ^I_{m1,...,mn}(tb) = ∫_0^{tb} [ Σ_{k=1}^{n} (s + μ^I_{m1−δ_{1,k},...,mn−δ_{n,k}}(tb − s)) P{τ = Wk1 | τ = s}
  + Σ_{k=1}^{n} Σ_{j≠k} (s + μ^{Ikj}_{m1,...,mn}(tb − s)) P{τ = Xkj | τ = s} ] f_τ(s) ds
  + (μ^I_{m1,...,mn}(0) + tb) P{τ > tb}. (3.10)
Recall that τ is an exponential random variable with rate (inverse of mean) λ = Σ_{k=1}^{n} (λ_{dk} + Σ_{j≠k} λ_{kj}), while P{τ = Wk1 | τ = s} = λ_{dk}/λ and P{τ = Xkj | τ = s} = λ_{kj}/λ (see Appendix D for details). Therefore, Equation (3.10) becomes
μ^I_{m1,...,mn}(tb) = (μ^I_{m1,...,mn}(0) + tb) e^{−λ tb} + ∫_0^{tb} s λ e^{−λs} ds
  + ∫_0^{tb} [ Σ_{k=1}^{n} λ_{dk} μ^I_{m1−δ_{1,k},...,mn−δ_{n,k}}(tb − s) + Σ_{k=1}^{n} Σ_{j≠k} λ_{kj} μ^{Ikj}_{m1,...,mn}(tb − s) ] e^{−λs} ds. (3.11)
By direct differentiation of (3.11) with respect to tb and rearranging terms, we obtain (3.6). □
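Two distributional facts underpin this derivation: the regeneration time τ, being the minimum of independent exponential clocks, is itself exponential with rate λ equal to the sum of the individual rates, and the identity of the winning clock is independent of the value of τ, with P{τ = Wk1 | τ = s} = λ_{dk}/λ. Both are easy to confirm numerically; the sketch below uses arbitrary illustrative rates, not the parameters of any particular system in this chapter.

```python
import random

random.seed(1)

rates = [0.9, 0.5, 0.3, 0.2]   # illustrative clock rates
lam = sum(rates)
n_trials = 200_000

wins = [0] * len(rates)        # which clock achieved the minimum
mins = []
for _ in range(n_trials):
    draws = [random.expovariate(r) for r in rates]
    k = min(range(len(rates)), key=lambda i: draws[i])
    wins[k] += 1
    mins.append(draws[k])

# (1) the minimum of independent exponentials is Exp(lam): compare means
print(sum(mins) / n_trials, 1 / lam)

# (2) P{argmin = k} = rates[k] / lam, regardless of the value of the minimum
for k, r in enumerate(rates):
    print(wins[k] / n_trials, r / lam)
```

The same experiment, restricted to trials where the minimum falls below any fixed threshold, yields the same winning frequencies, which is the independence property used in (3.8).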
It should be noted that in order to solve the renewal equations of Theorem 1, we first need to calculate the corresponding initial conditions, namely, μ^I_{m1,...,mn}(0). For simplicity, we will explicitly calculate the initial condition for a two-node system. Nonetheless, our approach demonstrates the fundamental technique for calculating the initial condition for a multi-node system.
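Independently of the renewal equations, the AOCT of the underlying model can also be estimated by direct Monte Carlo simulation: exponential service times, a single one-shot transfer at the balancing instant tb, and an exponentially distributed delay for the migrated batch. The sketch below simulates a simplified two-node instance; the rates, the batch size L, and the rule of sending a fixed batch are illustrative placeholders rather than the actual policy of (3.1) and (3.5).

```python
import random

random.seed(2)

lam_d1, lam_d2 = 1.0, 0.5   # service rates (illustrative)
lam_t = 2.0                 # transfer-delay rate for the migrated batch
m1, m2 = 12, 2              # initial queue lengths
tb, L = 1.0, 4              # balancing instant and batch size (placeholder rule)

def completion_time():
    # node 1 serves its tasks one at a time; record each completion epoch
    t1, times = 0.0, []
    for _ in range(m1):
        t1 += random.expovariate(lam_d1)
        times.append(t1)
    done_by_tb = sum(1 for x in times if x <= tb)
    sent = min(L, m1 - done_by_tb)          # cannot send tasks already served
    finish1 = times[m1 - sent - 1] if m1 - sent > 0 else 0.0
    # node 2 serves its own tasks, then the batch once it arrives
    finish2 = sum(random.expovariate(lam_d2) for _ in range(m2))
    if sent > 0:
        arrival = tb + random.expovariate(lam_t)
        finish2 = max(finish2, arrival) + sum(
            random.expovariate(lam_d2) for _ in range(sent))
    return max(finish1, finish2)

aoct = sum(completion_time() for _ in range(20000)) / 20000
print(aoct)
```

Such a simulator provides a useful cross-check for any numerical solution of the renewal equations.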
Initial condition: LB at tb = 0
In the case of a two-node system, equation (3.6) yields four equations involving μ^{(1k1,k21)}_{m1,m2}(tb) for ki ∈ {0, 1}. In [22], a brute-force method (based on conditional probabilities) was used to calculate μ^{(1k1,k21)}_{m1,m2}(0). Here, we solve this more efficiently using the concept of regeneration. Without loss of generality, suppose m1 > m2. Using (3.1) and (3.5), and with p21 = 1, we obtain

L21(0) = ⌊K(λ_{d2} m1 − λ_{d1} m2)/(λ_{d1} + λ_{d2})⌋ if (k1, k2) ∈ {(1, 0), (1, 1)},
L21(0) = ⌊K λ_{d2} m1/(λ_{d1} + λ_{d2})⌋ otherwise. (3.12)
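As a concrete illustration of (3.12), with the gain K and the rates chosen arbitrarily:

```python
from math import floor

lam_d1, lam_d2 = 2.0, 1.0   # service rates (illustrative)
m1, m2 = 30, 6              # initial loads, m1 > m2
K = 0.8                     # LB gain

def L21(k1, k2):
    # Equation (3.12): the first case applies when (k1, k2) is (1,0) or (1,1)
    if (k1, k2) in {(1, 0), (1, 1)}:
        return floor(K * (lam_d2 * m1 - lam_d1 * m2) / (lam_d1 + lam_d2))
    return floor(K * lam_d2 * m1 / (lam_d1 + lam_d2))

print(L21(1, 1))  # floor(0.8 * (30 - 12) / 3) = 4
print(L21(0, 0))  # floor(0.8 * 30 / 3) = 8
```

Note that the first case, which uses both loads, transfers fewer tasks here because it targets only the weighted excess of node 1 over node 2.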
Similarly, we can calculate L12(0). For ease of notation, let L21 ≜ L21(0) and L12 ≜ L12(0). The delay in transferring the load Lkj is termed the load-transfer delay from the jth to the kth node. Recall that, in Assumption A1, the transfer delay of the load in transit to the kth node is also assumed to follow an exponential pdf with rate λ̃_{k1}. Suppose T1(r1; L12) is the waiting time at node 1 before all the tasks (including those sent from node 2) are served, where r1 is the number of tasks at node 1 just after LB is performed at time t = 0, i.e., r1 = m1 − L21, and L12 is the number of tasks in transit. Let the cumulative distribution function (cdf) of T1(r1; L12) be F_{T1}(r1; L12; t) = P{T1(r1; L12) ≤ t}.
With LB at time t = 0, the only possible events at the first node are either the arrival of the L12 tasks sent by the second node or the completion of service of a task by the first node (if r1 > 0). If the regeneration event occurring at time s ∈ [0, t] is the arrival of the L12 tasks, a new queue is born at node 1 with r1 + L12 tasks, where the service time of each task still follows an exponential distribution, and there is no task in transit to the first node. On the other hand, if the regeneration event is the service of a task at the first node, a new queue is born at node 1 with r1 − 1 tasks, where the service time of each task still follows an exponential distribution, and there are L12 tasks (with exponentially distributed transfer time) in transit to the first node. Therefore,
P{T1(r1; L12) ≤ t} = ∫_0^t f_τ(s) [ P{T1(r1 − 1; L12) ≤ t − s} λ_{d1}/λ′ + P{T1(r1 + L12; 0) ≤ t − s} λ̃_{11}/λ′ ] ds, (3.13)
where λ′ = λ_{d1} + λ̃_{11}. Similarly, we can solve for P{T2(r2; L21) ≤ t}. Differentiating (3.13) with respect to t, we get

dF_{T1}(r1; L12; t)/dt = −λ′ F_{T1}(r1; L12; t) + λ_{d1} F_{T1}(r1 − 1; L12; t) + λ̃_{11} F_{T1}(r1 + L12; 0; t).

The quantities F_{T1}(0; L12; t) and F_{T1}(r1 + L12; 0; t) can further be decomposed into simpler recursive equations by invoking regeneration theory again. We calculate F_{T2}(r2; L21; t) using a similar approach. For simplicity of notation, let F_{Tk}(t) ≜ F_{Tk}(rk; Lkj; t). Now, the overall completion time is TC = max(T1, T2); recall that its average E[TC] is μ^{(1k1,k21)}_{m1,m2}(0). By exploiting the independence of T1 and T2, the explicit solution is given as

μ^{(1k1,k21)}_{m1,m2}(0) = E[max(T1, T2)] = ∫_0^∞ t [f_{T1}(t) F_{T2}(t) + F_{T1}(t) f_{T2}(t)] dt, (3.14)

where f_{T1}(t) and f_{T2}(t) are the pdfs of T1 and T2, respectively.
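The recursion behind (3.13) can be integrated numerically. The sketch below (with illustrative rates) advances the family of cdfs F_{T}(r; ·; t) on a time grid by forward Euler, tracking separately the regimes before and after the in-transit batch arrives, and then evaluates E[max(T1, T2)] through the equivalent identity E[max(T1, T2)] = ∫_0^∞ (1 − F_{T1}(t) F_{T2}(t)) dt, which follows from (3.14) by integration by parts.

```python
def node_cdf_grid(r0, L, mu, lt, dt, n_steps):
    """cdf grid of the time for one node to clear r0 queued tasks plus an
    in-transit batch of L tasks (exp(lt) transfer delay, exp(mu) services).
    L == 0 means nothing is in transit."""
    rmax = r0 + L
    F0 = [1.0] + [0.0] * rmax     # F0[r]: r tasks cleared by t, no batch pending
    F1 = [0.0] * (r0 + 1)         # F1[r]: r tasks to clear, batch still pending
    out = []
    for _ in range(n_steps):
        out.append(F0[r0] if L == 0 else F1[r0])
        newF0 = F0[:]
        for r in range(1, rmax + 1):       # pure Erlang part
            newF0[r] += dt * mu * (F0[r - 1] - F0[r])
        newF1 = F1[:]
        for r in range(0, r0 + 1):
            # d/dt F1[r] = -(mu+lt) F1[r] + mu F1[r-1] + lt F0[r+L]
            serve = mu * (F1[r - 1] - F1[r]) if r >= 1 else 0.0
            newF1[r] += dt * (serve + lt * (F0[r + L] - F1[r]))
        F0, F1 = newF0, newF1
    return out

dt, T = 0.002, 40.0
n = int(T / dt)
# illustrative instance: r1 = 3 tasks at node 1 plus a batch of L12 = 2 in transit
F1_grid = node_cdf_grid(3, 2, 1.0, 2.0, dt, n)
F2_grid = node_cdf_grid(4, 0, 0.8, 1.0, dt, n)
e_max = sum((1.0 - a * b) * dt for a, b in zip(F1_grid, F2_grid))
print(e_max)
```

For a sanity check, with one Exp(1) task at each node and nothing in transit, the computed E[max(T1, T2)] should be close to the known value 3/2.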
3.3 System with permanent node-failures
In this section, we consider a distributed system in which any node can fail permanently with some probability [17]. One-shot LB is performed jointly by all functional servers at time tb in order to fairly distribute the loads in the system. In addition, each node is equipped with a backup system that performs certain duties, as mentioned in Section 3.1, only in the event of a permanent node failure. The concept of regeneration is exploited to calculate the probability of successfully serving all the tasks in a finite amount of time for a given initial load distribution. To the best of our knowledge, this work is the first to jointly address reliability and scheduling in distributed systems with random delays.
Let T^{I,F}_{m1,...,mn}(tb; C) denote the time taken by the system to serve all the tasks in the system if LB is performed by all functioning nodes at time tb, and the initial system condition at t = 0 is as specified by I, F, C while mk tasks (k = 1, . . . , n) are in the queue of the kth node. Our objective is to calculate the probability of success in serving all tasks, defined by P{T^{I,F}_{m1,...,mn}(tb; C) < ∞}, for I = (10...0, 01...0, . . . , 00...1), F = (11...1, 11...1, . . . , 11...1), and C = ([0], [0], . . . , [0]). (That is, we assume a null information state at t = 0, that all nodes are functional, and that no tasks are in transit.) However, it turns out that it is necessary to calculate the probability of success corresponding to arbitrary initial system conditions.
3.3.1 Renewal equations
For brevity, we will consider a two-node system (n = 2); however, our approach can be extended in a straightforward way to a multi-node model. Let R^{I,F}_{m1,m2}(tb; C) ≜ P{T^{I,F}_{m1,m2}(tb; C) < ∞}. Trivially, R^{I,F}_{0,0}(tb; ([0], [0])) = 1 for all tb ≥ 0 and for any I and F, since there is no task to be serviced in this case. Also, R^{I,(0f12,f210)}_{m1,m2}(tb; ([0], [0])) = 0 if either m1 > 0 or m2 > 0, for any I and for f12, f21 ∈ {0, 1}, since in this case both nodes have already failed while there is at least one unserved task in the system. Our main results are the renewal equations characterizing the probability R^{I,F}_{m1,m2}(tb; C), which are given below in the form of difference-differential equations.
Theorem 2: For n = 2, m1, m2 ∈ Z+, i1, i2 ∈ {0, 1} and tb ≥ 0, the probability R^{I,F}_{m1,m2}(tb; C) satisfies equations (3.15)–(3.23) shown below:

(d/dtb) R^{(1i1,i21),(11,11)}_{m1,m2}(tb; ([0],[0])) = −λ R^{(1i1,i21),(11,11)}_{m1,m2}(tb; ([0],[0]))
  + λ_{d1} R^{(1i1,i21),(11,11)}_{m1−1,m2}(tb; ([0],[0])) + λ_{d2} R^{(1i1,i21),(11,11)}_{m1,m2−1}(tb; ([0],[0]))
  + λ_{21} R^{(1i1,11),(11,11)}_{m1,m2}(tb; ([0],[0])) + λ_{12} R^{(11,i21),(11,11)}_{m1,m2}(tb; ([0],[0]))
  + λ_{f1} R^{(1i1,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1])) + λ_{f2} R^{(1i1,i21),(11,10)}_{m1,0}(tb; ([1 m2],[0])), (3.15)

(d/dtb) R^{(1i1,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1])) = −λ′ R^{(1i1,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1]))
  + λ_{d2} R^{(1i1,i21),(01,11)}_{0,m2−1}(tb; ([0],[1 m1])) + λ_{21} R^{(1i1,11),(01,11)}_{0,m2}(tb; ([0],[1 m1]))
  + λ_{12} R^{(11,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1])) + λ^F_{21} R^{(1i1,i21),(01,01)}_{0,m2}(tb; ([0],[1 m1]))
  + λ̃_{21} R^{(1i1,i21),(01,01)}_{0,m1+m2}(tb; ([0],[0])), (3.16)

(d/dtb) R^{(1i1,i21),(11,10)}_{m1,0}(tb; ([1 m2],[0])) = −λ′′ R^{(1i1,i21),(11,10)}_{m1,0}(tb; ([1 m2],[0]))
  + λ_{d1} R^{(1i1,i21),(11,10)}_{m1−1,0}(tb; ([1 m2],[0])) + λ_{21} R^{(1i1,11),(11,10)}_{m1,0}(tb; ([1 m2],[0]))
  + λ_{12} R^{(11,i21),(11,10)}_{m1,0}(tb; ([1 m2],[0])) + λ^F_{12} R^{(1i1,i21),(10,10)}_{m1,0}(tb; ([1 m2],[0]))
  + λ̃_{12} R^{(1i1,i21),(10,10)}_{m1+m2,0}(tb; ([0],[0])), (3.17)

R^{(10,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1])) = R^{(11,i21),(01,11)}_{0,m2}(tb; ([0],[1 m1])), (3.18)

R^{(1i1,01),(11,10)}_{m1,0}(tb; ([1 m2],[0])) = R^{(1i1,11),(11,10)}_{m1,0}(tb; ([1 m2],[0])), (3.19)

R^{(1i1,i21),(01,01)}_{0,m2}(tb; ([0],[1 m1])) = [λ_{d2}/(λ_{d2} + λ_{f2} + λ̃_{21})] R^{(1i1,i21),(01,01)}_{0,m2−1}(tb; ([0],[1 m1]))
  + [λ̃_{21}/(λ_{d2} + λ_{f2} + λ̃_{21})] R^{(1i1,i21),(01,01)}_{0,m1+m2}(tb; ([0],[0])), (3.20)

R^{(1i1,i21),(10,10)}_{m1,0}(tb; ([1 m2],[0])) = [λ_{d1}/(λ_{d1} + λ_{f1} + λ̃_{12})] R^{(1i1,i21),(10,10)}_{m1−1,0}(tb; ([1 m2],[0]))
  + [λ̃_{12}/(λ_{d1} + λ_{f1} + λ̃_{12})] R^{(1i1,i21),(10,10)}_{m1+m2,0}(tb; ([0],[0])), (3.21)

R^{(1i1,i21),(01,01)}_{0,m1+m2}(tb; ([0],[0])) = (λ_{d2}/(λ_{d2} + λ_{f2}))^{m1+m2}, (3.22)

and

R^{(1i1,i21),(10,10)}_{m1+m2,0}(tb; ([0],[0])) = (λ_{d1}/(λ_{d1} + λ_{f1}))^{m1+m2}, (3.23)

where λ = Σ_{k=1}^{2} (λ_{dk} + λ_{fk} + Σ_{j≠k} λ_{kj}), λ′ = λ_{d2} + λ_{21} + λ_{12} + λ^F_{21} + λ̃_{21} + λ_{f2}, λ′′ = λ_{d1} + λ_{21} + λ_{12} + λ^F_{12} + λ̃_{12} + λ_{f1}, and mk − 1 is set to 0 when mk = 0.
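Equations (3.22) and (3.23) have a simple interpretation: once one node is down and nothing remains in transit, the system succeeds if and only if the surviving node completes m1 + m2 consecutive service-versus-failure races, each won with probability λd/(λd + λf). This geometric form is easy to verify by Monte Carlo; the rates below are illustrative.

```python
import random

random.seed(3)

lam_d, lam_f = 1.0, 0.15   # service and failure rates (illustrative)
m = 7                      # tasks left at the surviving node

def survives():
    # by memorylessness, each service is a fresh race against the failure clock
    for _ in range(m):
        if random.expovariate(lam_f) < random.expovariate(lam_d):
            return False
    return True

trials = 100_000
est = sum(survives() for _ in range(trials)) / trials
exact = (lam_d / (lam_d + lam_f)) ** m
print(est, exact)
```

The estimate converges to the product form because the failure clock, conditioned on not having fired, restarts afresh at every service completion.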
Before proving Theorem 2, we first present and prove Lemmas 4–7. For this purpose, on {τ ≤ tb}, we define T′^{I,F}_{m1,...,mn}(tb; C) as the time taken by the new queuing system emerging at τ to serve all the tasks in the system if LB is performed by all functioning nodes at time tb, provided that the system condition at t = τ is specified by I, F, C while mk ≥ 0 tasks (k = 1, . . . , n) are in the queue of the kth node.
Lemma 4: For s ≤ tb, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = W11} = P{T^{(10,01),(11,11)}_{m1−1,m2}(tb − s; ([0],[0])) < ∞}.
Proof: We begin by noting that the regeneration event {τ = s, τ = W11} is precisely the service of the first task at the first node before any other activity takes place in the queuing system. Thus, no failure notice has been sent and no load redistribution (required only upon failure) has been made in the queuing system that emerges prior to time t = s. Therefore, according to Convention C1, we obtain P{X^{F′}_{kj} = ∞ | τ = s, τ = W11} = 1 and P{Z′_{ki} = ∞ | τ = s, τ = W11} = 1. Next, we observe that the system condition of the queues that emerge upon the occurrence of {τ = s, τ = W11} becomes I = (10, 01), F = (11, 11), C = ([0], [0]), with m1 − 1 tasks in the queue of the first node and m2 tasks in the queue of the second node. Therefore, by construction, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = W11} = P{τ + T′^{(10,01),(11,11)}_{m1−1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = W11}. The proof is complete once we establish that P{T′^{(10,01),(11,11)}_{m1−1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = W11} = P{T^{(10,01),(11,11)}_{m1−1,m2}(tb − s; ([0],[0])) < ∞}.
Next, we observe that, by definition, W′_{k1} = W_{k1} − τ, Y′_k = Y_k − τ and X′_{jk} = X_{jk} − τ for k, j ∈ {1, 2}, j ≠ k. Moreover, it is elementary to show (see Appendix B for the proof) that the conditional distribution of W′_{21} is

P{W′_{21} ≤ t | τ = s, τ = W11} = (1 − e^{−λ_{d2} t}) u(t), (3.24)

where u(·) is the unit step function. Similarly, it can be shown that P{Y′_k ≤ t | τ = s, τ = W11} = (1 − e^{−λ_{fk} t}) u(t), P{X′_{jk} ≤ t | τ = s, τ = W11} = (1 − e^{−λ_{jk} t}) u(t), and, for all j ≥ 2 and k ∈ {1, 2}, W′_{kj} and W_{kj} have identical distributions. Therefore, conditional upon the occurrence of {τ = s, τ = W11}, all random times of the newly emerging queuing system satisfy Assumption A1.
The conditional independence of W′_{21} and Y′_1 is proved in Appendix C. Similarly, it can also be shown that, conditional upon the occurrence of {τ = s, τ = W11}, W′_{kj}, Y′_k, and X′_{jk} are mutually independent. Therefore, upon the occurrence of {τ = s, τ = W11}, all random times of the emerging queuing system also satisfy Assumption A2.
In summary, we have shown that, conditional on the occurrence of {τ = s, τ = W11}, the random times characterizing the queuing system at time s satisfy Assumptions A1 and A2. Therefore, by shifting the time origin from t = 0 to t = s, we can think of the emergent queuing system as the original queuing system but with m1 − 1 tasks in the queue of the first server, while the other initial system conditions remain the same. In addition, due to the shift of origin, the LB instant is now at tb − s units of time from the new origin. Therefore, we conclude that P{T′^{(10,01),(11,11)}_{m1−1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = W11} = P{T^{(10,01),(11,11)}_{m1−1,m2}(tb − s; ([0],[0])) < ∞}, which completes the proof of Lemma 4. □
Lemma 5: For s ≤ tb, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = X12} = P{T^{(11,01),(11,11)}_{m1,m2}(tb − s; ([0],[0])) < ∞}.

Proof: Note that, upon the occurrence of {τ = s, τ = X12}, the system-information state of the emerging queues becomes I = (11, 01), while the other system conditions remain the same as those of the original queues. Therefore, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = X12} = P{τ + T′^{(11,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = X12}. Using an analysis similar to that of Lemma 4, we can show that, upon the occurrence of {τ = s, τ = X12}, the random times characterizing the emerging queuing system satisfy Assumptions A1 and A2. Therefore, P{T′^{(11,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = X12} = P{T^{(11,01),(11,11)}_{m1,m2}(tb − s; ([0],[0])) < ∞}. □
Lemma 6: For s ≤ tb, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = Y1} = P{T^{(10,01),(01,11)}_{0,m2}(tb − s; ([0],[1 m1])) < ∞}.

Proof: In this case, the regeneration event is the failure of the first node. Thus, according to the one-shot LB policy (refer to Section 3.1), the occurrence of {τ = s, τ = Y1} triggers the backup system of the first node to send a failure notice, as well as the m1 tasks from its queue, to the second node. Since the failure notice is in transit at time s, the system-function state of the emergent queuing system becomes F = (01, 11), while the network state at time s becomes C = ([0], [1 m1]) due to the group of m1 tasks in transit to the second node. Further, at t = s, the system-information state is I = (10, 01), while there are m2 tasks at the second node and no tasks at the first node. Therefore,

P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = Y1}
= P{τ + T′^{(10,01),(01,11)}_{0,m2}(tb; ([0],[1 m1])) < ∞ | τ = s, τ = Y1}. (3.25)

Next, we analyze the random times characterizing the queues that emerge upon the occurrence of {τ = s, τ = Y1}. In light of Assumption A1, X^{F′}_{21} and Z′_{21} follow exponential distributions with rates λ^F_{21} and λ̃_{21}, respectively. On the other hand, according to C1, P{W′_{1i} = ∞ | τ = s, τ = Y1} = 1 for the ith task of the first node. As no failure notice has been sent from the second node to the first node and no load redistribution has been made by the second node prior to time t = s, we can use Convention C1 to write P{X^{F′}_{12} = ∞ | τ = s, τ = Y1} = 1 and P{Z′_{1i} = ∞ | τ = s, τ = Y1} = 1 for the ith group of load transferred to the first node. Similarly to the proof of Lemma 4, the conditional distributions of W′_{2j}, Y′_2 and X′_{kj} for j ≠ k, conditional on the occurrence of {τ = s, τ = Y1}, can each be shown to satisfy Assumptions A1 and A2, thereby justifying the notion of regeneration of the queues at time τ. Therefore, when τ = s, we can shift the time origin from t = 0 to t = s and obtain P{T′^{(10,01),(01,11)}_{0,m2}(tb; ([0],[1 m1])) < ∞ | τ = s, τ = Y1} = P{T^{(10,01),(01,11)}_{0,m2}(tb − s; ([0],[1 m1])) < ∞}, which in conjunction with (3.25) completes the proof. □
Lemma 7: P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ > tb} = P{T^{(10,01),(11,11)}_{m1,m2}(0; ([0],[0])) < ∞}.

Proof: The occurrence of the event {τ > tb} implies that the system condition of the queues at time tb is exactly the same as the initial system condition of the original queues. On {τ > tb}, let T′′^{I,F}_{m1,...,mn}(tb; C) be the time taken by the new queuing system emerging at tb to serve all tasks if LB is performed by all functioning nodes at time tb, provided that the system condition at t = tb is specified by I, F, C while mk ≥ 0 tasks are in the queue of the kth node. Therefore, by definition, P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ > tb} = P{tb + T′′^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ > tb}. Let the random times characterizing the queuing system emerging at tb be W′′_{ki}, Y′′_k, X′′_{kj}, X^{F′′}_{kj} and Z′′_{ki}, all measured from tb. Clearly, no failure notice has been sent and no customer redistribution has been made in the queuing system that emerges prior to time t = tb. Therefore, based on Convention C1, P{X^{F′′}_{kj} = ∞ | τ > tb} = 1 and P{Z′′_{ki} = ∞ | τ > tb} = 1. For the ith task of the kth node, W′′_{ki} = W_{ki} − tb, and for k, j ∈ {1, 2}, j ≠ k, Y′′_k = Y_k − tb and X′′_{jk} = X_{jk} − tb. Based on Assumptions A1 and A2, it is straightforward to show that

P{W′′_{ki} ≤ t | τ > tb} = (1 − e^{−λ_{dk} t}) u(t),

P{W′′_{ki} ≤ t1, Y′′_k ≤ t2 | τ > tb} = P{W′′_{ki} ≤ t1 | τ > tb} P{Y′′_k ≤ t2 | τ > tb}.

Similarly, conditional on the occurrence of {τ > tb}, the distributions of W′′_{ki}, Y′′_k, X′′_{kj}, X^{F′′}_{kj} and Z′′_{ki} can be shown to satisfy A1 and A2. Consequently, neither the initial condition nor the statistics of the queues have changed while tb units of time have elapsed. Therefore, we can shift the origin by tb units of time, which places the LB instant at t = 0 for the new queuing system. Thus, P{T′′^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ > tb} = P{T^{(10,01),(11,11)}_{m1,m2}(0; ([0],[0])) < ∞}. □
Proof of Theorem 2:
We begin by proving Equation (3.15) for the case I = (10, 01). Observe that

P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞} = ∫_0^{tb} P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s} f_τ(s) ds
  + ∫_{tb}^∞ P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s} f_τ(s) ds. (3.26)
We can write the first integrand on the right-hand side of (3.26) as

P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s}
= Σ_{k=1}^{2} P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = Wk1} P{τ = Wk1 | τ = s}
+ Σ_{k=1}^{2} Σ_{j≠k} P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = Xkj} P{τ = Xkj | τ = s}
+ Σ_{k=1}^{2} P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s, τ = Yk} P{τ = Yk | τ = s}. (3.27)
Also, note that

∫_{tb}^∞ P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ = s} f_τ(s) ds
= P{T^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) < ∞ | τ > tb} P{τ > tb}. (3.28)
We now apply Lemmas 4–6 to (3.27), Lemma 7 to (3.28), and substitute in (3.26) to obtain

R^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) = ∫_0^{tb} [ R^{(10,01),(11,11)}_{m1−1,m2}(tb − s; ([0],[0])) P{τ = W11 | τ = s}
  + R^{(10,01),(11,11)}_{m1,m2−1}(tb − s; ([0],[0])) P{τ = W21 | τ = s}
  + R^{(10,11),(11,11)}_{m1,m2}(tb − s; ([0],[0])) P{τ = X21 | τ = s}
  + R^{(11,01),(11,11)}_{m1,m2}(tb − s; ([0],[0])) P{τ = X12 | τ = s}
  + R^{(10,01),(01,11)}_{0,m2}(tb − s; ([0],[1 m1])) P{τ = Y1 | τ = s}
  + R^{(10,01),(11,10)}_{m1,0}(tb − s; ([1 m2],[0])) P{τ = Y2 | τ = s} ] f_τ(s) ds
  + R^{(10,01),(11,11)}_{m1,m2}(0; ([0],[0])) P{τ > tb}. (3.29)
Next, observe that, in conjunction with A1, A2, and C1, it is straightforward to show that f_τ(t) = λ e^{−λt} u(t), where λ = Σ_{k=1}^{2} (λ_{dk} + λ_{fk} + Σ_{j≠k} λ_{kj}). Moreover, we can show (see Appendix D for the proof) that P{τ = Wk1 | τ = s} = λ_{dk}/λ, P{τ = Xkj | τ = s} = λ_{kj}/λ, and P{τ = Yk | τ = s} = λ_{fk}/λ. Therefore, Equation (3.29) becomes
R^{(10,01),(11,11)}_{m1,m2}(tb; ([0],[0])) = e^{−λtb} R^{(10,01),(11,11)}_{m1,m2}(0; ([0],[0]))
  + ∫_0^{tb} [ λ_{d1} R^{(10,01),(11,11)}_{m1−1,m2}(tb − s; ([0],[0])) + λ_{d2} R^{(10,01),(11,11)}_{m1,m2−1}(tb − s; ([0],[0]))
  + λ_{21} R^{(10,11),(11,11)}_{m1,m2}(tb − s; ([0],[0])) + λ_{12} R^{(11,01),(11,11)}_{m1,m2}(tb − s; ([0],[0]))
  + λ_{f1} R^{(10,01),(01,11)}_{0,m2}(tb − s; ([0],[1 m1])) + λ_{f2} R^{(10,01),(11,10)}_{m1,0}(tb − s; ([1 m2],[0])) ] e^{−λs} ds. (3.30)
Finally, by differentiating (3.30) with respect to tb and rearranging terms, we obtain (3.15).

Proofs of Equations (3.15)–(3.17) for any given I can be obtained in a similar fashion. Next, recall that, according to the one-shot LB policy, no LB action is taken by a faulty node, say the kth node, whereby the information-state vector ik becomes redundant and does not affect the success probability. This observation leads to the proofs of (3.18) and (3.19). Also, note that in the case of a two-node model, whenever a failure is detected, viz., F = (10, 10) or F = (01, 01), no LB action is taken by the only remaining functional node. Consequently, tb and I play no role in determining the success probability in these cases (but we do not drop tb and I from the notation, simply to maintain consistency). Therefore, we obtain Equations (3.20)–(3.23) by exploiting the following conditional probabilities, which can be proved similarly to Lemmas 4–7:

P{T^{I,(01,01)}_{0,m2}(tb; ([0],[1 m1])) < ∞ | τ = W21} = P{T^{I,(01,01)}_{0,m2−1}(tb; ([0],[1 m1])) < ∞},

P{T^{I,(01,01)}_{0,m2}(tb; ([0],[1 m1])) < ∞ | τ = Z21} = P{T^{I,(01,01)}_{0,m1+m2}(tb; ([0],[0])) < ∞}, and

P{T^{I,(01,01)}_{0,m2}(tb; ([0],[1 m1])) < ∞ | τ = Y2} = 0.

This completes the proof of Theorem 2. □
3.3.2 Calculation of the initial condition
In order to solve the renewal equations (3.15)–(3.17) of Theorem 2, we first need to calculate their initial conditions, corresponding to tb = 0. In this case, LB is performed at time 0 by each functional node, say the kth node, based on its information state ik, so that Ljk(0) ≥ 0 tasks are transferred to the jth functional node. Let T̃^F_{r1,r2}(C) be the time to serve all the tasks in the system given that the system-function state and the network state at t = 0 are F and C, respectively, while rk tasks are in the queue of the kth node at t = 0. For simplicity, let Ljk ≜ Ljk(0). Then, by construction, R^{I,(11,11)}_{m1,m2}(0; ([0],[0])) = P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞}, R^{I,(01,11)}_{0,m2}(0; ([0],[1 m1])) = P{T̃^{(01,11)}_{0,r2}(([1 L12],[1 m1])) < ∞}, and R^{I,(11,10)}_{m1,0}(0; ([1 m2],[0])) = P{T̃^{(11,10)}_{r1,0}(([1 m2],[1 L21])) < ∞}, where rk = mk − Ljk.
Theorem 3: For n = 2, r1, r2 ∈ Z+, gk ≥ 0 and cki > 0, the probability P{T̃^F_{r1,r2}(C) < ∞} satisfies the relations shown in (3.31)–(3.35):

P{T̃^{(11,11)}_{r1,r2}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞}
= P{T̃^{(11,11)}_{r1−1,r2}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ_{d1}/λ
+ P{T̃^{(11,11)}_{r1,r2−1}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ_{d2}/λ
+ P{T̃^{(01,11)}_{0,r2}(([g1 c11 . . . c1g1], [g2+1 c21 . . . c2g2 r1])) < ∞} λ_{f1}/λ
+ P{T̃^{(11,10)}_{r1,0}(([g1+1 c11 . . . c1g1 r2], [g2 c21 . . . c2g2])) < ∞} λ_{f2}/λ
+ Σ_{i=1}^{g1} P{T̃^{(11,11)}_{r1+c1i,r2}(([g1−1 c11 . . . c1(i−1) c1(i+1) . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ̃_{1i}/λ
+ Σ_{j=1}^{g2} P{T̃^{(11,11)}_{r1,r2+c2j}(([g1 c11 . . . c1g1], [g2−1 c21 . . . c2(j−1) c2(j+1) . . . c2g2])) < ∞} λ̃_{2j}/λ, (3.31)

P{T̃^{(01,11)}_{0,r2}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞}
= P{T̃^{(01,11)}_{0,r2−1}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ_{d2}/λ′
+ Σ_{i=1}^{g1} P{T̃^{(01,11)}_{0,r2}(([g1−1 c11 . . . c1(i−1) c1(i+1) . . . c1g1], [g2+1 c21 . . . c2g2 c1i])) < ∞} λ̃_{1i}/λ′
+ Σ_{j=1}^{g2} P{T̃^{(01,11)}_{0,r2+c2j}(([g1 c11 . . . c1g1], [g2−1 c21 . . . c2(j−1) c2(j+1) . . . c2g2])) < ∞} λ̃_{2j}/λ′, (3.32)

P{T̃^{(11,10)}_{r1,0}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞}
= P{T̃^{(11,10)}_{r1−1,0}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ_{d1}/λ′′
+ Σ_{i=1}^{g1} P{T̃^{(11,10)}_{r1+c1i,0}(([g1−1 c11 . . . c1(i−1) c1(i+1) . . . c1g1], [g2 c21 . . . c2g2])) < ∞} λ̃_{1i}/λ′′
+ Σ_{j=1}^{g2} P{T̃^{(11,10)}_{r1,0}(([g1+1 c11 . . . c1g1 c2j], [g2−1 c21 . . . c2(j−1) c2(j+1) . . . c2g2])) < ∞} λ̃_{2j}/λ′′, (3.33)

P{T̃^{(01,11)}_{0,r2}(([0],[0])) < ∞} = (λ_{d2}/(λ_{d2} + λ_{f2}))^{r2}, (3.34)

and

P{T̃^{(11,10)}_{r1,0}(([0],[0])) < ∞} = (λ_{d1}/(λ_{d1} + λ_{f1}))^{r1}, (3.35)

where λ̃^{−1}_{ki} = θ cki, λ = Σ_{k=1}^{2} (λ_{dk} + λ_{fk}) + Σ_{i=1}^{g1} λ̃_{1i} + Σ_{j=1}^{g2} λ̃_{2j}, λ′ = λ_{d2} + λ_{f2} + Σ_{i=1}^{g1} λ̃_{1i} + Σ_{j=1}^{g2} λ̃_{2j}, λ′′ = λ_{d1} + λ_{f1} + Σ_{i=1}^{g1} λ̃_{1i} + Σ_{j=1}^{g2} λ̃_{2j}, and rk − 1 is set to 0 when rk = 0.
Proof of Theorem 3:
Consider Equation (3.31) with g1 = 1, c11 = L12, g2 = 1 and c21 = L21. We define the regeneration random variable as ξ = min(W11, W21, Y1, Y2, Z11, Z21). Note that all the random delays characterizing the queuing system at time 0 are assumed to satisfy A1, A2 and C1. It is straightforward to show that ξ is an exponential r.v. with rate λ = Σ_{k=1}^{2} (λ_{dk} + λ_{fk} + λ̃_{k1}). Observe that

P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞}
= ∫_0^∞ [ Σ_{k=1}^{2} P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Wk1} P{ξ = Wk1 | ξ = s}
+ Σ_{k=1}^{2} P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Yk} P{ξ = Yk | ξ = s}
+ Σ_{k=1}^{2} P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Zk1} P{ξ = Zk1 | ξ = s} ] f_ξ(s) ds. (3.36)
We can show (similarly to the proofs of Lemmas 4–7) that

P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = W11} = P{T̃^{(11,11)}_{r1−1,r2}(([1 L12],[1 L21])) < ∞},
P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = W21} = P{T̃^{(11,11)}_{r1,r2−1}(([1 L12],[1 L21])) < ∞},
P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Y1} = P{T̃^{(01,11)}_{0,r2}(([1 L12],[2 L21 r1])) < ∞},
P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Y2} = P{T̃^{(11,10)}_{r1,0}(([2 L12 r2],[1 L21])) < ∞},
P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Z11} = P{T̃^{(11,11)}_{r1+L12,r2}(([0],[1 L21])) < ∞},
P{T̃^{(11,11)}_{r1,r2}(([1 L12],[1 L21])) < ∞ | ξ = s, ξ = Z21} = P{T̃^{(11,11)}_{r1,r2+L21}(([1 L12],[0])) < ∞}.

Applying the above identities in (3.36), and using P{ξ = Wk1 | ξ = s} = λ_{dk}/λ, P{ξ = Yk | ξ = s} = λ_{fk}/λ and P{ξ = Zk1 | ξ = s} = λ̃_{k1}/λ, we obtain (3.31), since ∫_0^∞ f_ξ(s) ds = 1.
Similarly, we can prove (3.32)–(3.35). Note that the one-shot LB policy requires that, whenever a group of tasks arrives at a failed node, the backup system of the failed node immediately sends the group of tasks back to the functional node. To this end, the following conditional probabilities hold and are required to prove (3.32) and (3.33):

P{T̃^{(11,10)}_{r1,0}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞ | ξ = s, ξ = Z2j}
= P{T̃^{(11,10)}_{r1,0}(([g1+1 c11 . . . c1g1 c2j], [g2−1 c21 . . . c2(j−1) c2(j+1) . . . c2g2])) < ∞},

P{T̃^{(01,11)}_{0,r2}(([g1 c11 . . . c1g1], [g2 c21 . . . c2g2])) < ∞ | ξ = s, ξ = Z1i}
= P{T̃^{(01,11)}_{0,r2}(([g1−1 c11 . . . c1(i−1) c1(i+1) . . . c1g1], [g2+1 c21 . . . c2g2 c1i])) < ∞},

for j ∈ {1, . . . , g2} and i ∈ {1, . . . , g1}. □
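Because every transition in (3.31)–(3.33) either serves a task, merges an in-transit group into a queue, bounces a group toward the functional node, or moves a failed node's queue into transit (and each node can fail at most once), the recursion reaches the base cases (3.34)–(3.35) in finitely many steps. It can therefore be evaluated directly by memoized recursion. The sketch below implements the two-node recursion; the service rates, failure rates, and delay parameter θ are illustrative, and zero-probability service events at an empty queue are simply dropped from the race, which is algebraically equivalent to the rk − 1 → 0 convention.

```python
from functools import lru_cache

lam_d = (1.0, 0.8)     # service rates (illustrative)
lam_f = (0.05, 0.05)   # permanent-failure rates (illustrative)
theta = 0.1            # a transit group of size c has arrival rate 1/(theta*c)

def gtilde(c):
    return 1.0 / (theta * c)

@lru_cache(maxsize=None)
def success(f1, f2, r1, r2, t1, t2):
    # f1, f2: node up-flags; t1, t2: tuples of group sizes in transit to node 1/2
    if r1 == 0 and r2 == 0 and not t1 and not t2:
        return 1.0                    # nothing left to serve
    if not f1 and not f2:
        return 0.0                    # both nodes down with work remaining
    events = []                       # list of (rate, successor-probability thunk)
    if f1 and r1 > 0:                 # service completion at node 1
        events.append((lam_d[0], lambda: success(f1, f2, r1 - 1, r2, t1, t2)))
    if f2 and r2 > 0:
        events.append((lam_d[1], lambda: success(f1, f2, r1, r2 - 1, t1, t2)))
    if f1:                            # failure of node 1: its queue goes into transit
        nt2 = t2 + ((r1,) if r1 else ())
        events.append((lam_f[0], lambda nt2=nt2: success(0, f2, 0, r2, t1, nt2)))
    if f2:
        nt1 = t1 + ((r2,) if r2 else ())
        events.append((lam_f[1], lambda nt1=nt1: success(f1, 0, r1, 0, nt1, t2)))
    for i, c in enumerate(t1):        # a group reaches node 1, or bounces if node 1 is down
        rest = t1[:i] + t1[i + 1:]
        if f1:
            events.append((gtilde(c), lambda c=c, rest=rest: success(f1, f2, r1 + c, r2, rest, t2)))
        else:
            events.append((gtilde(c), lambda c=c, rest=rest: success(f1, f2, r1, r2, rest, t2 + (c,))))
    for j, c in enumerate(t2):
        rest = t2[:j] + t2[j + 1:]
        if f2:
            events.append((gtilde(c), lambda c=c, rest=rest: success(f1, f2, r1, r2 + c, t1, rest)))
        else:
            events.append((gtilde(c), lambda c=c, rest=rest: success(f1, f2, r1, r2, t1 + (c,), rest)))
    total = sum(r for r, _ in events)
    return sum(r / total * p() for r, p in events)

# P{T-tilde < inf} for r1 = 5, r2 = 3 with batches L12 = 2, L21 = 4 in transit
print(success(1, 1, 5, 3, (2,), (4,)))
```

With no transit groups and one node down, the recursion collapses to the geometric form (3.34)–(3.35), which provides a convenient correctness check.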
3.4 System with recoverable node-failures
Distributed systems (like “SETI at Home” [13]) utilize dynamic sets of remote nodes,
where nodes may join and leave the system in a random fashion. In fact, some of
these remote nodes are available only when they are not being used by their owners.
In addition, any node may randomly get disconnected from the Internet. In this
section, any unavailable node is considered to have failed, while a participating node
is considered to be functional. In particular, we consider that any node in the
system randomly fluctuates between a “failed” (or “down”) and “functional” (or
“up”) states.
We present two different LB policies: a proactive LB policy called LBP-1 and a reactive LB policy called LBP-2 [21]. Given an initial load distribution, the policy LBP-1 takes a proactive LB action at time t = 0 by predicting the failure times, recovery times and the random load-transfer times. In the policy LBP-2, the LB action is performed at time t = 0 by considering the random load-transfer times while disregarding the failure and recovery times of the nodes. Nonetheless, at every occurrence of a node failure, a reactive scheduling action is taken to compensate for the failure.
3.4.1 Analysis of proactive LB policy: LBP-1
Consider a two-node distributed system. The policy LBP-1 allows a one-time, one-way load transfer between the nodes by predicting the failure times, recovery times and the random load-transfer times. No other balancing action is taken afterwards. At time t = 0, both nodes are assumed to be functional, and only one of the nodes, say the kth node, transfers Ljk tasks to the jth node, where

Ljk = ⌊K mk⌋, (3.37)

and K ∈ [0, 1] is the LB gain. No other load transfer occurs afterwards, and each node processes its remaining tasks as well as the tasks transferred to it. The optimal policy is to choose the sender node k and the receiver node j, and to calculate the LB gain K, so as to minimize the AOCT. For the rest of this section, without loss of generality, we will suppose that node 1 is the sender node.
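For intuition, the effect of the gain K in (3.37) can be explored by brute force: for each K on a grid, transfer L21 = ⌊K m1⌋ tasks at t = 0 with a random transfer delay and estimate the AOCT by simulation. The sketch below ignores failures and recoveries, and all rates are illustrative, so it is only a rough stand-in for the optimization described above.

```python
import random
from math import floor

random.seed(4)

lam_d1, lam_d2 = 1.0, 1.0   # service rates (illustrative)
lam_t = 0.5                 # transfer-delay rate for the migrated batch
m1, m2 = 24, 4              # node 1 is the sender (m1 > m2)

def aoct(L21, trials=4000):
    total = 0.0
    for _ in range(trials):
        finish1 = sum(random.expovariate(lam_d1) for _ in range(m1 - L21))
        finish2 = sum(random.expovariate(lam_d2) for _ in range(m2))
        if L21 > 0:
            # batch leaves at t = 0 and arrives after an exp(lam_t) delay
            start = max(finish2, random.expovariate(lam_t))
            finish2 = start + sum(random.expovariate(lam_d2) for _ in range(L21))
        total += max(finish1, finish2)
    return total / trials

grid = [k / 10 for k in range(11)]
results = {K: aoct(floor(K * m1)) for K in grid}
best_K = min(results, key=results.get)
print(best_K, results[best_K])
```

On such a lopsided initial load, any reasonable gain beats K = 0; the optimum sits below full transfer because the batch pays a random transfer delay before it can be served.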
Simplification of system-information and system-function states
We will assume that the delays in transferring LIs and failure notices between the
nodes are negligible as compared to the total computing time of the tasks as well
as to the delays in transferring the actual load between the nodes. Thus, nodes can
periodically communicate without incurring significant overheads, whereby each node
gets continually (instantly) informed of the initial load-states as well as the functional
states of other nodes. In other words, at any given time, we assume that every node
in the system has the current load information as well as the current function state
(either functional or faulty) of all other nodes. More precisely, i1 = i2 = . . . =
in = [1, . . . , 1], while f1 = f2 = . . . = fn. Therefore, under this assumption, we can
completely omit the system-information state from our analysis and also represent the
system-function state by an n-bit vector f that is common to all nodes. For example,
in a two-node system, the possible system-function states are: [0, 0], [0, 1], [1, 0] or
[1, 1].
In addition, it should be noted that there is no need to delay the LB instant
beyond time t = 0, as every node already has up-to-date system information at
time t = 0 (due to negligible communication delays). Therefore, we will assume that
all the nodes take a synchronized LB action at tb = 0. These assumptions significantly
simplify our analysis as we will now obtain simple difference equations (instead of
difference-differential equations as in the previous two sections) that characterize the
AOCT.
Expected completion time and cumulative distribution function
Let T fr1,r2
(C) denote the time taken by the system to serve all the tasks in the system
if LB is performed by all functional nodes at time t = 0, and the initial system
condition at t = 0 is as specified by f ,C while rk tasks (k = 1, 2) are in the queue
of the kth node. According to LBP-1, LB is performed at t = 0 based on (3.37).
Therefore, at t = 0, r1 = m1 − L21, r2 = m2, C = ([0], [1 L21]) and f = [1, 1]. Our
objective is to calculate: E[T 1,1m1−L21,m2
([0], [1 L21]). Let µfr1,r2
(C)4= E[T f
r1,r2(C)] and
and pfr1,r2
(t;C)4= P{T f
r1,r2(C) ≤ t}. Clearly, µf
0,0([0], [0]) = 0 and pf0,0(t; [0], [0]) = 1
(for all t ≥ 0) since there is no task at any of the nodes and there is no task in the
network.
Theorem 4: For $n = 2$, $r_1, r_2 \in \mathbb{Z}_+$, $L > 0$ and $t \in [0,\infty)$, the AOCT $\mu^{\mathbf{f}}_{r_1,r_2}(\mathbf{C})$
and the cumulative distribution function $p^{\mathbf{f}}_{r_1,r_2}(t;\mathbf{C})$ satisfy the four matrix relations
shown below:

$$
\begin{bmatrix}
\mu^{0,0}_{r_1,r_2}([0],[1\,L])\\
\mu^{0,1}_{r_1,r_2}([0],[1\,L])\\
\mu^{1,0}_{r_1,r_2}([0],[1\,L])\\
\mu^{1,1}_{r_1,r_2}([0],[1\,L])
\end{bmatrix}
=
\begin{bmatrix}
1 & -\lambda_{s_2}/\lambda_A & -\lambda_{s_1}/\lambda_A & 0\\
-\lambda_{f_2}/\lambda_B & 1 & 0 & -\lambda_{s_1}/\lambda_B\\
-\lambda_{f_1}/\lambda_C & 0 & 1 & -\lambda_{s_2}/\lambda_C\\
0 & -\lambda_{f_1}/\lambda_D & -\lambda_{f_2}/\lambda_D & 1
\end{bmatrix}^{-1}
\times
\begin{bmatrix}
\frac{1}{\lambda_A}+\frac{\tilde\lambda_{21}}{\lambda_A}\mu^{0,0}_{r_1,r_2+L}([0],[0])\\
\frac{1}{\lambda_B}+\frac{\lambda_{d_2}}{\lambda_B}\mu^{0,1}_{r_1,r_2-1}([0],[1\,L])+\frac{\tilde\lambda_{21}}{\lambda_B}\mu^{0,1}_{r_1,r_2+L}([0],[0])\\
\frac{1}{\lambda_C}+\frac{\lambda_{d_1}}{\lambda_C}\mu^{1,0}_{r_1-1,r_2}([0],[1\,L])+\frac{\tilde\lambda_{21}}{\lambda_C}\mu^{1,0}_{r_1,r_2+L}([0],[0])\\
\frac{1}{\lambda_D}+\frac{\lambda_{d_1}}{\lambda_D}\mu^{1,1}_{r_1-1,r_2}([0],[1\,L])+\frac{\lambda_{d_2}}{\lambda_D}\mu^{1,1}_{r_1,r_2-1}([0],[1\,L])+\frac{\tilde\lambda_{21}}{\lambda_D}\mu^{1,1}_{r_1,r_2+L}([0],[0])
\end{bmatrix},
$$

$$
\begin{bmatrix}
\mu^{0,0}_{r_1,r_2}([0],[0])\\
\mu^{0,1}_{r_1,r_2}([0],[0])\\
\mu^{1,0}_{r_1,r_2}([0],[0])\\
\mu^{1,1}_{r_1,r_2}([0],[0])
\end{bmatrix}
=
\begin{bmatrix}
1 & -\lambda_{s_2}/\lambda'_A & -\lambda_{s_1}/\lambda'_A & 0\\
-\lambda_{f_2}/\lambda'_B & 1 & 0 & -\lambda_{s_1}/\lambda'_B\\
-\lambda_{f_1}/\lambda'_C & 0 & 1 & -\lambda_{s_2}/\lambda'_C\\
0 & -\lambda_{f_1}/\lambda'_D & -\lambda_{f_2}/\lambda'_D & 1
\end{bmatrix}^{-1}
\times
\begin{bmatrix}
\frac{1}{\lambda'_A}\\
\frac{1}{\lambda'_B}+\frac{\lambda_{d_2}}{\lambda'_B}\mu^{0,1}_{r_1,r_2-1}([0],[0])\\
\frac{1}{\lambda'_C}+\frac{\lambda_{d_1}}{\lambda'_C}\mu^{1,0}_{r_1-1,r_2}([0],[0])\\
\frac{1}{\lambda'_D}+\frac{\lambda_{d_1}}{\lambda'_D}\mu^{1,1}_{r_1-1,r_2}([0],[0])+\frac{\lambda_{d_2}}{\lambda'_D}\mu^{1,1}_{r_1,r_2-1}([0],[0])
\end{bmatrix},
$$

$$
\frac{d}{dt}
\begin{bmatrix}
p^{1,1}_{r_1,r_2}(t;[0],[1\,L])\\
p^{0,1}_{r_1,r_2}(t;[0],[1\,L])\\
p^{1,0}_{r_1,r_2}(t;[0],[1\,L])\\
p^{0,0}_{r_1,r_2}(t;[0],[1\,L])
\end{bmatrix}
=
\begin{bmatrix}
-\lambda_D & \lambda_{f_1} & \lambda_{f_2} & 0\\
\lambda_{s_1} & -\lambda_B & 0 & \lambda_{f_2}\\
\lambda_{s_2} & 0 & -\lambda_C & \lambda_{f_1}\\
0 & \lambda_{s_2} & \lambda_{s_1} & -\lambda_A
\end{bmatrix}
\begin{bmatrix}
p^{1,1}_{r_1,r_2}(t;[0],[1\,L])\\
p^{0,1}_{r_1,r_2}(t;[0],[1\,L])\\
p^{1,0}_{r_1,r_2}(t;[0],[1\,L])\\
p^{0,0}_{r_1,r_2}(t;[0],[1\,L])
\end{bmatrix}
+
\begin{bmatrix}
\lambda_{d_1}p^{1,1}_{r_1-1,r_2}(t;[0],[1\,L])+\lambda_{d_2}p^{1,1}_{r_1,r_2-1}(t;[0],[1\,L])+\tilde\lambda_{21}p^{1,1}_{r_1,r_2+L}(t;[0],[0])\\
\lambda_{d_2}p^{0,1}_{r_1,r_2-1}(t;[0],[1\,L])+\tilde\lambda_{21}p^{0,1}_{r_1,r_2+L}(t;[0],[0])\\
\lambda_{d_1}p^{1,0}_{r_1-1,r_2}(t;[0],[1\,L])+\tilde\lambda_{21}p^{1,0}_{r_1,r_2+L}(t;[0],[0])\\
\tilde\lambda_{21}p^{0,0}_{r_1,r_2+L}(t;[0],[0])
\end{bmatrix},
$$

$$
\frac{d}{dt}
\begin{bmatrix}
p^{1,1}_{r_1,r_2}(t;[0],[0])\\
p^{0,1}_{r_1,r_2}(t;[0],[0])\\
p^{1,0}_{r_1,r_2}(t;[0],[0])\\
p^{0,0}_{r_1,r_2}(t;[0],[0])
\end{bmatrix}
=
\begin{bmatrix}
-\lambda'_D & \lambda_{f_1} & \lambda_{f_2} & 0\\
\lambda_{s_1} & -\lambda'_B & 0 & \lambda_{f_2}\\
\lambda_{s_2} & 0 & -\lambda'_C & \lambda_{f_1}\\
0 & \lambda_{s_2} & \lambda_{s_1} & -\lambda'_A
\end{bmatrix}
\begin{bmatrix}
p^{1,1}_{r_1,r_2}(t;[0],[0])\\
p^{0,1}_{r_1,r_2}(t;[0],[0])\\
p^{1,0}_{r_1,r_2}(t;[0],[0])\\
p^{0,0}_{r_1,r_2}(t;[0],[0])
\end{bmatrix}
+
\begin{bmatrix}
\lambda_{d_1}p^{1,1}_{r_1-1,r_2}(t;[0],[0])+\lambda_{d_2}p^{1,1}_{r_1,r_2-1}(t;[0],[0])\\
\lambda_{d_2}p^{0,1}_{r_1,r_2-1}(t;[0],[0])\\
\lambda_{d_1}p^{1,0}_{r_1-1,r_2}(t;[0],[0])\\
0
\end{bmatrix},
$$

where $\lambda_A = \lambda_{s_1}+\lambda_{s_2}+\tilde\lambda_{21}$, $\lambda_B = \lambda_{d_2}+\lambda_{s_1}+\lambda_{f_2}+\tilde\lambda_{21}$, $\lambda_C = \lambda_{d_1}+\lambda_{f_1}+\lambda_{s_2}+\tilde\lambda_{21}$,
$\lambda_D = \lambda_{d_1}+\lambda_{d_2}+\lambda_{f_1}+\lambda_{f_2}+\tilde\lambda_{21}$, $\lambda'_A = \lambda_{s_1}+\lambda_{s_2}$, $\lambda'_B = \lambda_{d_2}+\lambda_{s_1}+\lambda_{f_2}$, $\lambda'_C = \lambda_{d_1}+\lambda_{f_1}+\lambda_{s_2}$,
$\lambda'_D = \lambda_{d_1}+\lambda_{d_2}+\lambda_{f_1}+\lambda_{f_2}$ and $\tilde\lambda_{21} = \theta L$, while as per Convention C1, $\lambda_{d_k}$ as well as
$r_k - 1$ are both set to 0 whenever $r_k = 0$.
Proof of Theorem 4:
Consider the case when the system initial condition is given by f = [0, 0] and
C = ([0], [1 L]). Based on Convention C1, the regeneration random variable can
be written as: τ = min(S1, S2, Z21), where Sk is the recovery time of the kth node
and Z21 is the random transfer delay of L tasks on their way to the second node.
Now, we can invoke Assumptions 1 and 2 (listed in Section 3.1) and formulate the
regeneration theory as mentioned in Lemmas 1–7 in order to obtain:
$$
\begin{aligned}
E[T^{0,0}_{r_1,r_2}([0],[1\,L]) \mid \tau = s,\ \tau = S_1] &= s + E[T^{1,0}_{r_1,r_2}([0],[1\,L])],\\
E[T^{0,0}_{r_1,r_2}([0],[1\,L]) \mid \tau = s,\ \tau = S_2] &= s + E[T^{0,1}_{r_1,r_2}([0],[1\,L])],\ \text{and}\\
E[T^{0,0}_{r_1,r_2}([0],[1\,L]) \mid \tau = s,\ \tau = Z_{21}] &= s + E[T^{0,0}_{r_1,r_2+L}([0],[0])],
\end{aligned}
\qquad (3.38)
$$
where s ∈ [0,∞). Next, we use the iterated conditional expectations to write:
$$
E[T^{0,0}_{r_1,r_2}([0],[1\,L])] = \int_0^\infty E[T^{0,0}_{r_1,r_2}([0],[1\,L]) \mid \tau = s]\,f_\tau(s)\,ds, \qquad (3.39)
$$
where $f_\tau(s) = (\lambda_{s_1}+\lambda_{s_2}+\tilde\lambda_{21})\,e^{-(\lambda_{s_1}+\lambda_{s_2}+\tilde\lambda_{21})s}u(s)$ is the pdf of $\tau$. Observe that
$\{\tau = s\} = \bigcup_{k=1}^{2}\{\tau = s,\ \tau = S_k\} \cup \{\tau = s,\ \tau = Z_{21}\}$. Using this in (3.39) and
applying the results from (3.38) and Appendix D, we get:
$$
E[T^{0,0}_{r_1,r_2}([0],[1\,L])] = \frac{1}{\lambda_A}
+ \frac{\lambda_{s_1}}{\lambda_A}E[T^{1,0}_{r_1,r_2}([0],[1\,L])]
+ \frac{\lambda_{s_2}}{\lambda_A}E[T^{0,1}_{r_1,r_2}([0],[1\,L])]
+ \frac{\tilde\lambda_{21}}{\lambda_A}E[T^{0,0}_{r_1,r_2+L}([0],[0])],
$$
where $\lambda_A = \lambda_{s_1}+\lambda_{s_2}+\tilde\lambda_{21}$.
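The coefficients in this equation are the competing-exponential probabilities $P\{\tau = S_1\} = \lambda_{s_1}/\lambda_A$, $P\{\tau = S_2\} = \lambda_{s_2}/\lambda_A$ and $P\{\tau = Z_{21}\} = \tilde\lambda_{21}/\lambda_A$, while the constant term is $E[\tau] = 1/\lambda_A$. A quick Monte Carlo check of both facts (the rates below are arbitrary):

```python
import random

random.seed(7)
ls1, ls2, lt = 0.1, 0.05, 0.4          # rates of S1, S2, Z21 (arbitrary)
lam_A = ls1 + ls2 + lt

n, wins, total = 200_000, 0, 0.0
for _ in range(n):
    s1 = random.expovariate(ls1)        # recovery time of node 1
    s2 = random.expovariate(ls2)        # recovery time of node 2
    z21 = random.expovariate(lt)        # transfer delay of the in-transit tasks
    tau = min(s1, s2, z21)              # regeneration time
    total += tau
    wins += (tau == s1)                 # event {tau = S1}

p_hat, mean_hat = wins / n, total / n   # estimate P{tau = S1} and E[tau]
```

Both estimates agree with $\lambda_{s_1}/\lambda_A$ and $1/\lambda_A$ up to Monte Carlo error.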
Similarly, we can show the following results:
$$
E[T^{0,1}_{r_1,r_2}([0],[1\,L])] = \frac{1}{\lambda_B}
+ \frac{\lambda_{s_1}}{\lambda_B}E[T^{1,1}_{r_1,r_2}([0],[1\,L])]
+ \frac{\lambda_{d_2}}{\lambda_B}E[T^{0,1}_{r_1,r_2-1}([0],[1\,L])]
+ \frac{\lambda_{f_2}}{\lambda_B}E[T^{0,0}_{r_1,r_2}([0],[1\,L])]
+ \frac{\tilde\lambda_{21}}{\lambda_B}E[T^{0,1}_{r_1,r_2+L}([0],[0])],
$$
where $\lambda_B = \lambda_{d_2}+\lambda_{s_1}+\lambda_{f_2}+\tilde\lambda_{21}$,

$$
E[T^{1,0}_{r_1,r_2}([0],[1\,L])] = \frac{1}{\lambda_C}
+ \frac{\lambda_{d_1}}{\lambda_C}E[T^{1,0}_{r_1-1,r_2}([0],[1\,L])]
+ \frac{\lambda_{f_1}}{\lambda_C}E[T^{0,0}_{r_1,r_2}([0],[1\,L])]
+ \frac{\lambda_{s_2}}{\lambda_C}E[T^{1,1}_{r_1,r_2}([0],[1\,L])]
+ \frac{\tilde\lambda_{21}}{\lambda_C}E[T^{1,0}_{r_1,r_2+L}([0],[0])],
$$
where $\lambda_C = \lambda_{d_1}+\lambda_{f_1}+\lambda_{s_2}+\tilde\lambda_{21}$, and

$$
E[T^{1,1}_{r_1,r_2}([0],[1\,L])] = \frac{1}{\lambda_D}
+ \frac{\lambda_{d_1}}{\lambda_D}E[T^{1,1}_{r_1-1,r_2}([0],[1\,L])]
+ \frac{\lambda_{d_2}}{\lambda_D}E[T^{1,1}_{r_1,r_2-1}([0],[1\,L])]
+ \frac{\lambda_{f_1}}{\lambda_D}E[T^{0,1}_{r_1,r_2}([0],[1\,L])]
+ \frac{\lambda_{f_2}}{\lambda_D}E[T^{1,0}_{r_1,r_2}([0],[1\,L])]
+ \frac{\tilde\lambda_{21}}{\lambda_D}E[T^{1,1}_{r_1,r_2+L}([0],[0])],
$$
where $\lambda_D = \lambda_{d_1}+\lambda_{d_2}+\lambda_{f_1}+\lambda_{f_2}+\tilde\lambda_{21}$.
Rearranging the above four equations, one for each $E[T^{\mathbf{f}}_{r_1,r_2}([0],[1\,L])]$, we obtain the
first matrix relation given in Theorem 4. For the case when $\mathbf{C} = ([0],[0])$, we can
repeat the above analysis while setting $Z_{21} = \infty$ a.s., in accordance with Convention
C1. Also, note that the CDF $p^{\mathbf{f}}_{r_1,r_2}(t;\mathbf{C})$ can be written as $p^{\mathbf{f}}_{r_1,r_2}(t;\mathbf{C}) = E\bigl[\mathbf{1}_{\{T^{\mathbf{f}}_{r_1,r_2}(\mathbf{C}) \le t\}}\bigr]$,
where $\mathbf{1}_A$ is the indicator function of the event $A$. This enables us to use the
smoothing property of expectations and exploit the regeneration theory to get the
desired matrix relations. This completes the proof of Theorem 4. $\square$
As a final observation we point out that swapping the roles of the sender and
receiver nodes does not change the nature of our analysis. Therefore, we relax the
assumption that the first node is the sender node. Thus, we can theoretically cal-
culate the LB gain and the sender/receiver pair (i.e., which node should be sending
tasks to the other) that will minimize the AOCT. This allows the optimal implemen-
tation of LBP-1.
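Theorem 4 turns the AOCT computation into two dynamic-programming sweeps: fill the $\mathbf{C} = ([0],[0])$ table first (its right-hand side only involves smaller queue lengths), then the $\mathbf{C} = ([0],[1\,L])$ table, solving one $4 \times 4$ linear system per cell. A sketch of this sweep follows; the rates and loads are illustrative only, and $\tilde\lambda_{21} = \theta L$ follows the theorem's convention:

```python
import numpy as np

def mu_tables(m1, m2, L, ld, lf, ls, theta):
    """Solve the Theorem-4 recursions for a two-node system.

    Function states are ordered (f1, f2) = (0,0), (0,1), (1,0), (1,1).
    Returns mu0[r1, r2] (C = ([0],[0]), nothing in transit) and
    muL[r1, r2] (C = ([0],[1 L]), L tasks in transit to node 2),
    each a 4-vector of AOCTs indexed by the function state.
    """
    lt = theta * L                          # lambda~_21 = theta * L
    ld1, ld2 = ld
    lf1, lf2 = lf
    ls1, ls2 = ls

    def cell(d1, d2, t, extra):
        # d1, d2 are effective service rates (Convention C1: 0 if queue empty).
        lA, lB = ls1 + ls2 + t, d2 + ls1 + lf2 + t
        lC, lD = d1 + lf1 + ls2 + t, d1 + d2 + lf1 + lf2 + t
        A = np.array([[1.0, -ls2 / lA, -ls1 / lA, 0.0],
                      [-lf2 / lB, 1.0, 0.0, -ls1 / lB],
                      [-lf1 / lC, 0.0, 1.0, -ls2 / lC],
                      [0.0, -lf1 / lD, -lf2 / lD, 1.0]])
        b = (1.0 + extra) / np.array([lA, lB, lC, lD])
        return np.linalg.solve(A, b)

    # C = ([0],[0]): base case mu = 0 at (0,0), sweep in increasing (r1, r2).
    mu0 = np.zeros((m1 + 1, m2 + L + 1, 4))
    for r1 in range(m1 + 1):
        for r2 in range(m2 + L + 1):
            if r1 == 0 and r2 == 0:
                continue
            d1 = ld1 if r1 > 0 else 0.0     # a wrapped index below is always
            d2 = ld2 if r2 > 0 else 0.0     # multiplied by a zero rate
            extra = np.array([0.0,
                              d2 * mu0[r1, r2 - 1, 1],
                              d1 * mu0[r1 - 1, r2, 2],
                              d1 * mu0[r1 - 1, r2, 3] + d2 * mu0[r1, r2 - 1, 3]])
            mu0[r1, r2] = cell(d1, d2, 0.0, extra)

    # C = ([0],[1 L]): couples to mu0 at (r1, r2 + L) through the transfer event.
    muL = np.zeros((m1 + 1, m2 + 1, 4))
    for r1 in range(m1 + 1):
        for r2 in range(m2 + 1):
            d1 = ld1 if r1 > 0 else 0.0
            d2 = ld2 if r2 > 0 else 0.0
            extra = np.array([lt * mu0[r1, r2 + L, 0],
                              d2 * muL[r1, r2 - 1, 1] + lt * mu0[r1, r2 + L, 1],
                              d1 * muL[r1 - 1, r2, 2] + lt * mu0[r1, r2 + L, 2],
                              d1 * muL[r1 - 1, r2, 3] + d2 * muL[r1, r2 - 1, 3]
                              + lt * mu0[r1, r2 + L, 3]])
            muL[r1, r2] = cell(d1, d2, lt, extra)
    return mu0, muL

# AOCT of LBP-1 for gain K: node 1 sends L = floor(K * m1) tasks at t = 0.
m1, m2, K = 10, 6, 0.4
L = int(K * m1)
mu0, muL = mu_tables(m1, m2, L, ld=(1.07, 1.85), lf=(0.05, 0.05),
                     ls=(0.10, 0.05), theta=0.2)
aoct = muL[m1 - L, m2, 3]                   # mu^{1,1}_{m1-L, m2}([0],[1 L])
```

Scanning K (and the sender/receiver assignment) over such evaluations is what yields the optimal implementation of LBP-1.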
3.4.2 Analysis of reactive LB policy: LBP-2
In the policy LBP-2, all nodes execute LB together at time t = 0 without con-
sideration of the future possibilities for failures and subsequent recoveries of nodes.
Further, each node is assumed to be equipped with a backup system that can send
or receive tasks. Every time a node fails, the backup system of the failed node gets
activated and transfers loads to all other nodes (functional or failed). Each such
transfer is performed in order to compensate the loss in overall computing time of
the system due to the failure of a node. As in LBP-1, we will assume that the com-
munication delays are negligible, while we take account of the random load transfer
delays.
The initial LB action is taken at t = 0 to achieve an “approximately” uniform
division of the total system-load among all the nodes assuming that all nodes will
remain functional. To this end, we utilize the one-shot LB policy described in Sec-
tion (3.2.1). In particular, based on the hypothesis that nodes will never fail, we
exploit the theory given in Section (3.2.2) to calculate the optimal LB gains that
minimize the AOCT.
Now suppose that the kth node fails at time $t > 0$. The average time that the
kth node will remain in the failed mode is $E[S_k] = \lambda_{s_k}^{-1}$. On the contrary, had the kth
node been functional during these $\lambda_{s_k}^{-1}$ units of time, it would have serviced $\lambda_{d_k}\lambda_{s_k}^{-1}$
tasks (the average recovery time multiplied by the service rate
of the kth node). In other words, the failure of the kth node should result in an
accumulation, on average, of $\lambda_{d_k}/\lambda_{s_k}$ unattended tasks during its recovery period.
Therefore, the system that had previously been balanced at $t = 0$ suddenly becomes
unbalanced again. Consequently, the kth node has to be allowed another balancing
opportunity, where it can transfer $\bigl(\lambda_{d_i}/\sum_{j=1}^{n}\lambda_{d_j}\bigr)\bigl(\lambda_{d_k}/\lambda_{s_k}\bigr)$ tasks to every ith
node, $i \neq k$, in the system. However, the steady-state probability that any ith node
is functional is $\lambda_{s_i}/(\lambda_{f_i}+\lambda_{s_i})$. Thus, at every failure instant of the kth node, the
reactive policy LBP-2 considers the steady-state functional probabilities of all nodes
$i \neq k$, and transfers $L^{\mathrm{FAIL}}_{ik}$ tasks from the kth node to every ith
node in the system, where

$$
L^{\mathrm{FAIL}}_{ik} = \Bigl\lfloor \Bigl(\frac{\lambda_{s_i}}{\lambda_{f_i}+\lambda_{s_i}}\Bigr)\Bigl(\frac{\lambda_{d_i}}{\sum_{j=1}^{n}\lambda_{d_j}}\Bigr)\Bigl(\frac{\lambda_{d_k}}{\lambda_{s_k}}\Bigr)\Bigr\rfloor. \qquad (3.40)
$$
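A direct transcription of (3.40), with hypothetical per-node rates (the dictionaries and the function name are for illustration only):

```python
import math

def l_fail(i, k, ld, lf, ls):
    """Eq. (3.40): tasks the backup of failed node k ships to node i (i != k).

    ld, lf, ls map a node index to its service, failure and recovery rates.
    """
    avail_i = ls[i] / (lf[i] + ls[i])       # steady-state P{node i is up}
    share_i = ld[i] / sum(ld.values())      # node i's share of total service rate
    backlog_k = ld[k] / ls[k]               # mean tasks accumulated while k is down
    return math.floor(avail_i * share_i * backlog_k)

# Hypothetical two-node rates (per second).
ld = {1: 1.07, 2: 1.85}                     # service rates
lf = {1: 0.05, 2: 0.05}                     # failure rates
ls = {1: 0.10, 2: 0.05}                     # recovery rates
tasks_to_2 = l_fail(2, 1, ld, lf, ls)       # node 1 failed: ship to node 2
```

Note that, since the rates are fixed system parameters, each $L^{\mathrm{FAIL}}_{ik}$ is a constant that can be computed once, ahead of time.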
3.5 Conclusion
We have undertaken a novel queuing approach to analyze the stochastic dynamics,
evolving under the one-shot LB policy, of cooperative systems comprising distrib-
uted nodes. In the one-shot LB policy, each functional node first utilizes its local
information about other nodes to calculate the number of tasks to be transferred,
which is instantly followed by a synchronized load balancing action taken together
by all functional nodes. Our model specifically captures the effects of random com-
munication delays and random load-transfer delays in the communication network.
We have introduced three fundamental random vectors to track the underlying point
processes associated with the distributed system. At any given time, these vectors
store information about load distribution among nodes, available functional nodes
and loads in the communication network. In addition, our model assumes that all
the random delays follow exponential distributions. Under this assumption, a re-
generation theory has been formulated yielding coupled renewal equations for three
different types of distributed systems; namely, (1) a system with no node-failure, (2)
a system with random node-failures, and (3) a system with random node-failures
and random node-recoveries.
In particular, we have derived a set of renewal equations that characterize the
expected value of the overall completion time for a certain amount of load initially
given to the system with no node-failure. Similarly, we have obtained a different set
of renewal equations characterizing the probability of successfully serving an initial
amount of load present in the system with random node failures. For the system
with random node-failures and random node-recoveries, we have considered two dif-
ferent LB policies. The first policy, LBP-1, preemptively performs load balancing by
utilizing the statistical information about the failure and recovery processes. By as-
suming that the communication delays are negligible, we have obtained surprisingly
simple recursive relations that characterize the expected value of the overall com-
pletion times as well as their cumulative distribution functions. On the other hand,
in the second policy, LBP-2, the initial LB action is taken without predicting the
node-failure process. Instead, at every node-failure instant, the policy LBP-2 enables
the back-up system of the failed node to take an LB action, thereby distributing the
uncompleted load that has accumulated during the recovery time.
Chapter 4
Experimental, Theoretical and
Simulation Results
We present the experimental, theoretical and MC simulation-based results on the
performance of the one-shot LB policy as applied to a distributed computing application.
The distributed computing application was chosen to be the matrix-multiplication
problem performed over a distributed system comprising two nodes that were con-
nected by (i) the Internet and (ii) the UNM EECE infrastructure-based IEEE 802.11b/g
WLAN. Over the Internet, we used a 650 MHz Intel Pentium III processor-based
computer (the first node) and a 2.66 GHz Intel P4 processor-based computer (the
second node). For the WLAN setup, the first node was replaced by a 1 GHz Trans-
meta Crusoe processor-based computer while the second node was kept the same as
in the Internet-based experiments.
In this Chapter, we first introduce our distributed computing test-bed in Sec-
tion (4.1) followed by the experimentally calculated empirical estimates of the system
parameters in Section (4.2). Next, we present the numerical results for LB policies
for three different distributed systems; namely, (1) system with no node-failures in
Section 4.3, (2) system with recoverable node-failures in Section 4.4, and (3) sys-
tem with permanent node-failures in Section 4.5. The objective of this Chapter is
to compare the theoretical predictions, obtained by solving the equations given in
Chapter (3), to the real-time experimental results obtained over our test-bed.
4.1 Distributed computing system architecture
The LB policy has been implemented on a distributed computing system to experi-
mentally determine its performance. The system consists of nodes that are processing
jobs in a cooperative environment. Each node is equipped with a back-up system
that always saves the context of the application running on the node. The soft-
ware architecture of the distributed system is divided into three layers: application,
load-balancing and communication. In this section we provide just a brief exposition
to each layer; the interested readers are referred to the work of Ghanem [42] for a
detailed description.
The application that is used to evaluate the performance of the LB policy is matrix
multiplication, where the service of one task is defined as the multiplication of one
row by a static matrix duplicated on all nodes. To achieve variability in the
processing speed of the nodes, randomness is introduced in the size of each
task (row) by independently choosing its arithmetic precision according to an exponential
distribution. In addition, the application layer updates the load information that
is being transferred between the nodes. The LB policy is implemented at the load-balancing
layer as a multi-threaded process, written using the POSIX-threads programming
standard. One of the threads schedules and triggers the LB actions at predefined
or calculated instants. At the LB instant,
the load-balancing thread calculates the number of the tasks to be transferred to
other nodes and accesses the shared data to specify the tasks to be sent by the
communication layer. A different thread is coded for the back-up system of each
node in order to tackle the node-failures. If required, the failure-thread can also
compute the number of tasks to be transferred to other nodes. The communication
layer of each node handles the transfer of data from/to that node to/from the other
nodes within the system. Each node uses the UDP transport protocol to transfer its
load-information to the other nodes. The load information consists of the current
queue size, the estimate of current processing rate, and the estimate of network
delays between nodes. On the other hand, the communication layer uses the TCP
transport protocol to transfer the application data (tasks) between the nodes.
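A minimal sketch of the two transports just described — UDP for the small, loss-tolerant load-information packets and TCP for the bulk task data. The ports, field names and JSON framing are hypothetical (the actual platform is coded in ANSI-C):

```python
import json
import socket

def send_load_info(peer, queue_size, proc_rate, delay_est, port=5005):
    # Status datagram: current queue size, processing-rate estimate and
    # network-delay estimate, as described in the text.
    info = json.dumps({"q": queue_size, "rate": proc_rate, "delay": delay_est})
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(info.encode(), (peer, port))   # fire-and-forget; may be lost

def send_tasks(peer, rows, port=5006):
    # Task data must arrive intact and in order, hence TCP.
    payload = json.dumps(rows).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((peer, port))
        s.sendall(len(payload).to_bytes(4, "big") + payload)  # length-prefixed
```

The split mirrors the design choice above: stale or lost load information is tolerable, while transferred tasks are not.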
Identical copies of the above-mentioned software run on each node of the system.
Therefore, once the nodes are synchronized, every node can execute load balancing
autonomously at the prescribed LB instant by utilizing its local information. Finally,
it should be noted that the software platform is coded in ANSI-C over UNIX-based
systems. It has been successfully tested on SPARC processor-based machines running
the Solaris operating system and on IA-32 processor-based computers running both
Linux and Microsoft Windows operating systems.
4.2 Empirical estimation of system parameters
At first, experiments were performed to estimate the system parameters, namely,
the processing rates of the nodes (λdi), the load-information rates (λik), and the
load-transfer rate per task (θ−1). To recall, one task is a row (an array of numbers)
and the processing time of a task is the time required to multiply a static matrix
of fixed size by the row. As mentioned in Section 4.1, the task sizes are generated
randomly and independently, according to an exponential distribution, which will, in
turn, result in independent and identically-distributed processing times of tasks. The
empirically calculated pdfs of the processing time per task for each node are shown in
Fig. 4.1. Clearly, each empirical pdf can be approximated with an exponential pdf
of appropriate rates.
Figure 4.1: Empirically estimated pdfs of the processing time per task for the
Transmeta Crusoe machine (top) and Intel P4 machine (bottom) as well as their
exponential approximations (solid curves).
In Fig. 4.2 we show the empirical pdfs for the load-information delays over the
Internet as well as the WLAN, each of which can be approximated with an exponen-
tial pdf. In the experiments, each load-information packet had a fixed size of 30 Bytes.
From Fig. 4.3 (left), we see that the average transfer delay grows linearly with the
increase in number of tasks. Further, in the same figure (right) the transfer delay per
task can also be approximated as an exponential random variable. Although there
are slight shifts observed in the pdfs of the load-information delays and the transfer
delay per task, in our approximation we have maintained the exponential form of
the pdf and compensated for the shift through the choice of the exponential parameter.
To summarize, empirical results for the pdfs are found to be in good agreement with
Assumption A1 of Section 3.1.
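The estimation steps in this section reduce to two standard computations: the exponential-rate estimate is the reciprocal of the sample mean, and the transfer delay per task follows from the slope of average transfer delay versus number of tasks (Fig. 4.3, left). A sketch on synthetic data — the "true" values 1.85 tasks/s and 0.17 s/task below mimic, but are not, the measured ones:

```python
import random
import statistics

random.seed(1)

# Exponential rate estimate: lambda_hat = 1 / sample mean.
proc_times = [random.expovariate(1.85) for _ in range(5000)]    # synthetic samples
lam_d2_hat = 1.0 / statistics.mean(proc_times)

# Least-squares slope of (number of tasks, average transfer delay) pairs;
# the slope estimates the mean transfer delay per task.
def slope(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

n_tasks = [5, 10, 20, 30, 40, 50]
avg_delay = [0.17 * n + random.gauss(0, 0.2) for n in n_tasks]  # ~0.17 s/task
theta_hat = slope(n_tasks, avg_delay)
```

Both estimators recover the generating parameters up to sampling error.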
Figure 4.2: Empirical pdfs of the load-information delays from the first node to the
second node obtained on the Internet (left) and on the EECE WLAN (right).
Figure 4.3: Left: Average transfer delay as a function of the number of tasks trans-
ferred between nodes. The stars are the actual realizations from the experiments.
Right: Empirical pdf of the transfer delay per task on the Internet under a normal
work-day of operation.
4.3 System without node-failures
In the experiments conducted over the Internet, the first node and the second node
were initially assigned 100 and 60 tasks, respectively, where each task had a mean
size of 120 Bytes. In this context, the processing rates per task of the first node
and the second node were found to be 0.69 and 1.85, respectively. Firstly, fixing the
LB gain at K = 1, we optimized the AOCT by triggering the LB action at different
instants. The analytical and experimental results of this optimization are shown in
Fig. 4.4 (left). The experimental results are plotted by taking the AOCTs obtained
from 20 experiments for each tb. It can be seen that the AOCT becomes small
after tb = 1s. This behavior is attributed to the load-information delay imposed
by the channel. The empirically calculated average load-information delay from the
first node to the second node was 0.7s, and from the second node to the first node
was 0.9s. Therefore, any LB action performed before 0.7s is blind in the sense that
there is no knowledge of the initial load of the other node; both nodes exchange
tasks in this case. This behavior is evident from the experimental results shown in
Fig. 4.4 (right), which depicts the mean number of tasks transferred as a function
of tb. Further, when LB action is taken between 0.7s and 0.9s, the first node will
most likely have knowledge of the initial load of the second node, while the second
node would still be unaware of the initial load of the first node. Consequently,
according to (3.5), the first node sends a smaller portion of its load to the second
node while the second node still sends the same amount of load to the first node.
This means that the slower node (the first node) would eventually execute more tasks
than the faster node (the second node); hence, a larger AOCT is expected. On the
other hand, any LB action taken after 1s is not advantageous because there would
be a low probability for information to arrive. If tb is delayed for too long, the slower
node ends up computing more tasks, resulting in a larger AOCT (not shown in the
figure).
For the experiments over WLAN, the initial load at the first node and the second
node were set to 100 and 60 tasks, respectively, while the processing rates per task
were estimated to be 1.07 and 1.85, respectively. A similar behavior, as in the case of the
Internet, was observed for these WLAN experiments. Since the delay in transmitting
small packets, which is referred to here as load-information delay, fluctuates randomly
in WLAN, more realizations of the experiments are required for each tb to get a
smoother plot of the AOCT.
Figure 4.4: Left: The AOCT as a function of LB instants for the experiments over
the Internet. The LB gain was fixed at 1. Right: Amount of load transferred between
nodes at different LB instants.
Our next goal is to minimize the AOCT over K while keeping tb fixed. The
experiments were performed with the same initial configurations, and the LB was
triggered at 1s using different gains. The results obtained over the Internet and
WLAN are shown in Fig. 4.5. It is seen that the theoretical, MC-simulation, and
experimental results are in good agreement and the optimal K is approximately 1.
This is almost equivalent to the hypothetical case when transfer delay is absent,
in which case, perfect LB is achieved when K = 1 (or when, on average, 55 tasks
are transferred from the first node to the second node, as given by (3.5)). For
experiments over the Internet, the empirically calculated average transfer delay per
task was found to be 0.17s, and the average delay to transfer 55 tasks from the first
node to the second node is therefore 9s, approximately. On the other hand, the
second node does not finish its initial load until 32s, which means that there are no
idle times at the second node before the arrival of the transfer. Therefore, any transfer
incurring a delay less than 32s is effectively equivalent, as far as the second node is
concerned, to an instantaneous transfer. For experiments over WLAN, the initial
load at the first node and the second node were set to 100 and 60 tasks, respectively,
while the processing rates per task were estimated to be 1.07 and 1.85, respectively.
The average delay to transfer 55 tasks was 5.5s, and the optimal performance was
obtained for K = 1 as expected.
Figure 4.5: The AOCT under different LB gains for the Internet (left) and the WLAN
(right). The LB instant was fixed at 2s.
These results motivate us to look further into the effect of K on the AOCT.
Specifically, we consider the types of applications that impose a mean transfer delay
greater than the mean processing time of the initial load at the receiver end,
thereby resulting in an idle time for the receiving node. This kind of situation can
arise in real applications like processing of satellite images where the images are large
in size, and thus the time to transfer them is greater than their processing time [43].
We simulated this type of behavior in our matrix-multiplication setup by increasing
the mean size (in Bytes) of each row while simultaneously reducing the number of
columns to be multiplied in the static matrix. Clearly, a larger row size increases
the mean transfer delay per row (a task) as well as the mean processing time per
task. However, by reducing the number of columns in the static matrix, the mean
processing time per task can be reduced. By using this approach, we were able to
achieve a mean transfer delay per task of 0.72s while keeping the processing rates at
1.06 and 3.78 tasks per second for the first node and the second node, respectively.
The initial loads were still 100 and 60 tasks at the first and second nodes, respectively.
Now, according to (3.5), with K = 1, the load to be transferred from the first node is
64 tasks, producing a delay of 46s. On the other hand, the second node, on average,
finishes its initial load around 16s, and it would therefore have long idle time while
it is awaiting the arrival of load. This discussion is also supported by our theoretical
and experimental results shown in Fig. 4.6 (left), where the AOCT is at minimum
when K = 0.7, which holds for both experimental and theoretical curves. The error
between the theoretical and experimental minima is approximately 12%. Finally,
Fig. 4.6 (right) shows the analytical optimal gain as a function of the mean transfer
delay per task.
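The trade-off driving this result — a larger K shortens the slow node's queue but lengthens the idle wait at the fast node — can be reproduced with a short Monte Carlo sketch. The parameters mirror the large-delay setup above (rates 1.06 and 3.78 tasks/s, 0.72 s mean delay per task, initial loads 100 and 60), but the transfer rule and all details below are a simplification for illustration, not the test-bed code:

```python
import random
import statistics

def completion_time(K, m=(100, 60), rates=(1.06, 3.78), delay_per_task=0.72):
    """One realization: node 1 sends a fraction K of its excess load at t = 0."""
    excess = m[0] - (m[0] + m[1]) * rates[0] / sum(rates)   # ~65 tasks here
    L = int(K * excess)
    t1 = sum(random.expovariate(rates[0]) for _ in range(m[0] - L))
    arrival = sum(random.expovariate(1 / delay_per_task) for _ in range(L))
    t2 = sum(random.expovariate(rates[1]) for _ in range(m[1]))
    # Node 2 starts the transferred batch only after it has both drained its
    # own queue and received the batch (the idle-time effect).
    t2 = max(t2, arrival) + sum(random.expovariate(rates[1]) for _ in range(L))
    return max(t1, t2)

random.seed(3)
aoct = {K: statistics.fmean(completion_time(K) for _ in range(300))
        for K in (0.4, 0.7, 1.0)}           # an interior gain should win
```

Under these assumptions the estimated AOCT is minimized at an interior gain, consistent with the K = 0.7 minimum observed above.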
4.4 System with recoverable node-failures
All the experiments for policies LBP-1 and LBP-2 were conducted over the EECE
infrastructure-based IEEE 802.11b/g network at the University of New Mexico. The
processing rates per task of the first and the second nodes were estimated to be 1.07
and 1.85, respectively. Further, the nodes are assumed to fail and recover indepen-
dently and randomly according to an exponential pdf. In order to achieve this in our
experiments, we have coded a process that dynamically generates failure instants
Figure 4.6: Left: The AOCT as a function of the LB gain in presence of large transfer
delay. The LB instant was fixed at 2s. Right: Theoretical result on the optimal LB
gain for different delays.
and sends signals, at all such failure instants, to the application layer ordering it
to stop executing tasks. Also, at every failure instant, the same process generates
a recovery time and waits for that amount of time before sending a new signal to
the application layer ordering it to resume the execution of tasks. In this section,
the average failure time for both nodes is 20s, while the average recovery times of
the first and second nodes are 10s and 20s, respectively. Clearly, the first node is
expected to be available for more time than the second node.
Initially, experiments were performed to assess the performance of LBP-1. The
first node was assigned 100 tasks, while the second node was assigned 60 tasks. The
LB was performed at time t = 0 according to (3.37) by transferring load from the
first node to the second node using different values for K and the corresponding
AOCT was computed. The theoretical, MC-simulation, and experimental results for
the AOCT are shown in Fig. (4.7). For comparison, the results for the no-failure case
(when the failure rate is set to zero) are also shown. From the theoretical curves,
it can be seen that the minimum AOCT occurs at K = 0.35, while the minimum
occurs at K = 0.45 for the no-failure case. Note that in the former case, the first
node transfers 35 tasks to the second node, while in the latter case it transfers 45
tasks to the second node. In both cases, the AOCT is minimized when the first
node transfers tasks to the second node, which has a higher processing rate. But, in
presence of node failure, the amount of transfer has to be reduced for the optimal
performance because the second node is now less reliable. Intuitively, we can state
that the optimal K in the case of node failure will always be less than the optimal K for
the no-failure case.
Figure 4.7: The average overall completion time as a function of the LB gain K for
the LBP-1.

Figure 4.8: A realization of the queues obtained from the experiments conducted
for LBP-1 and LBP-2.
Next, we conducted experiments for LBP-2. The initial load distribution was 100
and 60 tasks at the first and second nodes, respectively. Note that from (3.40) the
amount of load to be transferred at every failure instant happens to be a constant,
depending on system parameters that are already set to certain values. The optimal
gain K for the initial LB (which does not account for node failure) was found to
be 1. Using this optimal gain, the AOCT was calculated using 60 independent
realizations of the experiments and was found to be 109.17s. We also performed
the MC simulation under the same initial set-up for the LBP-2, and the AOCT
turned out to be 112.43s using 500 realizations. Recall that in the case of LBP-
1 (see Fig. (4.7)), the minimum AOCT is 117s, which is greater than the value
obtained for the LBP-2. This is expected since LBP-1 takes a preemptive action in
the beginning by predicting the failure instants, while LBP-2 avoids the prediction by
taking an action of transferring tasks only when failures occur. In order to compare
the dynamics of each policy, we show in Fig. 4.8 the actual queues of each node, under
one realization of the experiments performed for LBP-1 and LBP-2. The longer flat
portions of the queues correspond to the recovery times of the nodes. Also, the
downward (upward) jumps in the queues under LBP-2 correspond to the action of
transferring (receiving) tasks after every failure instant.
In order to compare the performance of LBP-1 and LBP-2 in the presence of
small network delays, we conducted experiments for each policy using different initial
loads. The average transfer delay per task was estimated to be equal to 0.02s. For
each initial load distribution that is listed in Table 4.1, the theoretical model was
utilized to calculate the optimal LB gains and the sender/receiver pair for LBP-1
that minimizes the AOCT. It was found that if the initial load of the first node is
smaller than the initial load of the second node, then the load transfer has to be
made from the second node to the first node; otherwise, the first node has to be
the sender node. Using the respective optimal LB gains and the sender/receiver
pairs, the actual experiments were conducted and the AOCT was calculated using
20 independent realizations for each initial load distribution. In Table 4.1 we also
list the theoretically calculated AOCT under the no node-failure case. The initial
optimal LB gains used in the experiments under LBP-2 were calculated a priori based
on the theoretical results for the initial condition as given in Section (3.2). In Table
4.2 we have listed the results obtained from our MC simulations and the real-time
experiments. We can see from both Tables that LBP-2 outperforms LBP-1 in all
cases.
Table 4.1: Experimental results for LBP-1 using the theoretically determined optimal
LB gains.

  Initial     Optimal     Average Overall Completion Time (s)
  Load        LB Gain     Node Failure                     Without
  (m1,m2)     Kopt        Theo. Pred.    Exp. Result       Node Failure
  (200,200)   0.15        274.95         264.72            141.94
  (200,100)   0.35        210.13         207.32            106.93
  (100,200)   0.15        210.13         229.19            106.93
  (200,50)    0.5         177.09         172.56            89.32
  (50,200)    0.25        177.09         215.66            89.32
Table 4.2: Experimental and simulation results for LBP-2.

  Initial     Initial Optimal    Average Overall Completion Time (s)
  Load        LB Gain            MC               Exp.
  (m1,m2)     Kopt               Simulation       Result
  (200,200)   1.00               277.9            263.4
  (200,100)   1.00               202.4            188.8
  (100,200)   0.80               203.07           212.9
  (200,50)    1.00               170.81           171.42
  (50,200)    0.95               189.72           177.6
We also studied the performance of LBP-1 and LBP-2 under different amounts
of transfer delay in the channel. The results are shown in Table 4.3, and it can be
seen that when the average transfer delay per task is greater than 1 s, LBP-1 results
in a smaller AOCT than LBP-2. This is attributable to the amount of time needed
in making load transfers at every failure instant in the case of LBP-2, which may
result in frequent idle times at the receiver node while it waits for the load to arrive.
On the other hand, LBP-1 only makes a one-time transfer at the beginning of load
execution. We observed that if the load-transfer time between nodes is on the order
of the average recovery time of the sender node, LBP-1 performs better than LBP-2.
Table 4.3: Performance of LBP-1 and LBP-2 under different network delays.

  Average Delay     Calculated Average Overall
  Per Task (s)      Completion Time (s)
                    LBP-1      LBP-2
  0.01              116.82     112.43
  0.5               117.76     115.94
  1                 120.99     122.25
  2                 127.62     133.02
  3                 131.64     142.86
Finally, using the results of Theorem 4 given in Section (3.4), we computed $p^{1,1}_{r_1,r_2}(t;C)$
corresponding to the LB gain that minimizes the AOCT. The average transfer delay
per task was taken to be 0.02s. As an example, in Fig. 4.9 we present the cumulative
distribution function for the overall completion time for two different initial load
distributions given by (50, 0) and (25, 50).
4.5 System with permanent node-failures
In this section, we present the numerical results, based on the theory given in Sec-
tion (3.3), for the one-shot LB policy applied to a system comprising two distributed
Figure 4.9: The cumulative distribution function of the overall completion time in
LBP-1, with and without node failure. The upper figure shows the case of an initial
workload of (50, 0), while the lower figure is for an initial workload of (25, 50).
nodes, where each node can fail permanently in a random amount of time. The the-
oretical results are compared to the MC simulation results as well. First, we provide
a brief review of the one-shot LB problem that was detailed earlier in
Section (3.2.1).
Recall that the load to be transferred from the jth node to the ith node (for
$i, j \in \{1, 2\}$, $i \neq j$) at the LB instant $t_b$ is given by

$$L_{ij}(t_b) = \left\lfloor K_{ij}\left(Q_j(t_b) - \frac{Q^*_{i(j)}(t_b) + Q_j(t_b)}{2}\right)^{+}\right\rfloor, \qquad (4.1)$$
where $Q_k(0) = m_k$ for $k \in \{1, 2\}$ (which is the initial load of the kth node), and
$Q^*_{i(j)}(t_b)$ is calculated based on the system-information state $I$ at $t_b$. More precisely,
$Q^*_{i(j)}(t_b) = Q_i(t_b)$ if the jth node has received the load information sent from the
ith node by time $t_b$, while $Q^*_{i(j)}(t_b) = 0$ otherwise. Notice that in (4.1), $L_{ij}(t_b)$ is
calculated without considering the processing rates of the nodes, while in (3.1) the
calculation of the excess load of the jth node involves the processing rates of all
the nodes in the system. With the new expression for $L_{ij}(t_b)$, the upper bound for
$K_{ij}$ becomes $2Q_j(t_b)/\big(Q_j(t_b) - Q^*_{i(j)}(t_b)\big)$ instead of the upper bound of 1 in (3.5).
Therefore, in our new approach, it is possible for a node to transfer all its initial
load to another node in the system irrespective of the processing rates. This is quite
effective in situations where the faster nodes are more likely to fail permanently
than the relatively slower nodes, thereby necessitating large load-transfers in the
opposite directions, viz., from the faster to the slower nodes. The objective is to find
the optimal one-shot LB policy, defined by the choice of optimal tb together with
the optimal $K_{ij}$, that maximizes the computing reliability. More precisely, the optimal
one-shot LB policy is defined by $\arg\max_{t_b,K_{12},K_{21}} R^{(10,01),(11,11)}_{m_1,m_2}\big(t_b;[0],[0]\big)$.
Example 1: Consider the case for which $m_1 = 50$ tasks and $m_2 = 25$ tasks. The
service rates (in tasks per second) are $\lambda_{d1} = 0.5$ and $\lambda_{d2} = 0.75$; the mean failure
times are $\lambda_{f1}^{-1} = 80$ s and $\lambda_{f2}^{-1} = 50$ s. The mean arrival times of the load-information
packets and the failure notices are $\lambda_{21}^{-1} = \lambda_{12}^{-1} = \lambda_{F21}^{-1} = \lambda_{F12}^{-1} = 0.12$ s, and the
slope of the mean transfer delay per task is $\theta = 0.4$ s per task. Note that in this
example the first node, with a smaller service rate, is assigned a larger initial load.
Intuitively, a good LB policy would distribute the load considering both the service
rates and the failure rates of the nodes.
Let us first look at the solution to the initial condition corresponding to $t_b = 0$,
when $I = (10, 01)$, $F = (11, 11)$ and $C = ([0], [0])$. Therefore, from (4.1), we obtain
$L_{ij}(0) = \lfloor K_{ij} m_j / 2 \rfloor$ with $0 \le K_{ij} \le 2$. Now, we can precisely solve the difference
equations listed in Theorem 3 of Section (3.3.2) to calculate $R^{(10,01),(11,11)}_{50,25}\big(0;[0],[0]\big)$.
In Fig. 4.10, the success probability under different choices of $K_{12}$ is plotted as a
function of $K_{21}$. A small value of $K_{21}$ implies that the first (slower) node keeps most
of its initial load. Consequently, the load distribution remains unbalanced even after LB is
performed. Therefore, the time required to serve all customers becomes “large” and
the success probability is “small.” On the other hand, when K21 approaches 2, the
first node transfers most of its initial load to the second node. Consequently, almost
all the tasks have to be executed by the second (less reliable) node, thereby reducing
the success probability.
In Fig. 4.11, we fix K21 at 0.1, 1 and 2, while varying K12 between 0 and 2.
Note that when K21 = 2, the success probability can be maximized by choosing
K12 = 1.7. This can be explained by the fact that since the first node sends all 50
tasks to the second node, a large value for K12 ensures that the second node can
also send more tasks to the first node, which will avoid load-accumulation at the
second node. However, for K12 = 2, the success probability decreases, which can
be attributed to the fact that too many tasks get accumulated at the slower first
node. In Figs. 4.10 and 4.11, we have also shown results from the MC simulations
of the one-shot LB policy, where each success probability is calculated by averaging
outcomes (failures or successes) from 5000 independent realizations of the policy.
The MC simulation results are in good agreement with our theoretical results. Finally,
by numerically solving the renewal equations of Theorem 3 of Section (3.3.1), we
have plotted the success probability as a function of tb in Fig. 4.12. In summary,
we obtain $\arg\max_{(t_b,K_{21},K_{12})} R^{(10,01),(11,11)}_{50,25}\big(t_b;[0],[0]\big) = (8.1\ \mathrm{s}, 1.5, 0)$, which gives the
optimal success probability of 0.4233.
4.6 Conclusion
We have verified the validity of our regeneration-theory based model of the one-shot
load balancing policy by showing that the theoretical results are in good agreement
with the real-time experimental results as well as with the MC simulation results. One-
Figure 4.10: Probability of success $R^{(10,01),(11,11)}_{50,25}(0;[0],[0])$ as a function of the LB gain
$K_{21}$ of the first node when LB is performed at time t = 0, for $K_{12} = 0.1$ and
$K_{12} = 1.5$. Stars represent the Monte Carlo simulation.
Figure 4.11: Probability of success $R^{(10,01),(11,11)}_{50,25}(0;[0],[0])$ as a function of the LB gain
$K_{12}$ of the second node when LB is performed at time t = 0, for $K_{21} = 0.1$, $K_{21} = 1$
and $K_{21} = 2$. Stars represent the Monte Carlo simulation.
shot load balancing experiments were performed by multiplying large matrices on
our custom-made distributed system comprising two heterogeneous nodes that were
Figure 4.12: Probability of success $R^{(10,01),(11,11)}_{50,25}(t_b;[0],[0])$ as a function of the LB
instant $t_b$ (s), while the LB gains of both nodes are kept at 0.5.
connected over the Internet and the IEEE 802.11b/g WLAN. Our results for the
distributed system without node-failures showed that for a given initial load and a
given load balancing gain there is an optimal load balancing instant that minimizes
the AOCT. In particular, if load balancing is performed before the optimal instant,
there is a likelihood that at least one of the nodes is not informed about the initial
load of the other node at the time of load balancing. On the other hand, if load
balancing is performed after the optimal instant, there is a likelihood that the faster
node becomes idle while either the load is in transit or the slower node is processing
tasks. We also looked at the interplay between the balancing gain and the size of
the random delay in the channel. The theoretical predictions, MC simulations and
the experimental results all showed that when the average transfer delay per task is
large compared to the average processing time per task, reduced LB gain minimizes
the AOCT.
Next, we presented a comparative analysis of the performance of LBP-1 and
LBP-2 on a distributed system with recoverable node-failures. Under the policy LBP-
1, we saw that the a priori information about statistics of the failure and recovery
processes can be utilized to calculate the optimal LB gain that minimizes the AOCT
corresponding to a given initial load. We noticed that the presence of node-failures
and subsequent recoveries warrants the use of a reduced LB gain as compared to the
no-failure case. In addition, our studies revealed that when the average load-transfer
delays are small compared to the average recovery times, LBP-2 outperforms LBP-1.
In contrast, when the average load-transfer delays are large compared to the average
recovery times, the time wasted in transferring tasks at every failure instant, under
LBP-2, adversely affects the AOCT. Therefore, it is advantageous to use the LBP-1
instead of the LBP-2 in such situations.
Finally, we calculated the optimal one-shot LB parameters for a two-node distributed
system in the context of permanent physical failure. The theoretical predictions
and MC simulations suggest that the probability of successfully serving the initial
number of tasks is a concave function of the load balancing gains. Further, we observed
that the probability of success can be maximized by selecting an appropriate load
balancing instant. In summary, for all three systems we found that the presence of
uncertainty (viz., node failures/recoveries or random delays) calls for an attenuation
in the strength of LB action.
Chapter 5
Dynamic Load Balancing Policy
In this Chapter, we consider that external loads of different sizes (possibly corre-
sponding to different applications) arrive at a distributed-computing system ran-
domly in time and node space. Consequently, LB has to be performed periodically
to maintain load balance in the system. To this end, we propose a sender-initiated
distributed LB policy where each node can autonomously take LB actions repeat-
edly during run-time. Since the LB actions are taken during run-time, the proposed
LB scheme falls under the class of dynamic load balancing (DLB).
In our proposed DLB policy, every time an external load arrives at a node, the
node seeks a locally optimal one-shot-LB action. In particular, the locally optimal
one-shot-LB action aims to minimize the AOCT or to maximize the probability of
success, as appropriate, corresponding to the load present in the system just after the occurrence
of external-arrival. For clarity, we use the term external load to represent the loads
submitted to the system from some external source and not the loads transferred
from other nodes due to LB. Such a local one-shot LB action, which is required for
DLB, is different from the synchronous one-shot LB action described in Chapter (3)
in two ways: (1) the local action adapts to varying system parameters such as load
variability, randomness in the channel delay, and variable run-time processing speed
of the nodes, and (2) the local LB is performed in an asynchronous fashion, that is,
each node selects its own optimal LB instant and LB gain. (Recall that according to
the synchronized one-shot LB policy after the initial load assignment to nodes, all
the nodes execute LB synchronously using a common LB instant and a common LB
gain.)
The rest of this chapter is organized as follows. We present the DLB policy in Sec-
tion (5.1) and describe the corresponding experimental results in Section (5.2). This
is followed by the formulation of a computationally efficient sub-optimal LB policy for
an arbitrary number of nodes in Section (5.3). Finally, our conclusions are given in
Section (5.4).
5.1 Formulation of the DLB Policy
Consider a system of n distributed nodes with a given initial load, and assume that
external loads arrive randomly thereafter. We assume that nodes communicate with
each other at so-called “sync instants” on a regular basis. Upon the arrival of each
batch of external loads, the receiving node, and only the receiving node, prompts
itself to execute an optimal distributed one-shot LB. Namely, it finds the optimal
LB instant and gain and executes an LB action accordingly. Since load balancing is
performed locally at the external-load receiving node, say the jth node, the policy
depends only on its knowledge-state vector $\mathbf{i}_j$, rather than the system knowledge state
$I$. Consequently, the number of possible knowledge states becomes $2^{n-1}$. Further,
considering the periodic sync exchanges between nodes, each node in the system is
assumed to be continually informed of the states of the other nodes. Hence, the only
possible choice for the knowledge-state vector of each jth node is $\mathbf{i}_j = (1 \cdots 1) \equiv \mathbf{1}$,
leading to a simpler optimization problem than the one detailed earlier.
Suppose that an external arrival occurs at the jth node at time t = ta. We need
to compute optimal LB gain and optimal LB instant for the jth node based on the
knowledge-state vector 1. Clearly, according to the knowledge of the jth node at
time ta, the effective queue length of the kth node is mk(j)(ta). To recall, mk(j)(ta) =
Qk(ta−ηjk∗), where ηjk∗ refers to the delay in the most recent communication received
by the jth node from the kth node. The goal is to minimize µ1m1(j),...,mn(j)
(ta + tb),
where tb is the LB instant of the jth node measured from the time of arrival ta.
By setting ta = 0, the system of queues, in the context of the jth node, at time ta
becomes statistically equivalent to the system of queues at time 0 with initial load
distribution $m_{k(j)}$ for all $k \in \{1, \ldots, n\}$. Therefore, we can utilize the regeneration
theory to obtain the following difference-differential equation that can be solved to
calculate the optimal LB instant and the optimal LB gain.
$$\frac{d\mu^{\mathbf{1}}_{m_1(j),\ldots,m_n(j)}(t_b)}{dt_b} = \sum_{k=1}^{n} \lambda_{dk}\, \mu^{\mathbf{1}}_{m_1(j)-\delta_{1,k},\ldots,m_n(j)-\delta_{n,k}}(t_b) - \lambda\, \mu^{\mathbf{1}}_{m_1(j),\ldots,m_n(j)}(t_b) + 1, \qquad (5.1)$$

where $\lambda = \sum_{k=1}^{n} \lambda_{dk}$. In addition, the optimization over $t_b$ becomes unnecessary
since the jth node is already in the informed knowledge state $\mathbf{1}$. This claim has
been justified from the theoretical and experimental results of Chapter 4, where
we have shown that a node should perform LB immediately once it gets informed.
Therefore, our analysis becomes simpler as we can now use $t_b = 0$ and calculate the
corresponding LB gains that minimize $\mu^{\mathbf{1}}_{m_1(j),\ldots,m_n(j)}(0)$ using difference equations. In
practice, the optimal LB gains are calculated on-line by the jth receiver-node and
the LB is performed instantly at time t = ta.
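As an illustration of how (5.1) might be integrated numerically, the sketch below performs a forward-Euler pass over the coupled equations. It is a simplified stand-in, not the authors' solver: the initial values `mu0` are assumed to be supplied by the difference equations, `mu0` must contain every state reachable by removing tasks, and empty queues are excluded from the departure sum as a boundary assumption.

```python
def integrate_mu(mu0, lam_d, t_end, dt=0.01):
    """Euler integration of (5.1): mu0 maps each state (m_1, ..., m_n)
    to mu(0); lam_d holds the processing rates lambda_dk."""
    mu = dict(mu0)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        new = {}
        for m, val in mu.items():
            active = [k for k in range(len(m)) if m[k] > 0]
            if not active:  # all queues empty: completion already reached
                new[m] = 0.0
                continue
            # right-hand side of (5.1), restricted to non-empty queues
            rhs = 1.0 - sum(lam_d[k] for k in active) * val
            for k in active:
                lower = tuple(mj - (1 if j == k else 0)
                              for j, mj in enumerate(m))
                rhs += lam_d[k] * mu[lower]
            new[m] = val + dt * rhs
        mu = new
    return mu
```

As a sanity check, for a single node with rate $\lambda_d$ and no further arrivals, $\mu_m = m/\lambda_d$ for every $t_b$, and this profile is a fixed point of the iteration.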
The initial condition $\mu^{\mathbf{1}}_{m_1(j),\ldots,m_n(j)}(0)$ is solved based on techniques similar to the
ones that are used to solve (3.14). But one notable difference here is that the local
LB action taken by the jth node at time 0 (measured from ta) does not consider
future load arrivals at the jth node due to past or future LB actions of other nodes.
More precisely, $L_{kj}(0)$, for all $k \neq j$, are calculated based on (3.1), (3.4) and (3.5),
while setting $L_{jk}(0) = 0$ for all k. Therefore, we would expect to obtain a different
solution for locally optimal K than the one provided by (3.6).
The system parameter, namely the average processing time per task $\lambda_{di}^{-1}$, is updated
locally by each ith node. At every sync instant, the node broadcasts its current
processing rate and the current queue-size. Since the sync periods are adjusted ac-
cording to the arrival rates, the added overhead in transferring and processing the
knowledge state information grows in proportion to the arrival rates. The second
adaptive parameter is the mean transfer delay per task $\theta_{ji}$, which is updated by

$$\theta^{(k)}_{ji} = \alpha\left(\frac{Z_{ji,k}}{L_{ji,k}}\right) + (1-\alpha)\,\theta^{(k-1)}_{ji}, \qquad (5.2)$$

where $Z_{ji,k}$ is the actual delay incurred in sending $L_{ji,k}$ tasks to the jth node at the
kth successful transmission of the ith node, and $\alpha \in [0,1]$ is the so-called "forgetting
factor" of the previous estimation [44]. Also, $\theta^{(0)}_{ji}$ is calculated empirically from many
experimental realizations of delays in transferring tasks from the ith node to the jth
node. The forgetting factor can be adjusted dynamically in order to accommodate
drastic changes in transfer delay per task. Steps for the DLB policy are described
next.
Detailed Algorithm for Dynamic Load Balancing
For an n-node distributed system, we specify the “sync” periods for each node by
δj, j = 1, . . . , n. These are the periods, for each node, at which each node broadcasts
its queue length and processing speed to other nodes. (In our experiments, we used
a common sync period of 1s.)
Algorithm:
∀t ≥ 0, at every jth node, the DLB algorithm is:
if mod (t, δj) = 0 then
Broadcast current queue size and current processing rate
end if
if “sync” is received then
Update queue size and processing rate of the corresponding node
end if
if external-load is received, say at time t = ta then
Calculate local excess load using (3.1), initial partitions using (3.2) or (3.4), and
optimal Kij using (5.1)
Perform LB only by the jth node in accordance to (3.5)
Update $\theta^{(k)}_{ij}$ using (5.2) to include the delay in completion of the kth load transmission
end if
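The update of θ in the final step is a plain exponentially weighted moving average; a minimal helper (our own sketch, with the experimentally reported α = 0.05 as the default) is:

```python
def update_theta(theta_prev, Z, L, alpha=0.05):
    # Eq. (5.2): blend the newest per-task delay sample Z/L with the
    # previous estimate; alpha is the "forgetting factor".
    return alpha * (Z / L) + (1.0 - alpha) * theta_prev
```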
5.2 Results
In this section we present the results on DLB policy for the experiments conducted
over the Internet, where external loads of random sizes arrive randomly in time at
any node in the distributed system. To recall, each time an external load arrives
at a node, the receiving node (and only the receiving node) takes a local, optimal
one-shot LB action to minimize the AOCT of the total load in the system at that
instant. As external tasks arrive at a certain rate, the total load and the overall
completion time of the total load in the system change with time. Therefore, the
performance of DLB policy is now evaluated in terms of the average completion time
per task (ACTT) corresponding to all tasks that are executed within a specified
time-window. Here the completion time of each task is defined as the sum of the
processing time, the queuing time and the transfer time of the task.
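As a small illustration (with hypothetical per-task records, not the experimental data), the ACTT over a time-window is simply the mean of these three-part sums:

```python
def actt(records):
    # records: (processing, queuing, transfer) times, in seconds, for
    # every task completed within the time-window
    return sum(p + q + tr for p, q, tr in records) / len(records)
```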
For all the experiments, the tasks are generated independently according to a com-
pound (or generalized) Poisson process with Poisson-distributed marks [45]. More
precisely, the external loads arrive according to a Poisson process, and the numbers
of tasks at the load-arrival instants constitute a sequence of independent and iden-
tically distributed Poisson random variables. (Recall that the task size, in terms of
Bytes per task, is also random according to a geometric distribution.) Note that
since the proposed DLB policy is triggered by the arrival of tasks and is based on
the actual realization of the task number in each arrival, it is independent of the
statistics of the number of tasks per arrival as well as the statistics of the underlying
task-arrival process.
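An arrival stream of this kind can be generated as follows (an illustrative sketch; the function names are ours, and Knuth's multiplication method is used for the Poisson-distributed marks):

```python
import math
import random

def poisson_sample(rng, mean):
    # Knuth's method: count uniforms until their product drops below e^{-mean}
    L = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def compound_poisson_arrivals(rate, mean_tasks, t_window, seed=None):
    # Exponential inter-arrival times (a Poisson process in time) with a
    # Poisson-distributed number of tasks at each arrival instant.
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > t_window:
            return events
        events.append((t, poisson_sample(rng, mean_tasks)))
```

With rate = 1/40 and mean_tasks = 55 this reproduces the arrival pattern of Experiment-1 described below.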
The experiments were conducted for three different cases: Experiment-1: The
first node receives, on average, 55 external tasks at each arrival, and the average
inter-arrival time is set to 40 s, while no external tasks are generated at the second
node. Experiment-2: The second node receives, on average, 22 external tasks at each
arrival and the average inter-arrival time is 9s while no external tasks are generated at
the first node. Experiment-3: The first and the second nodes independently receive,
on average, 16 and 40 external tasks, respectively, at each arrival and the average
inter-arrival times are 20s and 18s, for the first and the second nodes, respectively.
The empirical estimates of the processing rates of the first and the second nodes were
found to be 1.06 and 3.78 tasks per second, respectively. The estimate of the average
transfer delay per task, $\theta^{(k)}_{ji}$, is updated after every transfer of tasks according to
(5.2), with $\theta^{(0)}_{ji} = 0.85$ s and $\alpha = 0.05$. In Fig. 5.1, we show estimates of $\theta^{(k)}_{ji}$, as a function
of time, obtained from one of our DLB experiments.
Each experiment was conducted for a period (time-window) of 1 hour and the
ACTT corresponding to each case is listed in Table 5.1. We also show the ACTT
obtained using static policies that perform LB with fixed gains of K = 0.1 and
K = 1 at all arrival instants. It is clear from Table 5.1 that the ACTT is minimum
for the DLB policy for all three experiments. Considering Experiment-1, note that
the average rate of arrival at the first node is 1.37 tasks per second, since the inter-
Figure 5.1: Adaptive estimation of the average transfer delay per task (s) as a
function of time (s).
arrival times are independent of arrival sizes. Therefore, the average arrival rate
of the first node is greater than its processing rate (1.06 tasks per second), but is
smaller than the combined processing rates of the nodes. With LB, some portion of
the arriving tasks is diverted to the second node, which reduces the effective arrival
rate at the first node and thus avoids load accumulation. In the static LB policy with
K=0.1, the first node keeps 90% of its excess load, and hence, the effective arrival
rate at the first node remains larger than its processing rate. Therefore, the queue-
length accumulates with every arrival, which results in a greater queuing delay, and
thus, excess ACTT. In contrast, in the static policy with K=1, the first node sends
all of its excess load to the second node at every LB instant. However, each batch
of transferred load undergoes a large delay, resulting in an increase in ACTT.
In case of Experiment-2, the average rate of arrival at the second node is 2.44
tasks per second, which is smaller than the processing speed of the second node. As
a result, the static LB with K = 1 gives a reduced ACTT compared to K = 0.1,
meaning that the increase in ACTT due to queuing delay at the second node for
K = 0.1 is greater than the increase in ACTT caused by the transfer delay when
Table 5.1: Experimental results for dynamic and static LB policies.

  Experiment   Average Completion Time per Task (s)   System Processing Rate (tasks/s)
  Number       DLB      K=0.1    K=1                  DLB     K=0.1   K=1
  1            22.55    73.87    49.76                1.75    1.69    1.11
  2            8.61     15.82    11.67                3.3     3.06    2.92
  3            9        10.56    10.77                3.29    3.73    2.84
K = 1. However, the DLB outperforms the static case of K = 1 due to the excessive
load-transfer delay associated with this static LB case. For Experiment-3, the ACTTs
are evidently similar under both K = 0.1 and K = 1 static LB policies. This is
because ACTT is dominated by queuing delay in the K = 0.1 (at the slower first
node) case while it is dominated by transfer delay in the K = 1 case. On the other
hand, the DLB policy effectively uses the system resources, viz., the nodes and the
channel, to avoid excessive queuing delay as well as the transfer delay.
We now look at the effect of LB policies on the system processing rate (SPR),
which is calculated as the total number of tasks executed by the system in a certain
time-window divided by the active time of the system. The active time of the system
within a time-window is defined as the aggregate of all times for which there is at
least one task in the system that is either being processed or being transferred. The
SPR achieved under different LB policies are listed in Table 5.1. It is interesting to
note that in the case of Experiment-1, better SPR is achieved with K = 0.1 than with
K = 1 despite the fact that the latter performs better in terms of ACTT. To explain
this behavior, we first need to look at one extreme case when no LB is performed. In
this case, the SPR is always equal to $\lambda_{d1}$, independently of the size of the time-window.
However, as we increase the time-window, the ACTT diverges to infinity since the
average rate of arrival is bigger than the average processing rate of the first node.
The performance for the case of a weak LB action with K = 0.1 is found to be
similar to the extreme case of no LB. In the second case when LB is performed with
K = 1, the active time of the system gets dominated by times when there are tasks
in transfer while both nodes are idle. Consequently, the number of tasks processed
by the system is smaller while the active time of the system may increase, resulting in a
reduced SPR. However, the LB action taken by the first node reduces the effective
arrival rate at the first node below its processing rate. As a result, the ACTT of the
system is bounded.
In the case of DLB policy, LB gains are chosen small enough to avoid large
transfer delays but large enough to lower the effective arrival rate at the first node.
Therefore, for Experiment-1, the DLB policy achieves the maximum SPR and the
minimum ACTT. The fact that nodes have large idle times while there are tasks in
transfer for the case of K=1 is depicted in Fig. 5.2. Observe that when there is an
arrival of 70 tasks at the first node around 2250s, 55 tasks are transferred to the
second node. On the other hand, the second node has an empty queue at the arrival
instant of the first node, and due to the transfer delay, it must wait another 50s to
receive the tasks. Further, the first node finishes the remaining 15 tasks and becomes
idle by the time the second node gets the transferred load. This behavior is repeated
at all arrival instants, which are marked by arrows in Fig. 5.2 (left). In contrast,
from Fig. 5.2 (right), it can be seen that the transfer delay mostly overlaps with the
working times of the sender node, which results in smaller idle times on both nodes.
Similar results are observed for Experiment-2.
In the case of Experiment-3, the first node and the second node receive external
loads at a rate of 0.8 and 2.2 tasks per second, respectively. This means that even
if no LB is performed, both nodes process their own tasks without being idle for a
long time. Therefore, the SPR is expected to be close to the sum of the processing
rates of the nodes. However, when LB is performed, nodes may become idle due
to the transfer delay, resulting in smaller SPR. This is evident from our results of
Experiment-3 where the static LB policy with K = 0.1 achieves maximum SPR.
On the other hand, the DLB policy transfers the right amount of tasks at every LB
instant, so that the transfer delays plus the queuing delays at the receiving node are
smaller than the queuing delays for those tasks at the sender node. This reduces the
ACTT but may or may not increase SPR depending on the resulting active time.
Figure 5.2: One realization of the queues (queue size versus time, s; Node 1, Node 2,
and external-task arrivals) under a static LB policy using a fixed gain K = 1 (left)
and the DLB policy (right).
5.2.1 Comparison to other DLB policies
Next, we will compare the performance of our DLB policy to versions of two existing
LB policies (discussed in Section (2.2)) for heterogeneous and dynamic computing,
namely, the shortest-expected-delay (SED) policy and the never-queue (NQ) policy,
which we have adapted to our distributed-computing setting. Suppose that an external
arrival of x tasks occurs at the ith node at time t. Let $m_{j(i)}(t)$ be the queue length
of the jth node as per the knowledge of the ith node at time t. Let $l_{j(i)}(t)$ be the
ACTT for the batch of x external tasks if all the external tasks join the queue of the
jth node. The average completion time per task (per batch of x arriving tasks) can
now be expressed as

$$l_{j(i)}(t) = \frac{1}{x}\sum_{r=1}^{x}\left(\frac{m_{j(i)}(t) + r}{\lambda_{dj}} + \theta^{(k)}_{ji}x\right) = \frac{m_{j(i)}(t)}{\lambda_{dj}} + \frac{x+1}{2\lambda_{dj}} + \theta^{(k)}_{ji}x, \qquad (5.3)$$

where $\theta^{(k)}_{ji}$ is the kth update of the average transfer delay per task sent from the ith
node to the jth node (with $\theta^{(k)}_{ii} = 0$). In the SED policy, the batch of x tasks is
assigned to the node that achieves the minimum ACTT. Therefore, the receiver node
is identified as $\arg\min_j l_{j(i)}(t)$. On the other hand, in the NQ policy, all external
loads are assigned to a node that has an empty queue. If more than one node has
an empty queue, the SED policy is invoked among the nodes with the empty queues
to choose a receiver node. Similarly, if none of the queues is empty, the SED policy
is invoked again to choose the receiver node among all the nodes.
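Under these definitions, the two receiver-selection rules can be sketched as follows (our own illustrative code, with (5.3) supplying the expected per-task completion time):

```python
def expected_actt(m_j, lam_dj, theta_ji, x):
    # Eq. (5.3): mean completion time per task if all x tasks join node j
    return m_j / lam_dj + (x + 1) / (2.0 * lam_dj) + theta_ji * x

def sed_receiver(queues, rates, thetas, x):
    # SED: route the batch to the node with the smallest expected ACTT
    n = len(queues)
    return min(range(n),
               key=lambda j: expected_actt(queues[j], rates[j], thetas[j], x))

def nq_receiver(queues, rates, thetas, x):
    # NQ: prefer nodes with empty queues, falling back to SED among the
    # empty ones (or among all nodes when no queue is empty)
    empty = [j for j, q in enumerate(queues) if q == 0]
    cand = empty if empty else range(len(queues))
    return min(cand,
               key=lambda j: expected_actt(queues[j], rates[j], thetas[j], x))
```

For example, with the test-bed rates (1.06 and 3.78 tasks per second), a batch of 20 tasks arriving at a busy first node (queue length 10, θ = 0.85 s per task toward the second node) is kept locally by SED but routed to the idle second node by NQ.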
We implemented the SED and NQ policies to perform distributed-computing
experiments on our test-bed. The experiments were conducted between two
nodes connected over the Internet (keeping the same processing speeds per task).
We performed three types of experiments for each policy: (i) node 1 receiving on
average 20 tasks per arrival, with the average inter-arrival time set to 12 s, while
no external tasks were generated at node 2; (ii) node 2 receiving on average 25 tasks
per arrival, with the average inter-arrival time set to 8 s; and (iii) nodes 1 and 2
independently receiving on average 10 and 15 external tasks per arrival, with average
inter-arrival times of 8 s and 7 s, respectively. Each experiment was conducted over
a 2-hour period. The results, shown in Table 5.2, suggest that the ACTT achieved
by the DLB policy is approximately half that achieved by either the SED or the
NQ policy.
Table 5.2: Experimental ACTT results under the proposed DLB, SED, and NQ policies

Experiment   Average completion time per task (s)
number       Proposed DLB      SED      NQ
i            7.61              15.19    15.55
ii           6.09              13.68    13.77
iii          4.71              6.92     7.44
5.3 Sub-optimal LB policy for an n-node system
The optimal LB gains for a multi-node system can be obtained by explicitly solving
(5.1) (for a system without node failures) or by explicitly solving equations that
are structurally similar to those given by Theorems 3 and 4 (for a system with
permanent or recoverable node failures). However, the complexity of any such
equation grows exponentially with the number of nodes, and the problem soon
becomes intractable. In particular, when the delays imposed by the channel differ
across the paths between nodes, the LB gains K_ij can no longer be parameterized
by a single value K. In such cases, it is not computationally efficient to perform the
online optimization required by the DLB policy. Therefore, in this section we take a
sub-optimal yet effective approach that computes the LB gains efficiently by invoking
a two-node solution every time a load is exchanged between a pair of nodes.
Suppose that all the nodes are initially informed of the states of the other nodes
and that the optimal one-shot LB instant is t_b = 0. Then, according to (3.5) given
in Chapter 3, the adjusted load to be transferred from the jth node to the ith node is
L_ij(0) = ⌊K_ij p_ij L_j^ex(0)⌋. We now present our approach to calculating the sub-optimal
LB gains of the jth node.
At first, we arbitrarily choose a recipient node, say the ith node, that belongs to
U_j. For the remaining recipient nodes, we assume that the jth node sends the full
partitions of its excess load; that is, we set K_kj = 1 for all k ∈ U_j \ {i}, while the
objective is to calculate K_ij. We can now think of a two-node system comprising
the ith and the jth nodes, where upon the execution of LB at t_b = 0, the ith and the
jth nodes have loads Q_i(0) and Q_j(0) − Σ_{k∈U_j} ⌊K_kj p_kj L_j^ex(0)⌋, respectively, while
⌊K_ij p_ij L_j^ex(0)⌋ tasks are in transit from the jth node to the ith node. The
theorems of Chapter 3 are then invoked to compute the sub-optimal K_ij for this
two-node system.
In the next step, we arbitrarily choose another recipient node, say the lth node,
from U_j \ {i} and assume that K_kj = 1 for all k ∈ U_j \ {i, l}. We also
utilize the sub-optimal K_ij obtained in the previous step. We then calculate the
sub-optimal K_lj by solving a two-node system in which, upon the execution of LB at
t_b = 0, the lth and the jth nodes have loads Q_l(0) and Q_j(0) − Σ_{k∈U_j} ⌊K_kj p_kj L_j^ex(0)⌋,
respectively, while ⌊K_lj p_lj L_j^ex(0)⌋ tasks are in transit from the jth node to the lth
node. These steps are repeated until the sub-optimal K_ij have been calculated for all i ∈ U_j.
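The sequential procedure can be sketched as follows (a Python sketch under our own naming; `solve_two_node` stands in for the two-node optimization of Chapter 3, which we do not reproduce here):

```python
def suboptimal_gains(j, recipients, Q, p, L_ex, solve_two_node):
    """Sequentially compute a sub-optimal gain K_ij for each recipient
    i of sender j.  Gains not yet computed are held at 1 (full
    partitions); each step solves a two-node problem between the
    current recipient and the sender."""
    K = {i: 1.0 for i in recipients}          # K_kj = 1 initially
    for i in recipients:
        # sender's residual load after all currently planned transfers
        sent = sum(int(K[k] * p[k] * L_ex) for k in recipients)
        in_transit = int(K[i] * p[i] * L_ex)  # batch headed to node i
        K[i] = solve_two_node(Q[i], Q[j] - sent, in_transit)
    return K
```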
Example: We now consider a 5-node system for which λ_{d1}, λ_{d2}, λ_{d3}, λ_{d4}, and
λ_{d5} (in units of tasks per second) are 0.25, 0.5, 0.75, 1 and 1.25, respectively, while
λ_{f1}^{-1}, λ_{f2}^{-1}, λ_{f3}^{-1}, λ_{f4}^{-1} and λ_{f5}^{-1} are 250 s, 200 s, 150 s, 100 s and 10 s, respectively. The
load-information arrival times and the failure-notice arrival times between nodes are
all equal to 0.12 s, while the slope of the mean transfer delay per task is θ = 0.4 s per
task. The probability of success corresponding to different initial loads is listed in
Table 5.3. When the initial load is (m_1, m_2, m_3, m_4, m_5) = (125, 100, 75, 50, 25), only
the 4th and the 5th nodes are recipient nodes (belonging to U). The sub-optimal
LB gains were computed to be K_41 = 0.95, K_42 = 0.9, K_51 = 0.73 and K_52 = 0.96,
which together give a sub-optimal success probability of 0.39. In Table 5.3, we also
show the probability of success obtained under (i) Full-LB policy, where Kij = 1 for
all i ∈ Uj, and (ii) Null-LB policy, where Kij = 0 for all i ∈ Uj. Clearly, for all three
initial load distributions, the proposed suboptimal LB policy outperforms both the
Full-LB and the Null-LB policies.
Table 5.3: Probability of success achieved under different policies

Initial load (m1, m2, m3, m4, m5)   Sub-optimal LB   Full-LB   Null-LB
(125, 100, 75, 50, 25)              0.39             0.33      0.2
(500, 0, 0, 0, 0)                   0.15             0.1       0.015
(100, 100, 100, 100, 100)           0.25             0.22      0.16
5.4 Conclusion
The optimal one-shot load-balancing approach has been adapted to develop a
distributed and dynamic load-balancing policy in which, at every external load arrival,
the receiver node executes load balancing autonomously. Further, the optimal gains are
calculated on the fly, based on system parameters that are adaptively updated.
Thus, the dynamic load-balancing policy can adapt to changing traffic conditions
in the channel as well as to changes in the task-processing rates induced by the type
of application. We have shown experimentally that the proposed dynamic load-
balancing policy minimizes the average completion time per task while improving
the system processing rate. The interplay between the queuing delays and the transfer
delays, as well as their effects on the average completion time per task and the system
processing rate, was investigated. In particular, the average completion time per
task achieved under the proposed dynamic load-balancing policy is significantly less
than that achieved by the commonly used SED and NQ policies. This is attributable
to the fact that the dynamic load-balancing policy is more successful than the SED
and NQ policies at reducing the likelihood of nodes being idle
while there are tasks in the system, comprising tasks in the queues as well as those
in transit. Finally, the two-node model has been utilized to formulate a sub-optimal
yet effective and computationally efficient LB policy for a multi-node system.
Chapter 6
Application to Wireless Sensor
Networks
A multidimensional queuing framework for distributed systems has been used to
analyze the performance of distributed sensor networks, routing in wireless networks,
telecommunications, and other resource-allocation problems in computer science and
operations research [46–49]. Therefore, the theoretical approach given in Chapter 3
can be useful in solving complex queuing problems that arise in such dynamical
systems. For illustration, we apply the theory to develop an optimal LB policy for
energy-limited distributed sensor networks.
6.1 Description of wireless sensor networks
Wireless sensor networks typically consist of small battery-powered processors
deployed over a region. These sensors communicate with each other over radio links.
In some situations, a few sensors might be overloaded by collecting data at a high
rate while others remain idle. This could lead to a situation where the network
loses some sensing coverage as the overloaded sensors are rapidly depleted of
battery power. In addition, due to the computational limitations of a sensor, it may not
be possible for the sensor to process its data in a timely fashion. Thus, by allowing
the sensors to process the raw data cooperatively, we may not only extend the
lifetime of some batteries but also enhance the computing efficiency of the sensor
network. However, transferring data between sensors in turn requires energy, and
these transfers are also accompanied by random delays. There is therefore a
fundamental tradeoff between the savings in queuing time per task that result from
utilizing the processing power of the distributed sensors cooperatively, and the
combined delay and energy overhead resulting from the very collaborative nature of the sensors [20].
6.2 Queuing equation
Consider the queuing model of Chapter 3 and suppose that LB is performed at
time t = 0. Let T_{r_1,...,r_n}(C) denote the total service time for all tasks in the system if
the network state at t = 0 (just after LB is performed) is as specified by C, while r_k
tasks (k = 1, ..., n) remain at the kth sensor at t = 0. Now, based on Assumptions
A1 and A2 and Convention C1 given in Chapter 3, we can prove that for
n ∈ ℕ, r_k ∈ ℤ_+, g_k ≥ 0 and c_ki > 0, the expected value of T_{r_1,...,r_n}(C) satisfies the
following relation:
$$
\begin{aligned}
E\Bigl[T_{r_1,\ldots,r_n}&\bigl([g_1\; c_{11}\,\ldots\, c_{1g_1}],\ \ldots,\ [g_n\; c_{n1}\,\ldots\, c_{ng_n}]\bigr)\Bigr] \\
&= \frac{1}{\lambda} + \sum_{k=1}^{n}\frac{\lambda_{d_k}}{\lambda}\, E\Bigl[T_{r_1-\delta_{1,k},\ldots,r_n-\delta_{n,k}}\bigl([g_1\; c_{11}\,\ldots\, c_{1g_1}],\ \ldots,\ [g_n\; c_{n1}\,\ldots\, c_{ng_n}]\bigr)\Bigr] \\
&\quad + \sum_{k=1}^{n}\sum_{i=1}^{g_k}\frac{\tilde\lambda_{ki}}{\lambda}\, E\Bigl[T_{r_1+\delta_{1,k}c_{ki},\ldots,r_n+\delta_{n,k}c_{ki}}\bigl([g_1\; c_{11}\,\ldots\, c_{1g_1}],\ \ldots,\ [g_k-1\; c_{k1}\,\ldots\, c_{k(i-1)}\; c_{k(i+1)}\,\ldots\, c_{kg_k}],\ \ldots,\ [g_n\; c_{n1}\,\ldots\, c_{ng_n}]\bigr)\Bigr],
\end{aligned}
\qquad (6.1)
$$
where $\lambda = \sum_{k=1}^{n}\bigl(\lambda_{d_k} + \sum_{i=1}^{g_k}\tilde{\lambda}_{ki}\bigr)$, δ_{j,k} is the Kronecker delta, and r_k − 1 is set to 0 when
r_k = 0. Note that E[T_{0,0,...,0}([0], [0], ..., [0])] = 0 since T_{0,0,...,0}([0], [0], ..., [0]) = 0 a.s.,
there being no tasks to process in the system. We omit the proof of (6.1), as it is
similar to the proofs in Chapter 3.
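Recursion (6.1) lends itself to a memoized numerical solution. The sketch below is our own simplified encoding, in which each in-transit batch is a triple of destination node, batch size, and arrival rate; the self-loop terms that arise when a queue is empty are folded into the left-hand side algebraically:

```python
from functools import lru_cache

def expected_service_time(r, batches, lam_d):
    """Memoized evaluation of recursion (6.1).
    r       : tuple of queue lengths r_k (0-based node indices)
    batches : tuple of (dest_node, size, rate) for in-transit batches
    lam_d   : tuple of service rates lambda_dk
    A service event at an empty queue leaves the state unchanged
    (r_k - 1 floored at 0), so those terms are solved for in closed
    form instead of recursing."""
    @lru_cache(maxsize=None)
    def E(r, batches):
        if all(x == 0 for x in r) and not batches:
            return 0.0                        # base case: empty system
        lam = sum(lam_d) + sum(b[2] for b in batches)
        total, self_p = 1.0 / lam, 0.0
        for k, rate in enumerate(lam_d):      # service completions
            if r[k] > 0:
                rp = r[:k] + (r[k] - 1,) + r[k + 1:]
                total += (rate / lam) * E(rp, batches)
            else:
                self_p += rate / lam          # empty queue: state unchanged
        for i, (k, c, rate) in enumerate(batches):   # batch arrivals
            rp = r[:k] + (r[k] + c,) + r[k + 1:]
            bp = tuple(sorted(batches[:i] + batches[i + 1:]))
            total += (rate / lam) * E(rp, bp)
        return total / (1.0 - self_p)
    return E(tuple(r), tuple(sorted(batches)))
```

For a single node holding two tasks at unit rate this gives 2 s, and for an empty two-node system awaiting one batch of two tasks (batch arrival rate 0.5, destination rate 1) it gives 2 + 2 = 4 s, matching hand calculation.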
6.3 LB policies for two cooperating sensors
Consider a two-node sensor network. At time t = 0, the first sensor (overloaded)
transfers L_21 tasks to the second sensor (idle) using the following policy:

$$ L_{21} = \lfloor K m_1 \rfloor, \qquad (6.2) $$

where K ∈ [0, 1] is the gain of the LB policy and ⌊x⌋ is the greatest integer less
than or equal to x. Suppose that for any particular value of K, the average energy
consumed for processing, transferring and receiving tasks by the jth sensor is ε_j. The
minimum-service-time (MST) policy is the LB policy that minimizes the expected
value of the total service time; more precisely, the MST policy is defined by the
optimal K that minimizes E[T_{r_1,r_2}([0], [1 L_21])] for r_1 = m_1 − L_21 and r_2 = m_2. The
fair-energy (FE) policy is the LB policy that ensures equal energy consumption at
each sensor; more precisely, the FE policy is defined by the fair K that achieves
ε_fair ≜ ε_1 = ε_2.
Example 1: Suppose that at time t = 0 the first node has sensed data equivalent
to 100 tasks and the second node is idle, i.e., m_1 = 100 and m_2 = 0. (A task is defined
as the amount of data required by a preprocessing algorithm in order to compute
one value of a desired quantity; processing one task is one execution of the
preprocessing algorithm.) Suppose that the service rates (in tasks per second) are
λ_{d1} = 1 and λ_{d2} = 0.5. Further, let the channel delay be θ = 0.2 s per task.
Moreover, by adopting the radio-energy model for an actual sensor [50], we set the
energy dissipation rate for each sensor to 1 mJ per task for transmission and 0.5
mJ per task for reception. Finally, the energy dissipation rates for processing tasks
at the first and second sensors are 4 mJ per task and 2 mJ per task, respectively.
In Fig. 6.1 (left), we see that the expected value of the total service time,
calculated according to (6.1), attains its minimum value of 74 s at K = 0.28. If we
assumed an infinite transfer rate, the fair number of tasks to transfer to the second
node would depend only on the processing rates of the sensors and would be given by
$\frac{\lambda_{d_2}}{\lambda_{d_1}+\lambda_{d_2}} \times 100$, which is approximately 33 tasks. Instead, with a transfer rate
of 5 tasks per second, the MST policy (corresponding to K = 0.28) transfers only 28
tasks to the second node in order to avoid excess delay in the channel. Note also that
for any particular value of K, and according to our assumptions on energy
consumption, ε_1 = L_21 + 4 r_1 and ε_2 = 0.5 L_21 + 2 (r_2 + L_21). Next, we
observe from Fig. 6.1 (right) that at K = 0.28 the energy consumption per sensor
becomes unfair, as the first sensor consumes about 4.5 times the energy consumed
by the second sensor. The FE policy, corresponding to K = 0.73, results in
ε_fair = 182 mJ of energy consumption at each sensor; at K = 0.73, however, the
expected value of the total service time is about 161 s.
In order to jointly investigate the effect of K on the interplay between the total
service time and the energy consumption of each sensor, we define the following
normalized quantities, which respectively measure the policy's deviation from the
points of minimum service time and fair energy consumption:

$$ \sigma_{T_{21}} = \frac{E[T_{r_1,r_2}(C)] - T_{\min}}{T_{\max}}, \qquad \sigma_{\varepsilon_{21}} = \frac{\sqrt{\sum_{l=1,2}(\varepsilon_l - \varepsilon_{\mathrm{fair}})^2}}{2\,\varepsilon_{\max}}, $$

where $T_{\min} \triangleq \inf\bigl\{E[T_{r_1,r_2}(C)],\ K \in [0,1]\bigr\}$, $T_{\max} \triangleq \sup\bigl\{E[T_{r_1,r_2}(C)],\ K \in [0,1]\bigr\}$,
and $\varepsilon_{\max} = \max_K\bigl(\max(\varepsilon_1, \varepsilon_2)\bigr)$. In order to achieve a fair tradeoff between the
deviation in service time and the deviation in energy consumption per sensor, we introduce
a scheduling policy called the fair-tradeoff (FT) policy, given by the K yielding
σ_{T_21} = σ_{ε_21}. In the case of Example 1, the FT policy is given by K = 0.5, for
which ε_1 = 250 mJ, ε_2 = 125 mJ and the expected value of the total service time is
approximately 110 s.
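Given sampled curves of the expected total service time and the per-sensor energies over a grid of gains (as in Fig. 6.1), the three operating points can be located by a simple grid search. The following Python sketch uses our own names and a grid search, whereas the dissertation obtains these points from the analytical curves:

```python
import math

def pick_policies(K, ET, e1, e2):
    """Grid-search sketch of the MST, FE and FT gains from sampled
    curves E[T](K) and per-sensor energies eps1(K), eps2(K)."""
    n = len(K)
    k_mst = K[min(range(n), key=lambda i: ET[i])]       # min service time
    i_fe = min(range(n), key=lambda i: abs(e1[i] - e2[i]))  # eps1 = eps2
    e_fair = 0.5 * (e1[i_fe] + e2[i_fe])
    T_min, T_max = min(ET), max(ET)
    e_max = max(max(e1), max(e2))
    def dev(i):
        # |sigma_T - sigma_eps| at grid point i
        s_T = (ET[i] - T_min) / T_max
        s_e = math.hypot(e1[i] - e_fair, e2[i] - e_fair) / (2 * e_max)
        return abs(s_T - s_e)
    k_ft = K[min(range(n), key=dev)]                    # fair tradeoff
    return k_mst, K[i_fe], k_ft
```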
[Figure 6.1 here: the left panel plots E[T_{r1,r2}([0],[1 L21])] (s) versus K; the right panel plots the average energy consumption (mJ) versus K, with curves for ε1, ε2, and ε1 + ε2.]
Figure 6.1: Expected value of the total service time (left) and the battery-energy
consumed by the sensors (right) under different LB gains.
6.3.1 Extension to n cooperating sensors
In this section, the sub-optimal one-shot LB policy detailed in Section 5.3 is utilized
to calculate the LB gain K_ij that dictates the load transfer from the jth node to the
ith node. At each step, the sub-optimal LB gain can be selected based on the MST,
the FE or the FT policy, as required.
Example 2: Consider a five-node sensor network for which λ_{d1}, λ_{d2}, λ_{d3}, λ_{d4},
and λ_{d5} (in units of tasks per second) are 0.25, 0.5, 0.75, 1 and 1.25, respectively,
while the energy dissipation rates for processing (in units of mJ per task) at the
first to fifth sensors are 0.5, 1, 1.5, 2 and 2.5, respectively. The load-transfer rate as
well as the energy dissipation rates for transmission and reception are set as given
in Example 1. The initial task distribution is m_1 = 500, m_2 = ... = m_5 = 0. The
average service time (AST) and the average energy consumption (AEC) per sensor
under the MST, FE and FT policies are listed in Table 6.1. The MST policy yields a
small AST compared with the FE and FT policies, but the aggregate AEC of the five
sensors is much larger. Similarly, the AST under the FE policy is much larger than
under the other two. In summary, the FT policy offers a well-balanced performance
in terms of reducing the AST and the AEC together.
Table 6.1: The AST and the AEC under the MST, FE and FT policies.

            MST policy      FE policy       FT policy
AST (s)     188.5           1091.6          616.0
AEC (mJ)    ε1 = 481        ε1 = 363        ε1 = 423
            ε2 = 99         ε2 = 75         ε2 = 86
            ε3 = 200        ε3 = 100        ε3 = 154
            ε4 = 330        ε4 = 145        ε4 = 243
            ε5 = 459        ε5 = 207        ε5 = 345
6.4 Conclusions
We have applied the theory to design novel LB policies for a distributed sensor-network
application. The interplay between the expected value of the total service
time and the energy consumption of each sensor has been highlighted. To this end, we
have considered three scheduling policies: (1) the minimum-service-time policy,
which minimizes the expected value of the total service time; (2) the fair-energy
policy, which ensures a fair consumption of energy by each sensor; and (3) the
fair-tradeoff policy, which jointly considers the deviation in service time and the
deviation in energy consumption per sensor. Our preliminary results for two-node
and five-node sensor networks indicate that the fair-tradeoff policy achieves a
well-balanced performance, as compared with the minimum-service-time and fair-energy
policies, in terms of reducing the average service time and the average energy
consumption.
Chapter 7
Future work
The regeneration-theory-based multidimensional queuing model presented in this
dissertation can be adapted to the design of resilient communication networks and
wireless sensor networks. In this chapter, we present an overview of future research
in these areas.
7.1 Resilient distributed networks
Random occurrences of external attacks on the communication links and/or nodes of
distributed networks have to be detected dynamically. Because the load completion
times depend strongly on the network's functionality and connectedness, measuring
them enables us to make dynamic decisions on whether or not the network is under
external attack at any given time. The idea is to periodically submit loads to the
network and form a sequence of random variables corresponding to the load
completion times for each submission. Such a sequence implicitly bears information
about the network's state. A composite binary hypothesis-testing problem for the
"network condition" can then be formulated as follows: under the null hypothesis,
the node/link failure and recovery rates belong to the "non-attack" mode and
the nodes and links are all functional; under the alternative hypothesis, the
aforementioned rates belong to the "attack" mode and there is at least one failed
node or link in the network. The data (or test statistic) for this decision problem is
the sequence of load completion times. As loads are submitted periodically by the
network evaluator and their completion times observed, the network condition is
announced through the Neyman-Pearson decision rule, whose performance (detection
probability and false-alarm rate) can be fully characterized from the pdf of the load
completion times, which can be calculated as described in Chapter 3.
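A sketch of the envisioned detector follows; the densities `pdf0` and `pdf1` stand in for the completion-time pdfs obtained via Chapter 3, and the exponential forms in the test case are purely illustrative:

```python
import math

def attack_decision(completion_times, pdf0, pdf1, threshold):
    """Neyman-Pearson style test: declare 'attack' when the
    log-likelihood ratio of the observed load completion times
    exceeds a threshold chosen for a target false-alarm rate."""
    llr = sum(math.log(pdf1(t) / pdf0(t)) for t in completion_times)
    return llr > threshold
```

Under an attack (failed nodes or links), completion times tend to lengthen, so a density with heavier mass at large times under the alternative hypothesis drives the log-likelihood ratio upward.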
7.2 Wireless sensor networks
A wireless sensor network consists of a large number of sensor nodes distributed over
a sensing region. The network is partitioned into clusters, and all the sensors
belonging to a given cluster transmit their sensed data to a common sink (or fusion)
node. The design of a cooperative, energy-efficient data-transfer strategy between
sensors within a cluster is one of the challenging problems in this area. For example,
once a sensor collects an amount of data, it can transmit those data to the corresponding
sink node either by establishing direct communication with the sink node or by
means of a multi-hop cooperative strategy using other sensor nodes as relays. Next,
we outline an energy-efficient cooperative data-transfer strategy in which each sensor
must use, more or less, the same amount of energy to uplink its data to the sink
layer.
Each sensor can be assigned an integer quantity, called the maximum-packets-number
(MPN), which represents the maximum number of data packets that can be
transmitted from the sensor to the sink node before the sensor runs out of energy.
Clearly, the MPN depends on time, and one way to calculate it is to normalize the
energy reserve of a sensor at any given time by the energy required to transmit one
packet of data from the sensor to its corresponding sink node. We can then think of
the MPN as analogous to the number of tasks awaiting service at a node in a
distributed computing system. Therefore, the theory of Chapter 3 can be utilized to
calculate corresponding MPN-partitions for each sensor; for example, the current
MPN of the ith sensor is compared with the current average MPN of the cluster to
calculate the MPN-partition of the ith sensor. Next, when the jth sensor collects a
certain amount of data, it transmits the ith MPN-partition of its total data packets
to the sink node using the ith node as a relay. Finally, the MPNs can be thought of
as multidimensional queues and analyzed within a regeneration-theory framework.
The idea is to compute quantities analogous to the LB gains that will readjust the
initial MPN-partitions in order to obtain an optimal data-transfer strategy, resulting
in uniform energy usage across all sensors.
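The MPN bookkeeping described above can be sketched as follows (a hypothetical illustration; the partition rule simply shares out each sensor's excess over the cluster-average MPN, by analogy with the excess-load partitions of Chapter 3):

```python
def mpn(energy_reserve_mJ, energy_per_packet_mJ):
    """Maximum-packets-number: packets the sensor can still transmit
    to its sink node before its battery is exhausted."""
    return int(energy_reserve_mJ // energy_per_packet_mJ)

def mpn_partitions(mpns):
    """Share of relay traffic assigned to each sensor, proportional
    to its MPN excess over the current cluster average."""
    avg = sum(mpns) / len(mpns)
    excess = [max(m - avg, 0.0) for m in mpns]
    total = sum(excess)
    return [e / total if total else 0.0 for e in excess]
```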
Appendices
A Optimality of partitions in the ideal case
B Proof of Equation (3.24)
C Proof of conditional independence of the random delays W′_21 and Y′_1
D Special property of minimum of exponential random variables
Appendix A
Optimality of partitions in the
ideal case
By the ideal case we mean that there are no delays, the queues are deterministic, and
the tasks are arbitrarily divisible. This effectively means that each node in the system
knows the exact queue sizes of the other nodes. Consequently, it follows that m_{i(j)}(t) = Q_i(t),
I_j = I and p_ij ≡ p_i, independently of j. Assume further that the LB actions are executed
simultaneously at time t at all the nodes that do not belong to I. Let Q_i^final(t) be the total
load at node i ∈ I after the execution of LB. Then,
load at node i ∈ I after the execution of LB. Then,
Qfinali (t) = Qi(t) + pi
∑j∈Ic
Lexj (t) = Qi(t) +
Lexi (t)∑
j∈I Lexj (t)
∑j∈Ic
Lexj (t). (A.1)
Since∑n
j=1 Lexj (t) = 0, we have
∑j∈I Lex
j (t) = −∑j∈Ic Lex
j (t). Therefore,
Qfinali (t) = Qi(t)− Lex
i (t) = λdi
∑nl=1 Ql(t)∑n
l=1 λdl
. (A.2)
Clearly, the overall completion time isPn
l=1 Ql(t)Pnl=1 λdl
for all the nodes.
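The ideal-case result can be checked numerically; in the sketch below (our own helper name), every node's final load divided by its service rate equals the common completion time:

```python
def ideal_final_loads(Q, lam_d):
    """Final loads after delay-free LB with arbitrarily divisible
    tasks, Eq. (A.2): each node's share of the total load is
    proportional to its service rate, so all nodes finish together."""
    total = sum(Q)
    lam_sum = sum(lam_d)
    return [ld * total / lam_sum for ld in lam_d]
```

For Q = (90, 30) and rates (2, 1), the final loads are (80, 40) and both nodes finish at time 120/3 = 40.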
Appendix B
Proof of Equation (3.24)
Let us look at the conditional distribution of W′_21:

$$ P\{W'_{21} \le t \mid \tau = s,\ \tau = W_{11}\} = P\{W_{21} - \tau \le t \mid \tau = s,\ \tau = W_{11}\} = P\{W_{21} \le t + s \mid \tau = s,\ \tau = W_{11}\}. \qquad (B.1) $$

Note that $\{\tau = s,\ \tau = W_{11}\} = \{W_{11} = s,\ W_{21} > s,\ Y_1 > s,\ Y_2 > s,\ X_{12} > s,\ X_{21} > s\}$. Therefore, (B.1) becomes

$$ P\{W'_{21} \le t \mid \tau = s,\ \tau = W_{11}\} = P\{W_{21} \le t + s \mid W_{11} = s,\ W_{21} > s,\ Y_1 > s,\ Y_2 > s,\ X_{12} > s,\ X_{21} > s\}. $$

Exploiting mutual independence as per Assumption A2, we obtain

$$ P\{W'_{21} \le t \mid \tau = s,\ \tau = W_{11}\} = P\{W_{21} \le t + s \mid W_{21} > s\} = \bigl(1 - e^{-\lambda_{d_2} t}\bigr)u(t). \quad \Box $$
Appendix C
Proof of conditional independence
of the random delays W′_21 and Y′_1
$$
\begin{aligned}
P\{W'_{21} \le t_1,\ Y'_1 \le t_2 \mid \tau = s,\ \tau = W_{11}\}
&= P\{W_{21} - \tau \le t_1,\ Y_1 - \tau \le t_2 \mid \tau = s,\ \tau = W_{11}\} \\
&= P\{W_{21} \le t_1 + s,\ Y_1 \le t_2 + s \mid \tau = s,\ \tau = W_{11}\} \\
&= P\{W_{21} \le t_1 + s,\ Y_1 \le t_2 + s \mid W_{11} = s,\ W_{21} > s,\ Y_1 > s,\ Y_2 > s,\ X_{12} > s,\ X_{21} > s\} \\
&= P\{W_{21} \le t_1 + s,\ Y_1 \le t_2 + s \mid W_{21} > s,\ Y_1 > s\} \\
&= \frac{P\{W_{21} \le t_1 + s,\ Y_1 \le t_2 + s,\ W_{21} > s,\ Y_1 > s\}}{P\{W_{21} > s,\ Y_1 > s\}} \\
&= \frac{P\{s < W_{21} \le t_1 + s\}}{P\{W_{21} > s\}} \cdot \frac{P\{s < Y_1 \le t_2 + s\}}{P\{Y_1 > s\}},
\end{aligned}
$$
where the last equality follows from the mutual independence of W_21 and Y_1
according to Assumption A2. Therefore, we get

$$ P\{W'_{21} \le t_1,\ Y'_1 \le t_2 \mid \tau = s,\ \tau = W_{11}\} = P\{W'_{21} \le t_1 \mid W_{21} > s\} \cdot P\{Y'_1 \le t_2 \mid Y_1 > s\}, $$

which concludes the proof after noting that $P\{W'_{21} \le t_1 \mid \tau = s,\ \tau = W_{11}\} = P\{W'_{21} \le t_1 \mid W_{21} > s\}$ and $P\{Y'_1 \le t_2 \mid \tau = s,\ \tau = W_{11}\} = P\{Y'_1 \le t_2 \mid Y_1 > s\}$. □
Appendix D
Special property of minimum of
exponential random variables
Property: $P\{\tau = W_{11} \mid \tau = s\} = \frac{\lambda_{d_1}}{\lambda}$.

Proof: As per Convention C1, $X^F_{kj} = \infty$ when F = (11, 11), and $Z_{ki} = \infty$ a.s.
when C = ([0], [0]). Therefore, $\tau = \min(W_{11}, W_{21}, Y_1, Y_2, X_{12}, X_{21})$. Next, for any
s ≥ 0, we can write

$$ P\{\tau = W_{11} \mid \tau \le s\} = \frac{P\{\tau = W_{11},\ \tau \le s\}}{P\{\tau \le s\}}. \qquad (D.1) $$

Let $\beta := \min(W_{21}, Y_1, Y_2, X_{12}, X_{21})$. In accordance with A1 and A2, it is straightforward
to show that β is an exponential random variable with rate $\theta_t = \lambda_{d_2} + \lambda_{f_1} + \lambda_{f_2} + \lambda_{12} + \lambda_{21}$,
and that $W_{11}$ and β are independent. Observe that $\{\tau = W_{11}\} \cap \{\tau \le s\} = \{W_{11} \le \beta\} \cap \{W_{11} \le s\} = \{W_{11} \le \min(\beta, s)\}$. Then
$$
\begin{aligned}
P\{W_{11} \le \min(\beta, s)\} &= \int_0^\infty P\{W_{11} \le \min(\beta, s) \mid \beta = b\}\, f_\beta(b)\, db \\
&= \int_0^\infty P\{W_{11} \le \min(b, s) \mid \beta = b\}\, f_\beta(b)\, db \\
&= \int_0^s P\{W_{11} \le b \mid \beta = b\}\, f_\beta(b)\, db + \int_s^\infty P\{W_{11} \le s \mid \beta = b\}\, f_\beta(b)\, db. \qquad (D.2)
\end{aligned}
$$

By the independence of $W_{11}$ and β, $P\{W_{11} \le b \mid \beta = b\} = P\{W_{11} \le b\}$ and $P\{W_{11} \le s \mid \beta = b\} = P\{W_{11} \le s\}$. Therefore, (D.2) becomes

$$ P\{W_{11} \le \min(\beta, s)\} = \int_0^s \bigl(1 - e^{-\lambda_{d_1} b}\bigr)\theta_t e^{-\theta_t b}\, db + \int_s^\infty \bigl(1 - e^{-\lambda_{d_1} s}\bigr)\theta_t e^{-\theta_t b}\, db = \frac{\lambda_{d_1}}{\lambda_{d_1} + \theta_t}\Bigl[1 - e^{-(\lambda_{d_1} + \theta_t)s}\Bigr]. \qquad (D.3) $$
With $P\{\tau \le s\} = 1 - e^{-(\lambda_{d_1} + \theta_t)s}$, and using (D.1) and (D.3), we get

$$ P\{\tau = W_{11} \mid \tau \le s\} = \frac{\lambda_{d_1}}{\lambda}, $$

where $\lambda = \lambda_{d_1} + \lambda_{d_2} + \lambda_{f_1} + \lambda_{f_2} + \lambda_{12} + \lambda_{21}$. But it can also be shown that
$P\{\tau = W_{11}\} = \frac{\lambda_{d_1}}{\lambda}$. Therefore, the probability that the minimum is attained by
$W_{11}$ is independent of the value of τ, which leads to the conclusion that
$P\{\tau = W_{11} \mid \tau = s\} = \frac{\lambda_{d_1}}{\lambda}$. □
References
[1] H. M. Lee, S. H. Chin, J. H. Lee, D. W. Lee, K. S. Chung, S. Y. Jung and H. C.
Yu, “A resource manager for optimal resource selection and fault tolerance service in
grids”, Proc. 4th IEEE International Symposium on Cluster Computing and the Grid,
Chicago, Illinois, USA 2004.
[2] A. Brandt and M. Brandt, “On a two-queue priority system with impatience and
its application to a call center”, Methodology and Computing in Applied Probability,
1:191-210, 1999.
[3] www.planetlab.org
[4] Z. Lan, V. E. Taylor, and G. Bryan, “Dynamic load balancing for adaptive mesh
refinement application”, Proc. ICPP’2001, Valencia, Spain, 2001.
[5] T. L. Casavant and J. G. Kuhl, “A taxonomy of scheduling in general-purpose distributed
computing systems”, IEEE Trans. Software Eng., vol. 14, pp. 141–154, Feb. 1988.
[6] G. Cybenko, “Dynamic load balancing for distributed memory multiprocessors”,
J. Parallel and Distributed Computing, vol. 7, pp. 279–301, Oct. 1989.
[7] Chi-Chung Hui and Samuel T. Chanson, “Hydrodynamic load balancing”, IEEE
Trans. Parallel and Distributed Systems, vol. 10, No. 11, pp. 1118–1137, Nov. 1999.
[8] B.W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs”,
The Bell System Technical Journal, Vol. 49, pp. 291–307, Feb 1970.
[9] S. Dhakal, “Load balancing in delay-limited distributed systems”, Masters of Science
Thesis, Electrical and Computer Engineering Department, University of New Mexico,
Dec 2003.
[10] M. M. Hayat, S. Dhakal, C. T. Abdallah, J. Chiasson, and J. D. Birdwell, “Dynamic
time delay models for load balancing. Part II: Stochastic analysis of the effect of delay
uncertainty”, Advances in Time Delay Systems, Springer Series on Lecture Notes
in Computational Science and Engineering, (Keqin Gu and Silviu-Iulian Niculescu,
Editors), vol. 38, pp. 371–385, Springer: Berlin, 2004.
[11] S. Dhakal, B. S. Paskaleva, M. M. Hayat, E. Schamiloglu, and C. T. Abdallah,
“Dynamical discrete-time load balancing in distributed systems in the presence of
time delays”, Proc. 42nd IEEE Conference on Decision and Control, Maui, Hawaii,
pp. 5128–5134, Dec. 2003.
[12] J. Ghanem, C. T. Abdallah, M. M. Hayat, S. Dhakal, J. D. Birdwell, J. Chiasson, and
Z. Tang, “Implementation of the load balancing algorithms over a local area network
and the Internet”, Proc. 43rd IEEE Conference on Decision and Control, Bahamas, 2004.
[13] http://setiathome.ss.berkeley.edu/
[14] M. Litzkow, M. Livny and M. Mutka, “Condor - A hunter of idle Workstations”, Proc.
8th International Conference of Distributed Computing Systems, pp. 104–111, June
1988.
[15] E. Gelenbe, D. Finkel, and S. K. Tripathi, “On the availability of a distributed com-
puter system with failing components”, ACM SIGMETRICS Performance Evaluation
Review, vol. 13, Issue 2, pp. 6–13, 1985.
[16] R. Sheahan, L. Lipsky, and P. Fiorini, “The Effect of Different Failure Recovery
Procedures on the Distribution of Task Completion Times”, Proc. IEEE DPDNS05,
Denver CO, April 2005.
[17] S. Dhakal, M.M. Hayat, and J.E. Pezoa, “Reliability in distributed queuing systems
in the presence of random delays”, IEEE Trans. Inf. Theory, under review, 2006.
[18] S. Dhakal, M. M. Hayat, J. E. Pezoa, C. Yang, and D. A. Bader, “Dynamic load
balancing in distributed systems in the presence of delays: A regeneration-theory
approach”, IEEE Trans. Parallel and Distributed Systems, to appear, 2006.
[19] S. Dhakal, M. M. Hayat, J. Ghanem, C. T. Abdallah, H. Jerez, J. Chiasson, and J.
D. Birdwell, “On the optimization of load balancing in distributed networks in the
presence of delay”, Advances in Communication Control Networks, Springer Series
Lecture Notes in Control and Information Sciences, (S. Tarbouriech, C. T. Abdallah,
and J. Chiasson, Editors), LNCIS vol. 308, pp. 223–244, Springer-Verlag, 2004.
[20] S. Dhakal, J.E. Pezoa, and M.M. Hayat, “A regeneration-based approach for resource
allocation in cooperative distributed systems”, Submitted IEEE 32nd International
Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007) , Hawaii,
USA.
[21] S. Dhakal, M. M. Hayat, J. E. Pezoa, C. T. Abdallah, J. D. Birdwell, and J. Chiasson,
“Load Balancing in the presence of random node failure and recovery”, Proc. IEEE
International Parallel and Distributed Processing Symposium (IPDPS ’06), Rhodes,
Greece, April 2006.
[22] S. Dhakal, M. M. Hayat, J. Ghanem, and C. T. Abdallah, “Load Balancing in
Distributed Computing Over Wireless LAN: Effects of Network Delay”, Proc. IEEE Wireless
Communication & Networking Conference (WCNC-2005), New Orleans, LA, vol. 3,
pp. 1755–1760, March 13–17, 2005.
[23] J. Ghanem, S. Dhakal, C. T. Abdallah, M. M. Hayat, and H. Jerez “Load balanc-
ing in distributed systems with large time delays: Theory and experiment”, Proc.
IEEE/CSS 12th Mediterranean Conference on Control and Automation (MED ’04),
Aydin, Turkey, June 2004.
[24] M. Trehel, C. Balayer, and A. Alloui, “Modeling load balancing inside groups using
queuing theory”, Proc. 10th International Conference on Parallel and Distributed
Computing Systems, New Orleans, Louisiana, Oct. 1–3, 1997.
http://lifc.univ-fcomte.fr/ trehel/PDCS97.ps
[25] A.Cortes, A. Ripoll, M.A. Senar and E. Luque, “Performance comparison of dynamic
load-balancing strategies for distributed computing”, Proc. IEEE 32nd Hawaii Con-
ference on System Sciences, vol.8, p. 8041, 1999.
[26] J. M. Bahi, C. Vivier, R. Couturier, “Dynamic load balancing and efficient load esti-
mators for asynchronous iterative algorithms”, IEEE Trans. Parallel and Distributed
Systems, Vol. 16, No. 4, Apr. 2005.
[27] D. L. Eager, E. D. Lazowska, and J. Zahorjan, “Adaptive load sharing in homogeneous
distributed systems”, IEEE Trans. Software Engineering, vol. 12, no. 5, pp. 662–675,
May 1986.
[28] J. Liu and V. A. Saletore, “Self-scheduling on distributed-memory machines”, Proc.
ACM Int’l Conf. on Supercomputing, pp. 814–823, Nov. 1993.
[29] K.M. Dragon and J.L. Gustafson, “A low-cost hypercube load balance algorithm”,
Proc. Fourth Conf. Hypercube Concurrent Computers and Applications, pp. 583–590,
1989.
[30] T.H. Tzen and L.M. Ni, “Dynamic loop scheduling for shared memory multiproces-
sors”, Int’l Conf. Parallel Processing, vol. 2, pp. 247–250, 1991.
[31] S. Zhou, “A trace-driven simulation study of dynamic load balancing”, IEEE Trans.
Software Eng., vol. 14, no. 9, pp. 1327–1341, Sept. 1988.
[32] H.C. Lin and C.S. Raghavendra, “A dynamic load-balancing policy with a central job
dispatcher (LBC)”, IEEE Trans. Software Eng., vol.18, no.2, pp. 148–158, Feb.1992.
[33] H.G. Rotithor and S.S. Pyo, “Decentralized decision making in adaptive task sharing”,
Proc. IEEE International Parallel and Distributed Processing Symposium, Dec. 1990.
[34] P. Krueger and M. Livny, “The Diverse Objectives of Distributed Scheduling Policies”,
Proc. Seventh Int’l Conf. Distributed Computing Systems, pp. 242–249, 1987.
[35] S. Shenker and A. Weinrib, “The optimal control of heterogeneous queuing systems: A
paradigm for load sharing and routing”, IEEE Trans. Computers, vol. 38, pp. 1724–
1735, Dec. 1989.
[36] K. Kabalan, W. Smari, and J. Hakimian, “Adaptive load sharing in heterogeneous systems:
policies, modifications, and simulation”, Int’l Journal of Simulation Systems Science
and Tech., vol. 3, no. 1–2, pp. 89–100, Jun. 2002.
[37] V. Subramani, R. Kettimuthu, S. Srinivasan, and P. Sadayappan, “Distributed Job
Scheduling on Computational Grids Using Multiple Simultaneous Requests”, Proc.
11th IEEE International Symposium on High Performance Distributed Computing
(HPDC-11), Edinburgh, Scotland, July 24–26, 2002, pp. 359–368.
[38] S. Choi, M. Baik, and C. S. Hwang, “Volunteer Availability based Fault Tolerant
Scheduling Mechanism in Desktop Grid Computing Environment”, Proc. 3rd IEEE
International Symposium on Network Computing and Applications, Boston, Massachusetts,
Aug. 30–Sept. 1, 2004, pp. 366–371.
[39] C. Knessl and C. Tier, “Two tandem queues with general renewal input I: Diffusion
approximation and integral representations”, SIAM J. Appl. Math., vol. 59, pp. 1917–
1959, 1999.
[40] F. Baccelli and P. Bremaud, Elements of Queueing Theory: Palm-Martingale Calculus
and Stochastic Recurrences. New York: Springer-Verlag, 1994.
[41] D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes.
Springer-Verlag, 1988.
[42] J. Ghanem, “Implementation of load balancing policies in distributed systems”, M.S.
thesis, Electrical and Computer Engineering Department, University of New Mexico,
June 2004.
[43] G. Petrie, G. Fann, E. Jurrus, B. Moon, K. Perrine, C. Dippold, and D. Jones, “A
distributed computing approach for remote sensing data”, Proc. 34th Symposium on
the Interface, pp. 477–489, 2002.
[44] V. Jacobson, “Congestion avoidance and control”, Proc. ACM SIGCOMM ’88, Stanford,
CA, Aug. 1988.
[45] D. L. Snyder and M. I. Miller, Random Point Processes in Time and Space. New York:
Springer-Verlag, 1991.
[46] L. Tassiulas and A. Ephremides, “Stability properties of constrained queuing systems
and scheduling policies for maximum throughput in multihop radio networks”, IEEE
Trans. Automatic Control, vol. 37, no. 12, pp. 1936–1948, Dec. 1992.
[47] M. J. Neely, E. Modiano, and C. E. Rohrs, “Dynamic power allocation and routing
for time varying wireless networks”, Proc. of IEEE INFOCOM, San Francisco, April
2003.
[48] D. Y. Burman and D. R. Smith, “A light traffic theorem for multi-server queues”,
Mathematics of Operations Research, vol. 8, pp. 15–25, 1983.
[49] G. Koole, P. Sparaggis, and D. Towsley, “Minimizing response times and queue lengths
in systems of parallel queues”, Journal of Applied Probability, vol. 36, pp. 1185–1193, 1999.
[50] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister, “System architecture
directions for networked sensors”, Proc. 9th Int’l Conf. on Architectural Support for
Programming Languages and Operating Systems (ASPLOS-IX), Cambridge, MA, 2000.