junwei cao and wen zhangcaoj/pub/doc/jcao_j_control.pdfsources. ligo data analysis streams terabytes...

Dynamic Controlling of Data Streaming Applications for Cloud Computing

Junwei Cao1,2* and Wen Zhang3 1Research Institute of Information Technology, Tsinghua University, Beijing 100084, China

2Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China 3Chongqing Military Delegate Bureau, General Armament Department of PLA, Chongqing 400060, China

*Corresponding email: [email protected]

Abstract Performance of data streaming applications is co-

determined by both networking and computing resources, which should be allocated in an integrated and cooperative way. Dynamic controlling of resource allocation is required since unilateral redundancy in networking or computing resources may result in underutilization but not necessarily high performance since insufficiency of either resource may become a bottleneck. In this paper, a virtualized cloud platform is utilized to implement data streaming: fuzzy logic controllers are designed to allocate required CPU resources; iterative bandwidth allocation is applied which is processing- and storage-aware to guarantee on-demand data provision. Experimental results show that our approach leads to high performance of applications as well as high utilization of resources, which is also justified by comparison with other resource allocation methods.

1. Introduction

Data streaming applications are special in that (1) they are continuous and long running in nature; (2) they require efficient transmission of data from/to distributed sources in an end-user-pulling way; (3) it is often not feasible to store all the data in entirety for subsequent processing because of limited storage and high volumes of data to be processed; (4) they need to make efficient use of computational resources to carry out processing in a timely manner. New computing paradigm, e.g. Cloud Computing [1], provides better support for data streaming since virtualized resources can be allocated in a more fine-grained and on-demand way. For example, LIGO (Laser Interferometer Gravitational-wave Observatory) [2], is aimed at direct detection of gravitational waves emitted from space sources. LIGO data analysis streams terabytes of data per day from observatories for real-time processing using scientific workflows [3]. This is a typical scenario in many scientific applications where data

processing is continuously executed over remote streams as if data were always available from local storage.

Great challenges have to be addressed to provide sufficient but not redundant resources, mainly computing resources and networking bandwidth, so that application requirements can be met, while maintaining high resource utilization. Virtualization technology has been applied for resource management [4]. Owe to the progress of virtualization technology, such as the open source solution Xen [5], it is possible to allocate fine-grained computational resources for applications. Virtual machines (VMs) are able to instantiate multiple independently configured guest environments on a host resource at the same time, to provide performance isolation. With the ability of dynamic configuration, VMs make it possible to allocate computing resources for applications on demand with finer granularity. In this work, VMs are setup for data streaming applications for resource isolation and performance guarantee.

As shown in our previous work [6], CPU and bandwidth must be allocated in a cooperative and integrated way. In such scenarios, unilateral redundancy of CPU or bandwidth will not necessarily lead to high data throughput; on the other hand, insufficiency of either will become a bottleneck of the ultimate throughput. It is required to allocate computing and networking resources to reach a balance between obtaining high data throughput while maintaining high resource utilization. Actually, data throughput and resource utilization conflict with each other in nature. Due to stochastic dynamics of computational systems and time-variant workloads, an automatic allocation mechanism that can react to a dynamic environment quickly is required. In this work, closed-loop feedback control is applied.

Feedback control has been applied to computational systems [7] and some promising results obtained. A closed-loop controller observes the error between the set-point (called reference in control theory terminology) and the output of the target system (called plant) to activate its control algorithm (such as

proportional, integral and derivate, PID in short) and produce the control variable of the plant, so as to adjust the output of the plant, until the system reaches to the targeting state, even in the presence of unexpected disturbance. It is indispensable to establish mathematical models in which control systems are described using one or more differential equations that define the system response to inputs in traditional feedback control. This is rather challenging in some cases, especially for data streaming applications due to variable coupling and heavily nonlinear property of the system. Fortunately, fuzzy control offers an alternative, which provides formal techniques to represent, manipulate and implement human experts’ heuristic knowledge for controlling a plant via IF-THEN rules. Fuzzy control does not rely on mathematical modeling of the plant but establishes a direct nonlinear mapping between their inputs and outputs, which significantly reduces the difficulty of control system design.

In this work, an integrated approach is implemented for fine-grained and on-demand resource allocation for data streaming applications in a virtualization based cloud platform: fuzzy logic control is applied for CPU allocation by configuring VMs dynamically according to resource utilization; an iterative algorithm is adopted for bandwidth allocation, which is processing-, congestion- and storage-aware. Actually, allocation of CPU and bandwidth is tightly coupled, and experimental results included in this work show good performance of our approach.

The rest of this paper is organized as follows: Section 2 formulates the problem in details, and the following two sections illustrate fuzzy allocation of CPU resources and iterative bandwidth allocation, respectively. Experimental results are illustrated in Section 5 to show performance evaluation of our approach using an example of gravitational wave data analysis. Section 6 discusses related work and Section 7 concludes the paper.

2. Problem Formulation

In this work, virtualization technology makes it possible to provide a predictable and controllable run-time environment for each application; fuzzy control is applied for on-demand allocation of resources for each VM. Both performance metrics, data throughput and resource utilization, are defined, co-determined by CPU and bandwidth allocation.

At any time t, for a data streaming application i, if the amount of data in local storage, denoted as Qi(t), is higher than a certain level (e.g., a block as explained later), the processing will be running otherwise it will

just be idle. Qi(t) is co-determined by both data provision and data processing since new data will be streamed to local storage while processed data will be cleaned up afterwards.

The amount of data in storage varies over time and can be described using the following differential equation:

( ) ( ) ( )tdttranspeedtQ iii −= (1) ( ) 00 =iQ

, where ( )tQi, transpeedi(t) and di(t) stand for the

derivative of Qi(t), assigned transferring bandwidth and processing speed for data stream i.

As demonstrated in [8], data sets to be processed are composed with lots of small files (LOSF). Data are processed block by block, where a block consists of a certain number of small files. If there are blocks available in the local storage, an indicator, denoted as Readyi for the application i, is set to be 1, otherwise Readyi is 0. So di(t) can be described as:

( )0 00 1

ii

i

Readyd t

Ready=⎧

= ⎨> =⎩

，

，

Some definitions must be provided here for a clearer problem statement.

Realistic Processing Speed (RPS): the processing speed given a data streaming scheme di(t) here.

Theoretic Processing Speed (TPS): the processing speed the allocated CPU resources can generate if there were always enough data locally, denoted as procspeedi(t,Ci(t)), where Ci(t) stands for the allocated CPU resource for application i at time t. Relationship between procspeedi(t,Ci(t)) and Ci(t) must be determined with system identification and it is safe to say that procspeedi(t,Ci(t)) is a non-decreasing function of Ci(t), where Ci(t) mainly refers to a proportion of CPU cycles as explained later.

Realistic Throughput (RTP): given a data supply scheme, the amount of data processed in a given period of time.

Theoretic Throughput (TTP): the amount of data processed in a given period of time if there were always enough data locally with allocated CPU resources.

Scheduling of CPU and bandwidth and resources is carried out periodically to cope with the dynamic status of resources and applications. Suppose the length of a scheduling period is M, and for the hth scheduling period, the following formulas are explicit:

( ) ( )( )( )( )

1Re,

0Re0,

=

==

iadytiCtiprocspeediady

tiCtidtid，

，＝

( )( )( )

∫−

=hM

MhdttiCtiprocspeedhiTTP

1,,

( )( )( )

∫−

=hM

MhdttiCtidhiRTP

1,,

From (1):

( ) ( )( )

( ) ( )( ) ( )( )∫−=

−−+

∫−

⋅

−=

=

⎟⎟⎠

⎞⎜⎜⎝

⎛

hM

MhthMiQMhiQdttitranspeed

hM

MhdttiQtitranspeedhiRTP

11

1

.,

Define utilization of compute resource (UC in

short) as

hiTTPhiRTP

hiUC,

,, = (2)

i.e.,

( ) ( )( ) ( )( )

( )( )∫

∫

−=

−=

−−+

= hM

Mht

hM

Mht

dttiprocspeed

hMiQMhiQdttitranspeed

hiUC

1

1

1

, (3)

, denoting to what extent the allocated compute resource is utilized.

RTPi,h can be defined in another form as:

( )∫Ω

=hi

i dttprocspeedhiRTP,

, (4)

where Ωi,h stands for the time fragment when processing is going on and then utilization can be redefined in another way as:

MhiUC hΩ=, (5)

Note that (5) implies that TPSi,h will keep constant in a scheduling period with given CPU resources.

UC can be defined also as the ratio of RPS to TPS. The problem is to allocate proper amount of CPU resource to generate RPS approaching TPS as much as possible given the data supply scheme. It is obvious that redundant CPU resource will make a TPS much larger than RPS, which implies underutilization of computing resources. If available bandwidth is limit, RPS will be zero at most time with redundant CPU cycles for lack of data to process. This dependency between data provision and processing make it necessary to allocate compute resources on demand so as to make RPS equal with TPS.

Our goal is to get high throughput while maintaining high utilization of resources even when system characteristics are vague as discussed before. A low utilization implies that at most time CPUs are idle for lack of data to be processed. On the other hand, an extremely high utilization indicates that CPU resources

are over-loaded and more resources are required. In both cases, data throughput will be hampered. In intuition, applications with extremely high utilization should be allocated more resources to increase throughput, and resources of those applications with low utilization should be decreased to increase the utilization and avoid waste of resources. Actually, this intuition is nearly sufficient to design a fuzzy controller, dynamically configuring the clock speed of VMs for each data streaming application and implementing fine-grained allocation of CPU resources.

The iterative bandwidth allocation algorithm must be processing- and storage-aware. That is to say, data are streamed according to processing capacity, since insufficient CPU resource may lead to waste of available bandwidth. Also a relatively higher speed for data streaming than processing will lead to data accumulation, requiring more available local storage, which is unnecessary and should be avoid. On the other hand, redundant CPU resources may make full use of bandwidth and result in a high throughput, at cost of CPU underutilization. It is required to achieve a balance between CPU and bandwidth resources for each data streaming application. The coupling between data provision and processing leads to the system with heavily nonlinear properties and system modeling is not a trivial task. This is why fuzzy logic control is applied instead.

Allocation of CPUs and bandwidth is carried out in an integrated and cooperative way. A resource container which holds appropriate CPU and bandwidth resources calculated with our proposed algorithm is implemented for each data streaming application, similar with that demonstrated in [9].

3. Fine-grained CPU Allocation

As mentioned above, fine-grained CPU resource allocation is implemented with combination of virtualization technology and fuzzy control, where the former provides an isolated run-time environment and the latter is responsible for appropriate resource configuration.

3.1. Virtualization with Xen

Recent progress on virtualization technology makes it possible for resource isolation and performance guarantee for each data streaming application. Virtualization provides a powerful new layer of abstraction in distributed computing environments, which separates physical hardware and operating system, so as to improve resource utilization and flexibility. Virtualization allows multiple VMs

with different operating systems and software configuration to run on a single physical machine. Each VM holds its own virtual hardware, such as CPU and memory, on which operating system and applications can be loaded.

Xen is a virtualization technology that offers negligible performance penalties with a technique called paravirtualization, which exposes a set of clean and simple device abstractions to allow most instructions to run at their native speed and the overall capacity is very close to raw physical performance. Xen conducts a VM management mechanism called hypervisor, or in other words, virtual machine monitor (VMM), to share and access hardware at lower level.

With Xen, configuration of VMs can be dynamically adjusted to optimize performance. The amount of memory of each VM can be changed easily. The CPU of a VM is called virtual CPU, often abbreviated as VCPU. The quota of physical CPU cycle a VCPU will get is determined by two parameters, i.e., weight and cap, where the former is a relative value and the latter is an absolute one. A VCPU with a weight of 128 can obtain twice as many CPU cycles as one whose weight is 64, while 50 as a cap value indicates that the VCPU will obtain 50% of a physical CPU’s cycles. What’s more, binding a VCPU with a physical CPU will provide better performance.

Each data streaming application will be provided with a VM which will be occupied exclusively by it. Xen is applied to create a VM with configurable clock speed for each application. Cap, one of Xen’s interfaces regarding CPU allocation, is adjusted dynamically according to the measured utilization and pre-defined fuzzy rules as described below.

3.2. Fuzzy Control Overview

A fuzzy control system is based on fuzzy logic related with fuzzy concepts that cannot be expressed as true or false but rather as partially true. Inputs of a fuzzy control system are values in terms of logical variables that take on continuous values between 0 and 1 which indicates degree of truth, rather than classical or digital logic operating on discrete values of either 0 and 1 (true and false). As an obvious advantage, fuzzy logic can express the solution to problems in terms that human operators can understand, so as to make use of their experience in designing of the controller. This makes it easier to mechanize tasks that are already successfully performed by humans but hard to establish mathematical models. So, fuzzy control has its special merit when it is difficult, if not possible, to apply traditional control method.

The input variables in a fuzzy control system are in general mapped into fuzzy sets, where an input variable may be mapped into several fuzzy sets with corresponding truth values determined by the membership functions. This process is called fuzzification. All the rules that apply are invoked, using the membership functions and truth values obtained from the inputs, to determine the result of the rule. This result in turn will be mapped into a membership function and truth value controlling the output variable. These results are combined to give a specific answer by a procedure known as defuzzification.

A fuzzy logic controller (FLC) can be depicted as the diagram in Figure 1, consisting of an input stage, a processing stage, and an output stage.

Figure 1. A fuzzy logic controller The input stage maps inputs including UC and

UC to the appropriate membership functions (as shown in Figures 2 and 3, respectively) and truth values, known as fuzzification. These mappings are then fed into the rules in the processing stage, which based on inference mechanism, invokes each relevant rule in the rule base and generates a result for each, then combines the results of the rules. Here the rule specifies an AND relationship between the mappings of the two input variables so the minimum of the two is used as the combined truth value. Finally, in the output stage, the appropriate output state is selected and assigned a membership value at the truth level of the premise. The truth values are then defuzzified through centroid defuzzification. In our scenario, the output is a proportional factor, PF in short, which will be used to calculate the allocated CPU quota of each VM, in terms of cap.

Some basic concepts are given below to help construct an elementary understanding of fuzzy logic controllers and their mechanism.

Universe of discourse is the domain of an input (output) to (from) the FLC. Inputs and outputs must be mapped to the universe of discourse by quantization factors (Ke and Kec in Figure 1) and scaling factor (Ku in Figure 1) respectively, which helps to migrate the fuzzy control logic to different problems without any modification.

Linguistic variables describe the inputs and output(s) of a fuzzy controller. These linguistic variables are a natural way resembling human thoughts to handle uncertainties created by stochastics present in most computer systems. Linguistic variables involved in this work include UC, UC and PF.

Linguistic values are used to describe characteristics of the linguistic variables. Very low, low, medium, high and very high are the linguistic values for UC, while those for UC and PF are NB, NM, NS, ZE, PS, PM and PB, where N, P, B, M, S and ZE are abbreviations of negative, positive, big, medium, small and zero, respectively, and the combination of them just takes on a degree of truth. Different from classical mathematics, in fuzzy world, this is represented as a continuous value between 0 to 1, and 0.5 indicates we are halfway certain. The mapping from a numeric value to a degree of truth for a linguistic value is done by the membership function.

Linguistic rules form a set of IF premise THEN consequent rules to map the inputs to output(s) of a fuzzy controller, i.e., to guide the fuzzy controller’s actions. These rules are defined in terms of linguistic variables, different from the numerical input-or-output of the classical controller. A linguistic rule for example is: IF UC is high AND UC is NB THEN PF is NB.

Rule-base holds a set of IF-THEN rules as a part of the controller, dictating how to achieve PF according to the fuzzified linguistic values of UC and UC.

Membership functions quantify the certainty an UC and UC value to be associated with a certain linguistic value. Except for the membership of linguistic value very low for UC, we use symmetric triangles of an equal base and 50% overlap with adjacent MFs. Unlike traditional set theory, in fuzzy set theory underlying fuzzy control theory, set membership is not binary but continuous to deal with uncertainties. Thus, a fuzzy input or output may belong to more than one set—maximum two adjacent sets in our MFs—with different certainty values.

Inference mechanisms in Figure 1 determines which rules will be applied at the kth sampling point, based on the fuzzified UC and UC. To compute the certainty value of the premise in the corresponding IF premise THEN consequent rule(s), we take the minimum between the certainty values of UC and UC, since the consequent cannot be more certain than the premise.

Fuzzification is to transform precise values of inputs into fuzzy sets with corresponding membership functions, which is indispensable for fuzzy inference. Outputs of fuzzy inference are fuzzy sets, which are not suitable to drive the controlled system, and they

must be transformed into a clear value by defuzzification. Centriod method is applied to compute the control signal.

3.3. Linguistic Variables and Fuzzy Rules

As for the FLC proposed in this paper, the inputs are the observed resource utilization UC as defined in (2) or (5) and the derivative of it, UC, unlike most existing FLCs whose inputs are usually the observed error between the settled point and the actually measured value and the derivative of the error. Although it has two inputs, essentially it is a single input controller for UC can be derived from UC as:

( ) ( ) ( )1UC k UC k UC kΔ = − − Output of the fuzzy controller for application i in

the hth scheduling period, called the control signal in control terminology, is a proportional factor, denoted as PFi,h. At the hth scheduling period, given inputs UC and UC, suppose the relevant fuzzy sets of output form a set denoted as Mi,h with membership denoted as Mi,h(u), where u∈Ui,h and Ui,h is the universe of discourse, then the output can be calculated as

( ) ( ), ,

, , ,

i h i h

i h i h i hU U

PF M u udu M u du= ∫ ∫ (6)

Suppose the initial caps of each application are Ci,0, in the hth scheduling period, the cap will be

( ) 1,11,1,, ≥−−+−= hCapScalehiPFhiChiC (7)

, and initially ,0 1iPF =

where CapScale is the varying scale of the allocated cap. PFi,h is adjusted every scheduling period, so it is adaptive to the varying situations. Relationship between allocated cap and procspeed can be obtained with system identification [7] as described later in Section 5.2 and a linear model (an approximately proportional model in certain scope) is adopted as

( ) ( ) ( )∑=

∑=

−+−=p

l

q

mmhCapblhprocspeedlahprocspeed m

1 1(8)

Linguistic values of UC include very-low, low, medium, high and very-high, indicating utilization status of CPU resources. Triangular membership functions are adopted, as shown in Figure 2.

Figure 2. Triangular membership functions of UC

Our goal is to keep the utilization at a high level (80% in this scenario), and a low or extremely high utilization is not required. Xen allow most instructions to run at their native speed, but due to I/O intensive characteristics of data streaming applications, there is indeed some overhead, i.e., performance loss. In order to guarantee high processing efficiency, we set the target utilization to be 80% rather than 100%, which means that the allocated CPU quotas are more than actually needed to compensate the overhead, as inferred from (2) or (5).

Both input UC and output PF adopt triangular membership functions with linguistic variables of NB, NM, NS, ZE, PS, PM and PB, as shown in Figures 3 and 4, respectively. It can be seen that the universe of discourse of UC falls to the scope of -0.4 to 0.4, which is just based on our empirical observation. It is also the case for PF where the universe of scope is set to 0.6 to 1.4.

Mem

bers

hip

Func

tion

Figure 3. Triangular membership functions of UC

Mem

bers

hip

Fucn

tion

Figure 4. Triangular membership functions of PF

Table 1. Fuzzy rules UC PF

very low low medium high very highNB NB NM NM NS NS ZE ZE PS PS PM PM

UC

PB

NB NB NB

PB

PB

As mentioned above, a low utilization implies that the allocated CPU quota should be decreased to release redundant compute resources without reducing the ultimate throughput; meanwhile extremely high utilization indicates that more CPU resources are required to increase processing efficiency. When the utilization is very low, low or medium, the generated

PF should be less than 1 while the very high utilization requires a PF lager than 1. To avoid oscillation, UC should also be paid attention. When the utilization is far away from the settled point (80% here), the adjustment can be big. Then the PF in the 1st, 2nd and 3rd columns in Table 1 will be NB (negatively big) while in the last column it is PB (positively big). When the utilization falls to the high area, more careful adjustment is required, as shown in the 4th column in Table 1. For example, when UC is NS, PF is also NS; and when UC is ZE, PF is also ZE, which means that no adjustment is required so as to keep a stable status. Note that in Figure 4, ZE means the value of PF is 1, not 0.

From Table 1, it can be seen that these fuzzy rules are simple but robust, guaranteeing a rapid convergence to the settled point without stable state errors, which is a required characteristic for control systems, as shown in experimental results included in Section 5.

3.4. Hybrid Resource Allocation

The control system of resource allocation for data streaming applications is illustrated in Figure 5. CPU and bandwidth resources are allocated by actuator, abbreviated as ACT, which performs the integrated allocating scheme. The transfer function from allocated CPU and bandwidth resources to utilization UC, denoted as G, is not available, for the two inputs are tightly coupled and interactive to each other. Fortunately, this transfer function is not indispensable for our fuzzy allocation scheme.

A FLC receives UC and UC and outputs the caps of CPU for each application and then procspeed is determined with (8). An iterative bandwidth allocation (IBA) is implemented as described in Section 4 to decide transpeed. UC at the next scheduling period is obtained with (2) or (5). In such a way, the control system works and hybrid resource allocation is implemented for cloud computing.

Figure 5. The control system

4. Iterative Bandwidth Allocation

For data streaming applications, e.g. LIGO data analysis, corresponding data sources are usually located on remote sites to the processing cluster, so it

is required to transfer data to local storage through Internet. The total bandwidth from the cluster to Internet is limited, denoted as I, which is shared by multiple data streams. The data streams, called sessions, denoted as s, form a set S. Each session will be assigned a bandwidth xs (i.e., transpeed in Section 2), where xs ∈Xs, Xs =[bs,Bs] and bs >0, Bs <∞. bs stands for the least bandwidth required for session s, while Bs is the highest bandwidth of the link from the corresponding data source to the session s. Session s will have a utility function Us(xs), which is assumed to be concave, continuous, bounded and increasing in the interval [bs,Bs]. We try to maximize the sum of the utilities of all the sessions, maintaining fairness among them. The problem can be described as follows.

.P ( )∑

∈Ssss xUmax

..ts ∑∈

≤Ss

s Ix

ss Xx ∈ Repository policy is applied here, and variables Us

and Ls are the settled upper and lower limits of data amount in storage for session s to control the resuming and pause of data transfers: when the amount of data in storage of s reaches the upper limit, the data transfer is halted and when this amount scratches the lower limit, data transfer is resumed. So data transfer may be intermittent rather than continuous, so as to be storage aware, to avoid data overflow while guaranteeing adequate data provision.

Due to the amount of data in storage, there are two possible transfer states for each s at any time, i.e., active and inactive, which indicate a data transmission is on or off. All active sessions form a set, called SA, and it is obvious that this set is varying because transfer states of sessions are changing. At every sampling time k, status of transfer s can be determined by data amount in storage and the previous status:

⎪⎩

⎪⎨⎧

≤−

=−>−

≥−

=−<−

=

sLksamountksstatussLksamount

sUksamountksstatussUksamount

ksstatus

1,,101,,1,,0

1,,011,,1,,1

,

, where statuss,k and amounts,k stand for status of transfer and data amount in storage for session s at the kth sampling time, respectively. The initial condition is statuss,0 =1 and amounts,0=0.

Then at the kth sampling point 1, , =∈ ksA statusifSs

and

0, , =∉ ksA statusifSs

We just allocate bandwidth for active transmissions, so the bandwidth constraint may be rewritten as

∑∈

≤ASs

s Ix

An iterative optimization algorithm is proposed as follows.

While s ∈SA

( )( ) ( )( )[ ] ( )

( )[ ] ( )⎪⎩

⎪⎨⎧

∑∈

>

∑∈

≤+

=+

ASsIk

sxifsX

ksxk

ASsIk

sxifsX

ksxUk

ksx

ksx

ρβ

ρα '1

otherwise ( )

Ak

s Ssx ∉∀=+ ,01 Here, xs

(k)is the bandwidth for session s∈S at the kth sampling time. αk and βk are two positive sequences, βk∈ (0, 1). [·] Xs denotes a projection on the set Xs as

[ ] ( )( )ybBy ssX s,max,min=

( )⋅'U is the sub-gradient of ( ) ( )( )∑

∈

=⋅ASs

kss xUU

and ( )( ) ( )

( )ks

ks x

UxU∂

⋅∂='

And ρ is the so-called safety coefficient to avoid bandwidth excess, where ρ∈ (0, 1).

Essentially, this allocation algorithm is processing-aware, since statuss,k and amounts,k are affected by processing, i.e., SA is associated with data processing. Bandwidth allocation is coupled with CPU allocation. On the other hand, the allocated bandwidth will also result in a utilization to trigger the FLC to adjust allocation of CPU resource. In this way, allocations of bandwidth and CPU resources are integrated and interactive, and they cooperate to allocate resources for applications on-demand.

Parameters in this iterative bandwidth allocation scenario, such as αk, βk and ρ can be adjusted according to different allocation principles, such as relative fairness, the-most-needed-the-first, etc. The mostly applied utility function is logarithmic:

( ) ( )ssss xwxU += 1ln , where ws is the weight of session xs and a larger ws implies a bigger quota in the total available bandwidth for each iterative step.

In data streaming scenario, due to coupling of data processing and provision, i.e., because of the fact that the ultimate throughput or in other words the ultimate

processing efficiency is co-determined by allocated CPU and bandwidth and both are interactive to each other, relationship between processing efficiency and allocated CPU or bandwidth is heavily non-linear which is hard, if not possible, to be expressed in precise mathematical formula. This makes it very difficult to apply traditional feedback control for absence of a plant model, and fortunately the model-free nature of fuzzy controlling provide a feasible alternative. 5. Performance Evaluation

In this section, the approach described in Sections 3 and 4 is implemented and experiments are carried out using a gravitational wave data analysis case study. 5.1 Experiment Design

VMs are set up on a HP DL580G5 server with 4 Intel CPUs containing 16 Xeon E7310 cores and 8 GB memories for LIGO data streaming applications. Data items are streamed to the VMs from remote data sources.

A LIGO data analysis application reads in two data streams from remote LIGO data archives and calculates correlation coefficients that can be used to characterize similarity of two data curves. If two signals from two observatories occur simultaneously with similar curves, it would be likely that a gravitational wave candidate is detected. Note that this is only a simplified case study since actual LIGO data analysis pipeline is much more sophisticated since many pre- and post- signal processing steps are required. LIGO data streams are composed with numerous small data files, each containing observational data acquired in 16 seconds. Data files used in this experiment is the LIGO level 3 data with reduced sizes that only include data from the gravitational wave channel.

According to complexity of processing (denoted in terms of proportional coefficients obtained by system identification in the following subsection), applications can be divided into two categories: light and heavy ones. For light applications, data items can be processed with small amount of computation while heavy applications perform more complex processing on data items and relatively large amount of computational resources are required. Heavy and light groups of applications are used as benchmarks, each with 5 applications respectively. Total available bandwidth, I, is set to 5, 10 and 15 Mbps. Three values of CapScale, 3, 5 and 8, are evaluated. Note that here heavy or light applications are relatively defined.

Our allocation algorithm is evaluated for 100 scheduling periods. Performance metrics include the output of the FLC (PF), the utilization of each allocated computational resources (as define in (1), UC), the allocated cap for each application and resource usage (US) of CPU and bandwidth. Note that the usage of CPU means the sum of allocated caps for each application, which is different from the utilization defined in (1).

5.2 System Identification

To reveal the mathematical relationship between the processing speed and the allocated computing resources (mainly the quota of CPU), system identification is carried out. Here 1,188 pairs of LIGO data files from two observatories with the total amount of 4,354 MB are used in the experiment.

Three VMs are setup with 512 MB, 256 MB and 128 MB of memory, respectively. The allocated caps for each VM rang from 5% to 100%. The makespans during which all the data pairs are processed are shown in Figure 6, from which it seems that allocated memory has little to do with the makespan for this application. In the following experiments, memory size for each VM is set to 128 MB.

Figure 6. Makespans with different caps and

memory sizes The processing speed is shown in Figure 7 with the

solid line. Polynomial curve fitting is applied to generate a mathematical function from the cap to processing speed, using Least Squares Method (LSM).

Figure 7. Processing speeds with different caps

From Figures 6 and 7, it can be inferred that once the allocated cap exceeds 50%, the same increase will not make so obvious benefit as it does in the scope of 5% to 50%. For our application, CPU allocation for VMs is very essential to share CPUs among multiple data streaming applications so that higher performance can be achieved with less resource usage.

5.2. Resource Allocation and Utilization

As shown in Figures 8 and 9, light and heavy applications get appropriate amount of CPU resources using our approach, where the total bandwidth is 5 Mbps. In these cases, required CPU resources of each application is far less than a whole physical CPU, for the allocated caps of them are under 20% or even 10%. The total CPU utilization is also far less than 100%.

Initial caps for each application are 10%. All the allocation schemes are converged to a steady state. No steady state error exists in each allocation scheme. But in presence of sudden changes of available resources, they can make a rapid response, as shown in Figure 9.

(a) Proportional factors

(b) Utilization

(c) Allocated cap

(d) Resource usage Figure 8. Performance of heavy tasks


(b) Utilization

(c) Allocated cap

(d) Resource usage

Figure 9. Performance of light tasks

5.3. Robustness and Adaptability Adaptability of the allocation algorithm is also

tested. For example, total available bandwidth jumps to 8 and 10 at the 30th and 60th scheduling period, respectively, and the algorithm can react to this situation very fast as shown in Figure 10. From the view point of control, this variance in bandwidth can be considered as a disturbance, so the robustness or performance of anti-disturbance is good.


(b) Utilization

(c) Allocated cap

(d) Resource usage

Figure 10. Responsiveness to bandwidth

5.4. Parametric Convergence and Stability As a routine in control system design, stability

analysis is indispensable because only the stable systems can be applied.

We find that some parameters, e.g. CapScale in (7), have something to do with the system stability. For some applications, a larger CapScale results in a rapid convergence to the steady state. But a large CapScale may also lead to performance oscillation and instability. For example, when CapScale is set to 8, the outcome of FLC, resource utilization and allocated CPU quota for each light application will not converge to a stable value. In this case, the control system cannot work in a stable state, as shown in Figure 11.

PF


UC

(b) Utilization

Cap

(c) Allocated cap

Figure 11. Performance of light tasks with a large CapScale

Heavy tasks are not so sensitive to a large CapScale as light ones. In Figure 12, CapScale is also

set to 8. While some oscillation occurs, the magnitude is very small. So CapScale is essential for the approach proposed in this work. We set CapScale empirically from 3 to 5.

PF


0 10 20 30 40 50 60 70 80 90 1000.65

0.7

0.75

0.8

0.85

0.9

0.95

1

Scheduling Periods

uc1

uc2

uc3

uc4

uc5

(b) Utilization

Cap

(c) Allocated cap

Figure 12. Performance of heavy tasks with a large CapScale

5.5. Performance Comparison

Several other resource allocation schemes are developed in comparison, as shown in Table 2: iterative and even stands for the way to allocate bandwidth among applications, in an iterative way as described in Section 4 or just to divide the total bandwidth evenly; dynamic stands for CPU allocation manner described in Section 3, while fixed means that allocated CPU resources for each application are constant. Obviously, our approach belongs to Case 1.

Table 2. Algorithm settings Bandwidth CPU

Case 1 iterative dynamic Case 2 iterative fixed Case 3 even dynamic Case 4 even fixed

Some results are provided in Table 3, where performance metrics from top to bottom are final throughput (the sum of data processed during the evaluation), CPU usage and bandwidth (BD in short) usage in percentage. Characters of H and L are abbreviations of heavy and light, indicating the type of applications.

Table 3. Performance comparison Total Bandwidth Index Case 5 10 15

1 49951 97080 117140 2 46542 77107 77269 3 46745 87495 104660 H

4 48566 80004 82296 1 49999 99994 134980 2 46569 94019 123458 3 47576 95018 124990

FinalTP

L

4 48967 94948 104458 1 40.17 77.76 98.81 2 50 50 50 3 39.16 66.09 91.88 H

4 50 50 50 1 9.79 20.74 31.93 2 50 50 50 3 9.76 19.90 28.93

CPUUsage

(%) L

4 50 50 50 1 93.49 91.50 81.77 2 93.08 77.11 51.51 3 92.90 87.08 80.10 H

4 90.13 80.00 54.86 1 92.14 92.02 89.99 2 93.14 93.14 90 3 91.41 90.50 83.33

BD Usage

(%) L

4 93.24 91.99 80.33 Our algorithm prevails in all the scenarios included

in Table 3. For example, with fewer resources, our algorithm obtains a higher throughput. On the contrary, in some case, for instance Case 4, where the bandwidth is evenly allocated irrespective of the requirements of applications and CPU resources are fixed constantly, the situation is deteriorated, though they occupy so big a quota of resources. Cases 2 and 3 improve a little, but their performance is not as ideal as our approach. It reflects that unilateral adjustment of bandwidth or CPU resources is not powerful enough to reach the goals of high throughput and high utilization of resources simultaneously, which from another aspect justifies our assertion that bandwidth and CPU resources should be allocated in an integrated and cooperative way.

6. Related Work

Stream processing [10] has become one of major

focuses of database research in recent years, and some tools and techniques have been developed to cope with efficient handling of continuous queries on data streams, e.g. Aurora [11] and TelegraphCQ [12]. Our

work focuses on scheduling data streaming applications on virtualized cloud resources.

Distributed computing techniques evolve from cluster, grid to cloud computing [13]. Resource management and allocation has been a key issue in these areas [14]. For cloud computing with virtualization technology as kernel, virtual machines [15] or virtual clusters [16] are basic elements for management, scheduling and optimization [17]. Some existing management tools include Eucalyptus [18], VMPlants [19], Usher [20], etc. Some schedulers are developed to support data streaming applications, e.g. GATES [21] and Streamline [22], but they mainly concern on computing resource allocation. Several existing projects, EnLIGHTened [23], G-lambda [24] and PHOSPHORUS [25] put emphases on networking resources. They hold the whole control over an optical network so that a deterministic lightpath can be obtained with advance reservation or on demand, while our network is public Internet based on “best effort” TCP/IP protocols to approach the required bandwidth as much as possible.

Control theory has been successfully applied to control performance or quality of service (QoS) for various computing systems. An extensive summary of related work can be found in the first chapter of [7]. Some control types, such as proportional, integral, and derivative (PID) control [26] [27], pole placement [28], linear quadratic regulator (LQR) [28] and adaptive control [29] [30] are proposed. Most of them require precise models of controlled objects.

The first application of fuzzy control was introduced into industry [31]. Fuzzy control [32] [33] is also a topic of research in computing system, but it is mainly focused on admission control to get a better quality of service. Adaptive fuzzy control is applied for utilization management of periodic tasks [34], where the utilization is defined as the ratio of the estimated execution time to the task period. Fuzzy inference is carried out on fuzzy rules to decide the threshold, a point over which the quality of service (QoS) of tasks should be degraded or even an admission control must be performed to reject more tasks. In this work, execution time estimation must be provided, which may be not feasible for some applications.

The latest relevant work [35] is focused on providing predictable execution so as to meet the deadlines of tasks. Virtualization technology is applied to implement the so-called performance container and compute throttling framework, to realize the “controlled time-sharing” of high performance compute resources, i.e., fine-grained CPU allocation. System identification is carried out to establish the

model of controlled object and a proportional and integral (PI) controller is applied. This work has similar motivation with our work, but for the data streaming scenario, it is hard to generate a precise model from allocated bandwidth and CPU resources to corresponding utilization or throughput generation as explained before. So we adopt the model-free fuzzy control approach.

7. Conclusions and Future Work

In this work we provide a new approach for resource allocation of virtualized cloud resources for data streaming applications. From experimental results included in Section 5, some detailed conclusions can be inferred as follows:

Speeds of data streaming and processing reach a balance at a high level to guarantee running of applications to obtain high throughput while consuming just reasonable amount of resources on demand, especially usage of CPU resources is fairly economic;

High CPU utilization can be achieved, e.g., 80% in Figure 9, and there is no stable state error, so as to make efficient use of resources while avoiding overloading;

Heavy applications consume more compute resources than light ones. Higher total available bandwidth leads to a higher CPU resource usage, while lighter tasks will result in a higher bandwidth usage, which reflects the interrelation between bandwidth and CPU resources, as also shown in Table 3;

Data throughput is co-determined by total available bandwidth and CPU resources. Unilateral redundancy in either resource will not necessarily lead to a higher throughput but may result in under-utilization, so integrated resource allocation is required. Fuzzy logic control of CPU resources and iterative

bandwidth allocation are implemented for resource scheduling for grid data streaming applications. Virtualization technology is applied to enable fine-grained and on-demand resource allocation. Experimental results show the good performance in resource utilization, data throughput and robustness of the approach, in comparison with several other methods with less adaptability to dynamic environments.

In further work, virtualization of network and storage will be implemented. More complex scenarios will be considered, e.g. for workflows or pipelines, where VMs can be established for each stage and

controlled to achieve balance among different stages in one pipeline so as to avoid performance bottleneck and achieve overall high performance. Acknowledgement

This work is supported by National Science Foundation of China (grant No. 60803017) and Ministry of Science and Technology of China under National 973 Basic Research Program (grants No. 2011CB302505 and No. 2011CB302805).

References

[1]. M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R.

Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zahari, “A View of Cloud Computing”, Communications of the ACM, Vol. 53, No. 4, pp. 50-58, 2010.

[2]. A. Abramovici, W. E. Althouse, et. al., “LIGO: The Laser Interferometer Gravitational-Wave Observatory”, Science, Vol. 256, No. 5055, pp. 325 – 333, 1992.

[3]. D. A. Brown, P. R. Brady, A. Dietz, J. Cao, B. Johnson, and J. McNabb, “A Case Study on the Use of Workflow Technologies for Scientific Analysis: Gravitational Wave Data Analysis”, in I. J. Taylor, D. Gannon, E. Deelman, and M. S. Shields (Eds.), Workflows for eScience: Scientific Workflows for Grids, Springer Verlag, pp. 39-59, 2007.

[4]. M. Migliardi, J. Dongarra, A. Geist and V. Sunderam, “Dynamic Reconfiguration and Virtual Machine Management in the Harness Metacomputing System”, Computing in Object-Oriented Parallel Environments, Lecture Notes in Computer Science, Vol. 1505, pp. 127-134, 1998.

[5]. P. Barham, B. Dragovic, K. Fraser, S. Hand, T.L. Harris, A. Ho, R. Neugebauer, I. Pratt and A. Warfield, “Xen and the Art of Virtualization”, Proc. ACM Symp. on Operating Systems Principles, 2003.

[6]. W. Zhang, J. Cao, Y. Zhong, L. Liu and C. Wu, “An Integrated Resource Management and Scheduling System for Grid Data Streaming Applications”, Proc. 9th IEEE/ACM Int. Conf. on Grid Computing, pp. 258-265, Tsukuba, Japan.

[7]. J. L. Hellerstein, Y. Diao, S. Parekh, and D. M. Tilbury, Feedback Control of Computing Systems, Wiley-IEEE Press, August 2004.

[8]. W. Zhang, J. Cao, Y. Zhong, L. Liu and C. Wu, “Block-Based Concurrent and Storage-Aware Data Streaming for Grid Applications with Lots of Small Files”, Proc. 1st Int. Workshop on Service-Oriented P2P Networks and Grid Systems, conj. 9th IEEE Int. Symp. on Cluster Computing and the Grid, pp. 538-543, Shanghai, China, 2009.

[9]. X. Liu, X. Zhu, S. Singhal, and M. Arlitt, “Adaptive Entitlement Control to Resource Containers on Shared

Servers,” Proc. 9th IFIP/IEEE International Symposium on Integrated Network Management (IM 2005), Nice, France, 2005.

[10]. L. Golab, and M. T. Ozsu, “Issues in Data Stream Management”, SIGMOD Record, Vol. 32, No. 2, pp. 5-14, 2003.

[11]. D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik, “Aurora: A New Model and Architecture for Data Stream Management”, VLDB Journal, Vol. 12, No. 2, pp. 120-139, 2003.

[12]. S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, F. Reiss, and M. A. Shah, “TelegraphCQ: Continuous Dataflow Processing”, Proc. ACM SIGMOD Int’l. Conf. on Management of Data (SIGMOD’03), 2003.

[13]. I. Foster, Y. Zhao, I. Raicu, S. Lu, “Cloud Computing and Grid Computing 360-Degree Compared”, Proc. IEEE Grid Computing Environments, conj. IEEE/ACM Supercomputing Conf., Austin, 2008.

[14]. R. Buyya, C. S. Yeo, S. Venugopala, J. Broberga, and I. Brandicc, “Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility”, Future Generation Computer Systems, Vol. 25, No. 6, pp. 599-616, 2009.

[15]. M. Rosenblum and T. Garfinkel, “Virtual Machine Monitors: Current Technology and Future Trends”, IEEE Computer, Vol. 38, No. 5, pp. 39-47, 2005.

[16]. I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayer and X. Zhang, “Virtual Clusters for Grid Communities”, Proc. IEEE Int. Symp. on Cluster Computing and the Grid, May 2006.

[17]. F. Zhang, J. Cao, L. Liu, and C. Wu, “Redundant Virtual Machine Management in Virtualized Cloud Platform”, Int. J. Modeling, Simulation, and Scientific Computing, 2011. (to appear)

[18]. D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, “The Eucalyptus Open-Source Cloud-Computing System”, Proc. 9th IEEE/ACM Int. Symp. on Cluster Computing and the Grid, 2008.

[19]. I. Krsul, A. Ganguly, J. Zhang, J. A. B. Fortes, and R. J. Figueiredo, “VMPlants: Providing and Managing Virtual Machine Execution Environments for Grid Computing”, Proc. ACM/IEEE SC2004 Conference, 2004.

[20]. M. McNett, D. Gupta, A. Vahdat, and G. M. Voelker, “Usher: an Extensible Framework for Managing Clusters of Virtual Machines”, Proc. 21st Conf. on Large Installation System Administration, 2007.

[21]. L. Chen and G. Agrawal, “A Static Resource Allocation Framework for Grid-based Streaming Applications”, Concurrency and Computation: Practice and Experience, 18:653–666, 2006.

[22]. B. Agarwalla, N. Ahmed, D. Hilley, and U. Ramachandran, “Streamline: Scheduling Streaming

Applications in a Wide Area Environment”, Multimedia Systems, Vol. 13, No. 1, pp. 69-85, 2007.

[23]. L. Battestilli, et al., “EnLIGHTened Computing: An Architecture for Co-allocating Network, Compute, and Other Grid Resources for High-End Applications”, Proc. Int’l. Symp. on High Capacity Optical Networks and Enabling Technologies, pp. 1-8, 2007.

[24]. A. Takefusa, et al., “G-lambda: Coordination of a Grid Scheduler and Lambda Path Service over GMPLS”, Future Generation Computer Systems, Vol. 22, No. 8, pp. 868-875, October 2006.

[25]. S. Figuerola, et al., “PHOSPHORUS: Single-step On-demand Services across Multi-domain Networks for e-Science”, Proc. SPIE, Vol. 6784, 67842X, 2007.

[26]. T. F. Abdelzaher, K. G. Shin, and N. Bhatti, “Performance Guarantees for Web Server End-systems: A Control-theoretical Approach”, IEEE Trans. on Parallel and Distributed Systems, Vol. 13, 2002.

[27]. S. Parekh, N. Gandhi, J. L. Hellerstein, D. Tilbury, T. S. Jayram, and J. Bigus, “Using Control Theory to Achieve Service Level Objectives in Performance Management”, Real Time Systems Journal, Vol. 23, No. 1-2, 2002.

[28]. Y. Diao, N. Gandhi, J. L. Hellerstein, S. Parekh, and D.M. Tilbury, “MIMO Control of an Apache Web Server: Modeling and Controller Design”, American Control Conf., 2002.

[29]. A. Kamra, V. Misra, and E. M. Nahum, “Yaksha: A Self-tuning Controller for Managing the Performance of 3-tiered Web Sites”, Proc. 12th IEEE Int. Workshop on Quality of Service, June, 2004.

[30]. Y. Lu, C. Lu, T. Abdelzaher, and G. Tao, “An Adaptive Control Framework for QoS Guarantees and its Application to Differentiated Caching Services”, Proc. 10th IEEE Int. Workshop on Quality of Service, May 2002.

[31]. P. J. King and E. H. Mamdani, “Application of Fuzzy Algorithms for Control Simple Dynamic Plant”, IEE Proc., Control Theory App., Vol. 121, pp. 1585-1588, 1974.

[32]. Y. Diao, J. L. Hellerstein, S. Parekh, “Using Fuzzy Control to Maximize Profits in Service Level Management”, IBM Systems Journal, Vol. 41, No. 3, 2002.

[33]. B. Li and K. Nahrstedt, “A Control-based Middleware Framework for Quality of Service Adaptations”, Communications, Vol. 17, pp. 1632-1650, 1999.

[34]. M. H. Suzer and K. D. Kang, “Adaptive Fuzzy Control for Utilization Management”, IEEE Int. Symp. on Object/Component/Service-oriented Real-time Distributed Computing, 2008.

[35]. S. Park and M. Humphrey, “Feedback-controlled Resource Sharing for Predictable eScience”, Proc. ACM/IEEE Conf. on Supercomputing 2008.

junwei cao and wen zhangcaoj/pub/doc/jcao_j_control.pdfsources. ligo data analysis streams terabytes...

Documents