AUTOMATED CONTROL FOR ELASTIC STORAGE
Harold C. Lim, Shivnath Babu and Jeffrey S. Chase, Duke University
Presented by: Yonggang Liu, Department of Electrical and Computer Engineering, University of Florida

Outline
- Introduction
- System overview
- System architecture and modeling methodologies
- Evaluation
- Contribution and related work
- Discussions and future work

Introduction - Popularity of highly dynamic workloads
- Many web-based services (especially Web 2.0) often experience rapid load surges and drops.
  - One Facebook application saw an increase from 25,000 to 250,000 users in 3 days, with up to 20,000 new users signing up per hour during peak times.
- Elastic services offered by cloud computing are one solution.
  - Grow or shrink service capacity dynamically as the load changes.

Introduction - Elasticity in cloud computing
- Elasticity is one of cloud computing's greatest features: systems acquire and release resources in response to users' dynamic workloads, and users only pay for what they need.
- Figure (provided by Dr. Andy Li, UF): diagram with layers labeled SLAs, Web Services, and Virtualization.

Introduction - Topic of this paper
- This paper addresses the challenges of controlling elastic storage for a data-intensive service in a cloud computing environment.
- Intuitively, the controller does the following:
  - If performance cannot meet the Service Level Objective (SLO) → grow storage capacity.
  - If performance meets the SLO and system utilization is low → shrink storage capacity.

Introduction - Topic of this paper
- In this paper, the Hadoop Distributed File System (HDFS) is employed as the storage system.
- When the controller increases the storage size:
  - Create new storage instances.
  - Move stored data to the new instances (data rebalancing).
- When the controller reduces the storage size:
  - Remove a certain number of storage instances.
  - Some data on the remaining nodes is re-replicated because its replica count has dropped below the replica degree N. HDFS does this automatically.
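
The shrink path can be illustrated with a toy model of block placement. This is a minimal sketch, not HDFS code: `block_map`, `under_replicated`, and the node and block names are hypothetical, chosen only to show which blocks fall below the replica degree N when nodes are removed.

```python
def under_replicated(block_map, removed_nodes, replica_degree):
    """Return {block_id: missing_replicas} for blocks below the replica degree.

    block_map maps a block id to the set of nodes holding a replica
    (a hypothetical stand-in for the file system's placement metadata).
    """
    removed = set(removed_nodes)
    missing = {}
    for block_id, nodes in block_map.items():
        live = set(nodes) - removed          # replicas that survive the removal
        if len(live) < replica_degree:
            missing[block_id] = replica_degree - len(live)
    return missing


# Hypothetical placement of three blocks on four nodes, replica degree N = 3.
block_map = {
    "blk_1": {"n1", "n2", "n3"},
    "blk_2": {"n2", "n3", "n4"},
    "blk_3": {"n1", "n3", "n4"},
}
result = under_replicated(block_map, removed_nodes=["n4"], replica_degree=3)
print(result)
```

Removing `n4` leaves `blk_2` and `blk_3` one replica short each, which is the data that would have to be copied again onto the remaining nodes.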

System overview - What is the big picture?
Suppose we are hosting the Facebook server on Amazon EC2 instances, with the proposed control techniques.
Diagram components:
- Clients
- Elastic Service on the Cloud Provider (Amazon EC2): Web Tier (Apache server), Application Tier (Facebook core), Storage Tier (Hadoop DFS)
- Controller, with a Sensor (gathers measurements) and an Actuator (manages instances)
Control actions:
- The sensor detects higher system load → create more storage instances, and rebalance data.
- The sensor detects lower system load → remove some storage nodes.

System overview - Challenges in elastic storage control
Controlling elastic storage involves many challenges:
- Data Rebalancing. Newly added storage nodes are not effective until data rebalancing is done.
- Interference to Guest Service. Data rebalancing also consumes system resources.
- Actuator Delay. The controller must account for the delay of control operations; otherwise it may respond too late or become unstable.

System architecture
The controller is composed of:
- Horizontal Scale Controller (HSC): responsible for growing and shrinking the number of storage nodes.
- Data Rebalance Controller (DRC): controls the data transfers that rebalance the storage tier after it grows or shrinks.
- State machine: coordinates the actions of the HSC and the DRC.

System architecture - Horizontal Scale Controller (HSC)
- Actuator: The HSC uses cloud APIs to change the number of active server instances.
- Sensor: The paper uses CPU utilization on the storage nodes as the sensor feedback metric.
  - It is easy to measure, and strongly correlated with the overall response time of the Cloudstone benchmark when the bottleneck is in the storage tier.

Modeling methodology - System model without controller
The system without a controller can be described by a block diagram with these signals:
- U(z): The input to the system, the number of storage instances.
- D(z): The effect of client workload variation, expressed in terms of the number of storage instances.
- V(z): The effective number of storage instances.
- Y(z): The output of the system, the CPU utilization on the storage nodes.
- G(z): The transfer function of the storage system.
Block diagram: U(z) and D(z) are summed to give V(z), which passes through G(z) to produce Y(z).
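
Reading the diagram as a summing junction followed by the plant, the open-loop model can be written out as below. This is a sketch inferred from the signal labels on the slide, not an equation quoted from the paper.

```latex
\begin{aligned}
V(z) &= U(z) + D(z) \\
Y(z) &= G(z)\,V(z) = G(z)\,\bigl(U(z) + D(z)\bigr)
\end{aligned}
```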
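
Modeling methodology - Controller - Integral control
Control policy (K): integral control

$$u_{k+1} = u_k + K_i\,(y_{ref} - y_k)$$

- $K_i$: the integral gain parameter.
- $y_k$: the current sensor measurement.
- $y_{ref}$: the desired reference sensor measurement, which is 20% CPU utilization, corresponding to a 3-second average response time.
Block diagram: R(z) minus the fed-back output Y(z) gives the error E(z); E(z) passes through K(z) to give U(z); U(z) and D(z) are summed to give V(z), which passes through G(z) to produce Y(z).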
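
To make the integral policy concrete, here is a minimal simulation sketch. It assumes the standard discrete integral law u_{k+1} = u_k + K_i (y_ref - y_k), a toy plant in which CPU utilization equals total load divided by the number of instances, and a negative gain (adding instances lowers utilization). The plant, the gain value, and the function names are illustrative assumptions, not values from the paper.

```python
def integral_step(u, y, y_ref, k_i):
    """One step of integral control: accumulate the scaled tracking error."""
    return u + k_i * (y_ref - y)


def simulate(load, u0, y_ref=20.0, k_i=-0.1, steps=100):
    """Run the loop against a toy plant y = load / u (assumption)."""
    u = u0
    history = []
    for _ in range(steps):
        y = load / u                 # toy plant: utilization in percent
        history.append((u, y))
        u = integral_step(u, y, y_ref, k_i)
    return u, history


# Hypothetical scenario: 5 instances at 40% utilization, target 20%.
final_u, history = simulate(load=200.0, u0=5.0)
print(round(final_u, 3))
```

In this toy setting the instance count settles where load / u equals the 20% target. The value stays fractional here; a real actuator can only add or remove whole instances.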
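
Modeling methodology - Controller - discrete control functions
Because discrete actuators (instance number) are used in the system, the paper derives the following discrete control function:

$$u_{k+1} = \begin{cases} u_k + K_i\,(y_h - y_k) & \text{if } y_k > y_h \\ u_k + K_i\,(y_l - y_k) & \text{if } y_k < y_l \\ u_k & \text{otherwise} \end{cases}$$

- $y_h$ and $y_l$ are the higher and lower thresholds for CPU utilization $y_k$.
- Only when $y_k > y_h$ (under-provisioned) or $y_k < y_l$ (over-provisioned) does $u_{k+1} \neq u_k$, i.e., the controller adds or removes storage instances.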
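
A thresholded controller of this kind can be sketched as follows. The sketch assumes a high and a low CPU-utilization threshold, a negative integral gain, and rounding of the correction away from zero to whole instances. The rounding rule, the floor of one instance, and all names and numbers are assumptions for illustration, not details from the paper.

```python
import math


def discrete_control(u, y, y_low, y_high, k_i):
    """Return the next instance count given current count u and utilization y."""
    if y > y_high:                      # under-provisioned: correct toward y_high
        delta = k_i * (y_high - y)
    elif y < y_low:                     # over-provisioned: correct toward y_low
        delta = k_i * (y_low - y)
    else:                               # inside the band: leave the cluster alone
        return u
    # Assumption: round away from zero so any threshold crossing moves >= 1 node.
    step = math.ceil(delta) if delta > 0 else math.floor(delta)
    return max(1, u + step)             # never drop below one instance


# Hypothetical thresholds of 16% and 20% utilization, gain of -0.25.
grown = discrete_control(5, 30.0, 16.0, 20.0, -0.25)
shrunk = discrete_control(5, 10.0, 16.0, 20.0, -0.25)
steady = discrete_control(5, 18.0, 16.0, 20.0, -0.25)
print(grown, shrunk, steady)
```

The dead band between the two thresholds is what keeps small fluctuations from triggering expensive instance changes.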

Modeling methodology - Proportional thresholding
How to set $y_h$ and $y_l$?
- They cannot be static, because for a cluster of size N, adding or removing a node affects 1/N of the total capacity.
- "Proportional thresholding" mechanism:
  - Set $y_h = y_{ref}$, and vary $y_l$ to vary the range.
  - Suppose "workload" is the per-node workload and we have N instances. We get [equation not preserved].
  - Suppose [condition not preserved], we get [equation not preserved].
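
One way to see why the threshold band must depend on N is the short derivation below. It is a sketch under a stated assumption, namely that per-node CPU utilization y is proportional to per-node workload W/N with some constant c, and it is not necessarily the slide's own equation. The symbols y_h and y_l stand for the high and low utilization thresholds; W and c are introduced here for illustration.

```latex
\begin{align*}
y(N) &= \frac{cW}{N} \\
y(N-1) &= \frac{cW}{N-1} = \frac{N}{N-1}\,y(N) \\
y_l\,\frac{N}{N-1} &\le y_h \quad\Longrightarrow\quad y_l \le \frac{N-1}{N}\,y_h \\
y_h - y_l &\ge \frac{y_h}{N}
\end{align*}
```

Under this assumption, removing a node at utilization y_l must not push the remaining N-1 nodes above y_h, so the band has to be at least y_h / N wide. For y_h = 20%, that gives y_l of at most 15% with 4 nodes and at most 18% with 10 nodes: the band narrows as the cluster grows.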

System architecture - Data Rebalance Controller (DRC)
- The DRC rebalances the layout of data in the system after the number of storage nodes grows or shrinks.
- Rebalancing is a cause of actuator delay and interference.
- Tuning knob of the HDFS rebalancer: the bandwidth b allocated to the rebalancer.
- Select b to control the tradeoff between lag and interference.
  - Big b: fast rebalance, serious impact on normal service.
  - Small b: slow rebalance, not very disruptive to normal service.

Modeling methodology - Modeling the impacts of b
The paper employs multivariate regression to decide b:
- The time to completion of rebalancing (Time) as a function of the bandwidth throttle (b) and the size of data to be moved (s): $Time = f_t(b, s)$.
- The impact of rebalancing on service response time (Impact) as a function of the bandwidth throttle (b) and the per-node workload (l): $Impact = f_i(b, l)$.
- Values of s and l are measured by sensors in the DRC.
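
As an illustration of fitting such a model, the sketch below runs an ordinary least-squares regression of rebalancing time on bandwidth b and data size s. The linear model form, the synthetic measurements, and the helper names `solve`, `fit_linear`, and `predict` are all assumptions made for the example; the slides do not give the regression form the paper used.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(rows, targets):
    """Least squares via the normal equations; an intercept column is added."""
    xs = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    """Evaluate the fitted model: intercept plus weighted features."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


# Synthetic, noise-free "measurements" (made up): Time = 5 - 0.5*b + 2*s.
rows = [(b, s) for s in (10, 20) for b in (1, 2, 4, 8)]
targets = [5.0 - 0.5 * b + 2.0 * s for b, s in rows]
coef = fit_linear(rows, targets)
print([round(c, 6) for c in coef])
```

The same routine could be reused for Impact by swapping the second feature for the per-node workload l.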

Modeling methodology - Balancing between lag and interference
The Data Rebalance Controller poses the choice of b as a cost-based optimization problem:

$$Cost = \alpha \cdot Time + \beta \cdot Impact$$

The ratio $\alpha / \beta$ can be specified by the guest based on its relative preference towards Time over Impact.
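
The choice of b can be sketched as a search over candidate bandwidths. This example assumes the cost is a weighted sum, alpha * Time + beta * Impact, and uses two made-up stand-in models (Time falls as s / b, Impact rises as b * l). The weights, the models, and the function names are hypothetical and only illustrate how the guest's preference shifts the chosen b.

```python
def choose_bandwidth(candidates, time_model, impact_model, s, l, alpha, beta):
    """Pick the candidate bandwidth with the lowest weighted cost."""
    def cost(b):
        return alpha * time_model(b, s) + beta * impact_model(b, l)
    return min(candidates, key=cost)


def toy_time(b, s):
    """Stand-in model (assumption): rebalance time falls as bandwidth grows."""
    return s / b


def toy_impact(b, l):
    """Stand-in model (assumption): interference rises with bandwidth and load."""
    return b * l


candidates = range(1, 41)
balanced = choose_bandwidth(candidates, toy_time, toy_impact,
                            s=100.0, l=1.0, alpha=1.0, beta=1.0)
time_first = choose_bandwidth(candidates, toy_time, toy_impact,
                              s=100.0, l=1.0, alpha=4.0, beta=1.0)
print(balanced, time_first)
```

Raising alpha relative to beta weights completion time more heavily, so the search settles on a larger bandwidth.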

System architecture - State machine
Recall that:
- The Horizontal Scale Controller (HSC) grows or shrinks the number of storage nodes.
- The Data Rebalance Controller (DRC) rebalances the storage after the number of storage nodes changes.
They have mutual dependencies:
- After the HSC adds a new storage node, the system cannot deliver its full service capacity until the DRC completes rebalancing.
- When one component is taking action, it introduces noise into the sensor measurements of the other.
To preserve stability during adjustments, a state machine coordinates the HSC and the DRC and manages their mutual dependencies.
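
System architecture - State machine
The following diagram shows the internal state machine of the elasticity controller in the storage tier.
Diagram (Elasticity Controller acting on the Storage Tier), with states Init, Horizontal Scale State, and Rebalance State:
- Horizontal Scale State: storage tier configuration changed? No → stay in Horizontal Scale State.
- Horizontal Scale State: storage tier configuration changed? Yes → go to Rebalance State.
- Rebalance State: rebalancing done? No → stay in Rebalance State.
- Rebalance State: rebalancing done? Yes → return to Horizontal Scale State.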
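
The coordination logic can be sketched as a small transition function. The state names follow the diagram labels; the transition from Init straight to the horizontal scale state is an assumption, since the slide text does not say what triggers it, and `State` and `next_state` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    HORIZONTAL_SCALE = "horizontal_scale"
    REBALANCE = "rebalance"


def next_state(state, config_changed=False, rebalance_done=False):
    """Return the next controller state for the given observations."""
    if state is State.INIT:
        # Assumption: the controller starts by watching for scaling needs.
        return State.HORIZONTAL_SCALE
    if state is State.HORIZONTAL_SCALE:
        # Only a change in the storage tier configuration hands over to the DRC.
        return State.REBALANCE if config_changed else State.HORIZONTAL_SCALE
    if state is State.REBALANCE:
        # The HSC stays idle until rebalancing has finished.
        return State.HORIZONTAL_SCALE if rebalance_done else State.REBALANCE
    raise ValueError(f"unknown state: {state}")


state = next_state(State.INIT)
state = next_state(state, config_changed=True)
print(state)
```

Keeping only one controller active at a time is what prevents each one from reacting to measurement noise caused by the other.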

Evaluation - Experimental Testbed
- The paper runs CloudStone with GlassFish as the front-end application server tier.
  - CloudStone: a flexible Web 2.0 benchmark generator.
  - GlassFish: an open source application server project.
- HDFS is used for the storage tier.
  - HDFS is modified to expose the rebalancer's bandwidth throttle b as an actuator to the external controller.
- The paper uses a local ORCA cluster as the cloud infrastructure provider.
  - ORCA: a resource control framework that provides a resource leasing service; guests can lease resources from a substrate resource provider, such as a cloud provider.

Evaluation - Experimental Testbed
- The experimental service cluster: a group of servers running on a local network.
- To fully explore the effects of the storage tier, the other tiers are statically over-provisioned.
- The storage tier nodes:
  - Dynamically allocated virtual machine instances.
  - They all have fixed resource configurations: 30 GB disk space; 512 MB RAM; single disk arm; 2.8 GHz CPU.
- HDFS is preloaded with at least 36 GB of data.

Evaluation - Controller Effectiveness
Static and dynamic resource provisioning in response to a 10-fold load burst at [time not preserved].
Figure panels: (a1) CPU utilization, static; (b1) response time, static; (a2) CPU utilization, dynamic; (b2) response time, dynamic.
Target response time: 3 seconds. Target CPU utilization: 20%.
Observations from the figures:
1. Dynamic provisioning is able to adapt to the load burst.
2. Instance creation and data rebalancing incur a cost and take effect only after a delay.

Evaluation - Controller Effectiveness
Static and dynamic resource provisioning in response to a small load increase of 35% at [time not preserved].
Figure panels: (a1) CPU utilization, static; (b1) response time, static; (a2) CPU utilization, dynamic; (b2) response time, dynamic.
Target response time: 3 seconds. Target CPU utilization: 20%.
Observations from the figures:
1. Dynamic provisioning is responsive enough to adapt to the small load increase.
2. The cost and delay of node creation and rebalancing are smaller than in the previous experiment.

Evaluation - Resource Efficiency
Static and dynamic resource provisioning in response to a load decrease of 30% at [time not preserved].
Figure panels: (a1) CPU utilization, static; (b1) response time, static; (a2) CPU utilization, dynamic; (b2) response time, dynamic.
Target response time: 3 seconds. Target CPU utilization: 20%.
Observations from the figures:
1. Shrinking the storage size has much lower cost and delay than increasing it.
2. During the resizing process, there are almost no SLO violations.

Evaluation - Comparison of Rebalance Policies
Recall that:
- $Time = f_t(b, s)$, a monotone decreasing function of b.
- $Impact = f_i(b, l)$, a monotone increasing function of b.
And we want to optimize the cost function:

$$Cost = \alpha \cdot Time + \beta \cdot Impact$$

Contribution and related work
- This paper is the first to address the problem of automated control for elastic storage in cloud computing.
- SCADS is related work on dynamically scaling a storage system. It uses machine learning to predict resource requirements.
- Padala et al. proposed a decoupled architecture (between guest and cloud provider) for cloud computing. They did not consider the actuator constraints.
- Aqueduct uses a feedback controller to throttle the bandwidth used by rebalancing so that the SLOs are not violated. As a result, rebalancing may be left with very little bandwidth.

Discussions and future work
- The proposed modeling method cannot correctly handle workloads with transient noise, which is common in practice.
- Adding a filter module could address the problem.
Block diagram: the closed control loop (R(z), E(z), K(z), U(z), D(z), V(z), G(z), Y(z)) with an added filter block H(z) whose signal is labeled W(z).
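
As an illustration of what a filter module could do, the sketch below smooths the sensor signal with an exponential moving average before it reaches the controller. The slides do not say which filter H(z) is intended, so the choice of an exponential moving average, the smoothing factor, and the sample values are all assumptions.

```python
def ema_filter(samples, smoothing=0.25):
    """Exponentially smooth a sequence of sensor readings.

    smoothing is the weight on the newest sample; smaller values damp
    transient spikes more strongly but react more slowly to real changes.
    """
    filtered = []
    w = None
    for y in samples:
        w = y if w is None else smoothing * y + (1.0 - smoothing) * w
        filtered.append(w)
    return filtered


# Hypothetical CPU utilization trace with a one-sample transient spike.
raw = [20.0, 20.0, 80.0, 20.0, 20.0]
smoothed = ema_filter(raw)
print(smoothed)
```

The spike reaches the controller at less than half its raw height, at the price of a slower response to genuine load changes, which adds to the delay the controller already has to tolerate.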
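
Discussions and future work
- The proposed model allocates resources tightly. A small change in system load often triggers adding or removing storage instances, which is very disruptive.
- Recall the proposed control function:

$$u_{k+1} = \begin{cases} u_k + K_i\,(y_h - y_k) & \text{if } y_k > y_h \\ u_k + K_i\,(y_l - y_k) & \text{if } y_k < y_l \\ u_k & \text{otherwise} \end{cases}$$

- By setting the lower threshold lower or the higher threshold higher (not exceeding [bound not preserved]), we prevent the system from changing frequently.
- The drawback of this approach: the system will be under-provisioned to some extent.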

Discussions and future work
- Make the resource configuration of newly created storage instances tunable.
  - Resize storage by adding or removing storage instances with flexible resource configurations.
  - Optimize the system by exploring the capacity and efficiency of individual storage instances, rather than only the number of storage instances.
- This requires investigating the performance of storage nodes under different setups: disk size, CPU frequency, RAM size, etc.
THANK YOU!