evaluating the cost-benefit of using cloud computing to extend the capacity of clusters presenter:...
TRANSCRIPT
Evaluating the Cost-Benefit of Using Cloud Computing to
Extend the Capacity of Clusters
Presenter: Xiaoyu Sun
Cluster Computing
Users have to know cluster very well
administrative privileges
What is Cloud Computing?
Cloud computing provides computation, software, data access, and storage resources without requiring cloud users to know the location and other details of the computing infrastructure.
Empowerment Users control resource by themselves not by a centralized IT service
Agility users' ability to re-provision technological infrastructure resources.
Application Programing Interface Cost Device and Location Independence
enable users to access systems using a web browser regardless of their location or what device they are using
Virtualization servers and storage devices to be shared and utilization be increased
Reliability and Scalability Performance
Monitor by web services as the system interface Security
providers are able to devote resources to solving security issues that many customers cannot afford Maintenance
Applications don’t need to be installed on each user's computer and can be accessed from different places
Characteristics of Cloud Computing
Describe a system that enables an organization to augment its computing infrastructure by allocating resources from a Cloud provider.
Provide various scheduling strategies that aim to minimize the cost of utilizing resources from the Cloud provider.
Evaluate the proposed strategies, considering different performance metrics; namely average weighted response time, job slowdown, number of deadline violations,
number of jobs rejected, and the money spent for using the Cloud.
Purpose
Cloud Computing
Figure 1:The resource provisioning scenario
Strategy sets
Conservative each request is scheduled when it arrives in the system, and requests are
allowed to jump ahead in the queue if they do not delay the execution of other requests.
Aggressive Only the request at the head of the waiting queue called the pivot is
granted a reservation. Other requests are allowed to move ahead in the queue if they do not delay the pivot.
Selective Requests are given reservations if they have waited long enough in the
queue. Long enough is determined by the requests’ expansion factor:
Xfactor = (wait time + run time)/run time (1) The threshold is given by the average slowdown of previously
completed requests.
Backfilling Policies
Naïve: Both site and cloud schedulers use
Conservative backfilling to schedule the requests
The redirection algorithm is executed at the arrival of each job at the site
Use cloud provider when the request cannot start immediately on local cluster
Strategy Sets
Shortest Queue: Aggressive backfilling First-Come-First-Served (FCFS) manner At the arrival or complete of each job at the
site Compute the ratio of number of VMs required
by requests to the number of VMS available Redirect request if cloud provider’s number is
smaller
Strategy Sets
Weighted Queue: Aggressive backfilling First-Come-First-Served (FCFS) manner Number of VMs that can be borrowed from
cloud provider is the number of VMs required by requests minus VMs in use
Strategy Sets
Selective Selective backfilling Compute the ratio of number of VMs required
by requests to the number of VMS available When the request’s xFactor exceeds the
threshold, the scheduler makes a reservation at the place that provides the earliest start time.
Strategy Sets
Simulation of two-month-long periods SDSC Blue Horizon machine with 144 nodes
Number of VMs Price of a virtual machine per hour
Amazon EC2’s small instance: US $0.10 Network and storage are not considered
Values are average of 5 simulation runs
Experiments
Average Weighted Response Time(AWRT) of site k:
Tk: requests submitted to site k
Pj: the runtime of request j
mj: the number of VMs required by request j
ctj: request j’s completion time
stj: the submission time of request j
Performance Metrics
Performance Improvement Cost of a strategy set st:
Amount spent is the amount spent running virtual machines on the Cloud provider
AWRTbase is the AWRT achieved by a base strategy(FCFS with aggressive backfilling) that schedules requests using only the site's resources
AWRTst is the AWRT reached by the strategy st when Cloud resources are also utilized.
Performance Metrics
Using Lublin99's model to generate different workloads:
Umed: the mean number of virtual machines required by a request to log2m-umed where m is the maximum number of virtual machines allowed in the system, from 1.5 to 3.5.
Barr: the inter-arrival time of requests at rush hours, from 0.45 to 0.55 .
PB: the proportion p of the first gamma in Lublin99's model is given by p = pa * nodes + PB, from 0.5 to 1.0.
Performance Improvement Cost
Performance Improvement Cost
These three graphs show the site's utilization using the base aggressive backfilling strategy without Cloud resources
The larger the value of Umed, the smaller the requests.
The larger the value of PB, the smaller the duration of the requests
Performance Improvement Cost
Requests’ size Requests’ arrive time
Requests’ duration
Users may have stringent requirement on when the virtual machines are required
Deadline constrained requests have: Ready time Duration Deadline
Cost of using Cloud resources used to meet requests’ deadlines and decrease the number of deadline violations and request rejections
Deadline Constrained Applications
Conservative both local site and Cloud schedule requests using conservative
backfilling. Places a request where it achieves the best start time If rejections are allowed and deadline cannot be met, reject the
request Aggressive
both local site and Cloud use aggressive backfilling to schedule requests
Earliest Deadline First If request deadlines are broken in the local cluster, try the cloud
provider If rejections are allowed and deadlines are broken, reject the request
Deadline Aware Strategies
The non-violation cost is given by:
Where: Amount_spentst: amount spent with Cloud resources
Violbase: the number of deadline violations under the base strategy set (aggressive backfilling and an Earliest Deadline First manner)
Violst:the number of deadline violations under the evaluated strategy set
Cost of Reducing Deadline Violations
The deadline calculation is given by:
Where: stj: the request j's submission time
ctj: the completion time.
taj: the difference between the request's completion and submission times.
sf : a stringency factor that indicates how urgent the deadlines are.
Deadline calculation
Cost of Reducing Deadline Violations
sf=0.9 sf=1.3 sf=1.7
Cost of Reducing Deadline Violations
Tight deadlines
Normal deadlines
Relaxed deadlines
Cost to Reduce Job Rejections: Aggressive Strategy Set
Cost to Reduce Job Rejections: Aggressive Strategy Set
Different strategy sets can yield different ratios of performance improvement to money spent Naïve strategy has a higher performance
improvement cost Selective strategy provides a good ratio of money
spent to job slowdown improvement Using cloud provider to meet job deadlines
Less than $3,000 were spent to keep the number of rejections close to zero
Conclusions