

Linköpings universitet, SE–581 83 Linköping

+46 13 28 10 00, www.liu.se

Linköping University | Department of Computer and Information Science

Master thesis, 30 ECTS | Software Engineering

2019 | LIU-IDA/LITH-EX-A--18/054--SE

Comparison of Auto-Scaling Policies Using Docker Swarm
Jämförelse av autoskalningpolicies med hjälp av Docker Swarm

Henrik Adolfsson

Supervisor: Vengatanathan Krishnamoorthi
Examiner: Niklas Carlsson



Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Henrik Adolfsson


Abstract

When deploying software engineering applications in the cloud there are two similar software components used. These are Virtual Machines and Containers. In recent years containers have seen an increase in popularity and usage, in part because of tools such as Docker and Kubernetes. Virtual Machines (VMs) have also seen an increase in usage as more companies move to solutions in the cloud with services like Amazon Web Services, Google Compute Engine, Microsoft Azure and DigitalOcean. There are also some solutions using auto-scaling, a technique where VMs are commissioned and deployed as load increases in order to increase application performance. As the application load decreases, VMs are decommissioned to reduce costs.

In this thesis we implement and evaluate auto-scaling policies that use both Virtual Machines and Containers. We compare four different policies, including two baseline policies. For the non-baseline policies we define a policy where we use a single Container for every Virtual Machine and a policy where we use several Containers per Virtual Machine. To compare the policies we deploy an image serving application and run workloads to test them. We find that the choice of deployment strategy and policy matters for response time and error rate. We also find that deploying applications as described in the method is estimated to take roughly 2 to 3 minutes.


Acknowledgments

I want to thank my family and my beloved Sofia for helping me through this thesis. I also want to thank Niklas Carlsson and Johan Åberg for providing guidance and help with writing the thesis, as well as Raymond Leow for being my opponent and giving me great feedback. I also want to thank Sara Berg for reading my thesis and providing me with great feedback.


Contents

Abstract

Acknowledgments

Contents

List of Figures

List of Tables

Listings

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research Questions
1.4 Delimitations

2 Background
2.1 Briteback

3 Theory
3.1 Microservice Architecture
3.2 Virtualization
3.3 Elasticity
3.4 Measuring Elasticity
3.5 Taxonomy
3.6 Cloud Computing
3.7 Control Theory
3.8 Related Work

4 Method
4.1 Sending Requests
4.2 Image Server
4.3 DigitalOcean Evaluation
4.4 Policy Experiments
4.5 Auto-Scaling Implementation

5 Results
5.1 DigitalOcean Evaluation
5.2 Download Times for Docker Image
5.3 Optimal Number of Containers for Mixed Policy
5.4 Startup and Shutdown Times for Containers
5.5 Policy Evaluation
5.6 Policy Experiments for Baseline and Comparison

6 Discussion
6.1 Results
6.2 Method
6.3 Source Criticism
6.4 The Work in a Wider Context

7 Conclusion
7.1 Discussion of Future Work

A Early Deployment Figures

Bibliography


List of Figures

3.1 Microservices are built around single components that provide a standalone service, while monoliths are centralized systems.

3.2 A Docker command is issued to run containers on the local computer. Docker then downloads an image from Docker Hub, the repository of Docker Images, and then runs a set of containers from the Docker Image specification.

3.3 An example with Docker Compose service definition. A is a network that connects all four services to a common database. B is a network that connects a subset of services to the trusted store.

3.4 Horizontal and vertical elasticity. Vertical means scaling individual capabilities of computing resources and horizontal means scaling the number of computing resources.

3.5 A simplified image of the taxonomy of elastic systems described by Al-Dhuraibi et al.

3.6 Feedback controller of ElastMan

3.7 Feedforward controller of ElastMan

4.1 Exponentially rising in a few small steps.

4.2 Linearly increasing the rate of accesses until dropping off at the end to the starting intensity.

4.3 Traffic is increased to an order of magnitude more in an instant. This continues for 15 minutes until it goes back to its starting value.

4.4 The image that was used during the auto-scaling experiment.

4.5 The test program that tested startup time for VMs on DigitalOcean.

4.6 The test program that tested shutdown time for VMs on DigitalOcean.

4.7 Expected scaling behaviour of the application

4.8 The algorithm used by the scaling decider. How it scales up or down depends on the currently active policy.

4.9 The classification of the system, using the taxonomy developed by Al-Dhuraibi et al. The blue colored boxes represent the classification of the system.

5.1 On the left: The CCDF of startup times for VMs on DigitalOcean. On the right: The same data but plotted on a logarithmic scale.

5.2 On the left: The CCDF of shutdown times for VMs on DigitalOcean. On the right: The same data but plotted on a logarithmic scale.

5.3 On the left: the CCDF for download times when downloading the image server Docker Image. On the right: The CCDF for download times when downloading the image server Docker Image, plotted using a logarithmic scale on the Y-axis.

5.4 The actual response time for each container setup.

5.5 The response time for each container setup relative to the minimum response time for that column in the graph. Datapoints for 1 container are not visible for 100 and 120 requests per second. The datapoint for 2 containers is not visible for 120 requests per second.


5.6 The figure shows the time it takes for the containers to scale from 0 replicas to N replicas on a single node. The orange line is the median, the box shows the 95% range, the whiskers show the 99% range and the black circles show the remaining 1%.

5.7 The figure shows the time it takes for the containers to scale from N replicas to 0 replicas on a single node. The orange line is the median, the box shows the 95% range, the whiskers show the 99% range and the black circles show the remaining 1%.

5.8 The figure shows the time it takes for the containers to scale from N-1 replicas to N replicas on a single node. The orange line is the median, the box shows the 95% range, the whiskers show the 99% range and the black circles show the remaining 1%.

5.9 The figure shows the time it takes for the containers to scale from N replicas to N-1 replicas on a single node. The orange line is the median, the box shows the 95% range, the whiskers show the 99% range and the black circles show the remaining 1%.

5.10 The results of the VM Only policy with the Exponential Ramp workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.11 The results of the VM Only policy with the Linear Rise, Fast Drop workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.12 The results of the VM Only policy with the Instant Traffic workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.13 The results of the Mixed policy with the Exponential Ramp workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.14 The results of the Mixed policy with the Instant Traffic workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.15 The results of the Mixed policy with the Linear Rise, Fast Drop workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.16 The result of the Constant policy on a small VM with the Exponential Ramp workload. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.17 The result of the Constant policy on a big VM with the Exponential Ramp workload. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.18 The result of the VM Only policy on a small VM with the Exponential Ramp workload and late deployment policy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.19 The result of the VM Only policy on a big VM with the Exponential Ramp workload and late deployment policy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.


5.20 The result of the Mixed policy on a small VM with the Exponential Ramp workload and late deployment policy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.21 The result of the Mixed policy on a big VM with the Exponential Ramp workload and late deployment policy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.22 The result of the Container Only policy on a small VM with the Exponential Ramp workload. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.23 The result of the Container Only policy on a big VM with the Exponential Ramp workload. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.24 The result of the Mixed policy on a small VM with the Exponential Ramp workload using the early deployment strategy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.25 The result of the VM Only policy on a big VM with the Exponential Ramp workload using the early deployment strategy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

A.1 The result of the VM Only policy on a small VM with the Exponential Ramp workload using the early deployment strategy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

A.2 The result of the Mixed policy on a big VM with the Exponential Ramp workload using the early deployment strategy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.


List of Tables

4.1 Scaling configuration for different scaling policies.

4.2 Standard Droplets offered by DigitalOcean

5.1 Average request times for 1, 2, 3, ..., 10 containers with request rates between 5 and 120 per second. Each row is a separate container count and each column is a separate request-per-second count.

5.2 Error rate and price for VM Only and Mixed policy.


Listings

3.1 An example “Dockerfile” that creates an application extended from the Ubuntu image and puts the application in /app, exposes port 80 (declares that the container listens on port 80; mapping it to a host port is configured when the container is run). Lastly, when the image is run as a container it will run run-script.sh with “start” as the first command line argument.

3.2 An example Docker Compose File (version 3) that creates a single service, hosting a service called “rethink”. The Docker Image used is called “rethinkdb” and the ports 29015, 28015 and 8080 are used by the image. Additionally, /root/rethinkdb on the host is mapped to /data in the container.


Glossary

This chapter lists abbreviations and shorthands used in the text.

• API - Application Programming Interface. Interface for working with an application programmatically

• AWS Lambda - Amazon Web Services Lambda, a serverless compute service from Amazon

• CLI - Command Line Interface

• CTMC - Continuous-Time Markov Chain

• EC2 - Amazon Elastic Compute Cloud

• NIST - National Institute of Standards and Technology

• PaaS - Platform as a Service (Amazon EC2, DigitalOcean, etc.)

• SLO - Service Level Objective

• VM - Virtual Machine, can also be referred to as “a node”


1 Introduction

The introduction gives a brief background to the area of cloud computing. It also mentions some results of previous research and the main motivation for studying this particular area of research. It then discusses the aim of the thesis, lists the research questions and ends the chapter with delimitations on the main work of the thesis.

1.1 Motivation

A new type of service has caught the eye of researchers and the IT industry in recent years. The service is called Platform as a Service (PaaS) and allows for renting compute capabilities in the cloud. Providers of these services are called Cloud Hosting Providers. This service allows customers to rent servers without having to tackle problems such as server uptime, component failure and hosting space. Companies like Amazon, Google, Microsoft, RedHat and DigitalOcean offer PaaS with different paid plans for different use cases. Previous research has shown that using a cloud computing solution provides benefits in the form of lower cost, high resource utilization and the ability to accommodate a sudden burst in usage [1].

A common software architecture pattern used with hosting providers is the microservice architecture. It is a pattern in which services are split into different independent specialized servers, often smaller in size compared to using a single server. When hosting these servers in the cloud it is possible to allocate more or fewer resources to each individual VM that the server runs on. The providers often have different price plans for different types of VMs and allow the type to be changed while applications are running. Changing the VM configuration can be done programmatically through an API, adding or removing VMs, and this creates an interesting research opportunity. Cloud hosting providers allow users of their service to both monitor activity and react to it automatically, and this can be used to create tools for automatic scaling, also known as auto-scaling. It has been shown that taking user performance requirements and economic factors into consideration in the scaling mechanism can reduce the cost in a cloud computing environment [2].

Studying the problem of scaling with dynamic resources has both an academic and an economic purpose. As an academic it is interesting to understand good strategies for analyzing activity and then scaling accordingly. It is a complex and non-trivial area of study, as it may depend heavily on context. Nevertheless, this makes it interesting to study independent of the context it is studied in. We can compare this perspective to that of a business. As a business it is important to be profitable and deliver a good service. The scaling of the servers will directly relate to both the quality of the service and the cost of running the service. Thus, for a business, it is interesting to study this problem because it allows the business to scale their servers in a cost-conscious way.

There have been several attempts to find algorithms and techniques that achieve good auto-scaling [3]. There are two major groups of auto-scaling techniques: proactive and reactive. A proactive algorithm tries to predict the future from historical data while a reactive algorithm reacts to workload or resource utilization according to a set of predefined rules and thresholds in real-time [4]. This thesis studies the effects of Container based scaling and Virtual Machine based scaling using part of a real world application.

1.2 Aim

This thesis aims to compare auto-scaling policies using part of an application built with a microservice architecture. The policies differ in the number of containers and virtual machines they spawn.

1.3 Research Questions

• How do auto-scaling policies based on static thresholds, scaling containers and/or virtual machines, compare when it comes to cost and SLO violations?

1.4 Delimitations

The thesis will focus on what is possible to achieve with DigitalOcean and Docker Swarm. Docker is a common tool for defining software containers and Docker Swarm is a tool for creating and managing a fleet of containers. DigitalOcean is a cloud hosting provider that allows users to rent VMs.


2 Background

This chapter describes important parts of the context for the study that is part of the thesis.

2.1 Briteback

The company at which the experiments were performed is called Briteback. Their main product is a communication platform for larger companies, realized as an application for several different platforms. Part of this application is used in the experiments, and so it is important that it is explained in sufficient detail.

Briteback has built their main application using microservices, meaning that they have many different types of services that run independently. One of the services used in the Briteback application is the image service. This service is used to download, crop and scale images that are later displayed in the application. This service will be the target of the auto-scaling experiment of this thesis. All of Briteback's services are created using a tool called Docker and they are deployed on a hosting provider called DigitalOcean. Docker is a tool for creating and running containers.

This study will be used by Briteback as groundwork for investigating an auto-scaling solution for the production environment of the company.


3 Theory

This chapter explains different concepts needed to understand how the method was formed and realized.

3.1 Microservice Architecture

Software can follow different types of high level architectural designs. Within web development it is common to talk about monolithic and microservice architectures. They are commonly contrasted with each other because of their definitions. The National Institute of Standards and Technology (NIST) defines microservices in the following way [5]:

Microservice: A microservice is a basic element that results from the architectural decomposition of an application's components into loosely coupled patterns consisting of self contained services that communicate with each other using a standard communications protocol and a set of well defined API's, independent of any vendor, product or technology.

NIST defines a microservice as a part of a bigger application. The individual service has to be able to work, mostly, on its own. It needs to be connected with other microservices that together form the application, as visible to the end user.

The definition given by NIST fits well with the domain. The individual servers are of varying sizes and each is characterized by its available interface (API) as well as the human-defined “aim” of the server. The definition also fits well with other research in the area, such as the definitions given by Dragoni et al. [6].

Monolithic applications, as opposed to microservices, are applications in which there is only one server managing the application. Even though “monolithic” usually has negative connotations from a programming perspective, it is important to understand that a monolithic application is not, by definition, worse than a microservice architecture. Instead it has different benefits, drawbacks and challenges compared to a microservice architecture.

In comparing monolithic and microservice applications there is research that suggests microservices cost less when deployed in the cloud. Villamizar et al. found that they could reduce infrastructure costs by up to 77% when deploying an application through AWS Lambda, comparing an application written with both a monolithic and a microservice architecture [7]. Dragoni et al. [6] also argue that a microservice architecture helps in many modern software engineering practices. Having a service with well-defined barriers helps programmers find bugs faster. Having several smaller services is ideal for containerization and thus improves portability and security. Scaling several smaller servers allows for fine-tuning of the scaling and can save costs, rather than increasing the capability of every service, which a monolith has to do [6].

A conceptual image of what a microservice architecture looks like can be seen in Figure 3.1. The figure depicts a monolith architecture and a microservice architecture. The monolith has all software components communicating with each other, with access to the same database. In the microservice architecture each software component has its own (smaller) database and can run fully independently. However, they need to communicate with other microservices over the network using a standard protocol, such as HTTP.

Figure 3.1: Microservices are built around single components that provide a standalone service, while monoliths are centralized systems.

ECMAScript and Node

The microservices that are examined as part of the thesis are written in the programming language ECMAScript, more commonly referred to as JavaScript. These run as servers with the help of Node.js, a runtime environment for JavaScript. Node is built on the V8 JavaScript engine found in Google Chrome [8]. It is a programming environment heavily focused on asynchronous programming and non-blocking I/O. This makes it suitable as a runtime for creating a scalable system [9]. Programming in JavaScript with Node suits both a monolithic approach as well as a microservice approach.
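As a minimal illustration of this style (a sketch using Node's built-in http module, written here in TypeScript; it is not Briteback's actual service code), each request below is handled asynchronously, so a single process keeps serving other connections while one request waits:

// Minimal sketch of Node's non-blocking style; each request is handled
// asynchronously, so one process serves many concurrent connections.
// Illustration only, not Briteback's actual image service.
import * as http from "http";

const server = http.createServer((req, res) => {
  // Simulate asynchronous work (e.g. disk or network I/O) with a timer;
  // the event loop keeps accepting other requests in the meantime.
  setTimeout(() => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok\n");
  }, 10);
});

server.listen(8080, () => console.log("listening on port 8080"));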

3.2 Virtualization

Virtualization is a concept used in computing that means abstracting away from the underlying architecture. Virtualization is used to sandbox different environments. A common use-case is to run a Virtual Machine (VM) with a different Operating System (OS) than that of the original machine. This allows for running a Linux machine inside of a Windows machine and vice versa. To refer to the actual machine running the virtualization environment we use the word “host” and to refer to the VM that runs we use “guest” and say that the guest machine(s) runs on the host. Intuitively, a virtualized environment should run slower compared to running on the host environment. While this is mostly correct, there are virtualization tools that run with very little overhead [10].


Apart from virtual machines there is also a technique called containerization. Instead of using virtual machines you use “containers”. A key difference is that a VM runs on virtual hardware with its own kernel while a container runs the same OS as the host and only has limited access to the host. Containers run faster compared to using a VM and also start faster [11]. There is also research suggesting that the difference between containers and running code directly on the host is negligible [12].

Virtualization has properties that are useful to software applications. In this section we mention some of the benefits and describe them briefly.

Portability: An application defined inside of a virtualized environment can run anywhere the virtualized environment can run. This property becomes very useful when an application is hosted in the cloud, on hardware provided by someone else [13].

Security: Applications deployed inside a virtualized environment typically cannot access the host. Because of the sandboxed environment an attacker typically has to compromise the application and then escape the (restricted) virtualized environment to compromise the host [13].

Reliability: With virtual machines and containerization the running environment can be treated as software, instead of being part of the hardware. A failure is then only a software failure and not a hardware failure. This makes restarting the virtualized environment a valid fallback tactic for crashes, which is much harder to realize when actual hardware fails [13].

Docker

Docker is used to create software containers. It uses a scripting language to define what is called an image. The image defines what should exist in a container, if it should extend from a previous image and how it should be configured. From this image the Command Line Interface (CLI) can spawn containers and once spawned the CLI can access the containers and provide changes to the running environment or run programs in it. Docker also has a repository of shared images that can be used and extended upon, similar to open source communities such as GitHub [13]. It is called Docker Hub. Figure 3.2 shows a conceptual image of how running Docker containers on a single computer works. An example of an image definition is found in Listing 3.1.

Docker can also run in a mode called swarm mode, or Docker Swarm. It allows for managing a set of containers each providing its own service, as well as running duplicates of services for redundancy. Built into Docker Swarm are features commonly used when deploying applications, such as load balancing and automatic failover [14]. Docker Swarm acts as a deployment agent, starting and stopping containers when it is told to. It decides where containers will run, when they will start, and when to restart containers that have failed or crashed. It uses the same API as the Docker CLI does, meaning that starting new containers works in the same way as when a human would start a container through the Docker CLI, as depicted in Figure 3.2.

FROM ubuntu
WORKDIR /app
COPY my-application/src /app/src
COPY my-application/run-script.sh /app/run-script.sh
EXPOSE 80
ENTRYPOINT ["/app/run-script.sh", "start"]

Listing 3.1: An example “Dockerfile” that creates an application extended from the Ubuntu image and puts the application in /app, exposes port 80 (declares that the container listens on port 80; mapping it to a host port is configured when the container is run). Lastly, when the image is run as a container it will run run-script.sh with “start” as the first command line argument.

Figure 3.2: A Docker command is issued to run containers on the local computer. Docker then downloads an image from Docker Hub, the repository of Docker Images, and then runs a set of containers from the Docker Image specification.

For this thesis we use Docker to build an application image. Generally, an image would be built for each microservice. Images can also be less specific, for example a web server or a database. The setup, which utilizes swarm mode, is defined through a Docker Compose File. This is a file following the YAML data standard and it defines a number of services run by Docker Swarm. Each service is based on an image, which in this case corresponds to the application images. Each service also has a number of properties that can be specified through the compose file, such as environment variables, container replicas, volumes, constraints and networks [15]. An example of a Docker Compose File can be seen in Listing 3.2.

version: "3"
services:
  rethink:
    image: rethinkdb
    ports:
      - "29015"
      - "28015"
      - "8080"
    volumes:
      - /root/rethinkdb:/data

Listing 3.2: An example Docker Compose File (version 3) that creates a single service, hosting a service called “rethink”. The Docker Image used is called “rethinkdb” and the ports 29015, 28015 and 8080 are used by the image. Additionally, /root/rethinkdb on the host is mapped to /data in the container.


Docker allows Compose Files to define volumes, directories mounted inside the container. A container can save data to the host by saving it in the given volume. It also works in the reverse direction. A host can save data to a volume in order for the container to read information. Volumes are a common way of giving access to company secrets inside a container. Secrets would be data such as SSH keys or authentication tokens. Images are stored in the cloud and thus you do not want to store your secret inside the image, as anyone with the image can run the container [15].

In a Compose File it is also possible to define constraints, when working in swarm mode. Constraints are general constraints on where to run a service. For example, running a database on the weakest of four nodes may hamper application performance if the database is a bottleneck. Or perhaps the application is CPU intensive and it is important to run the CPU intensive tasks on the node with the most CPUs. This can be achieved with constraints by specifying that a service needs a minimal amount of CPU/memory or by specifying a node that it runs on. Labels can manually be added to nodes through the Docker CLI and constraints can specify labels that they run on [15].

Compose Files allow for custom networks to be defined as well. These are networks that join services together. For example an application may have a database which every other service needs to be able to connect to, but it also has a trusted store, in which sensitive information is kept, that can only be accessed by a few of the other services. In that case it would be possible to create two different networks, one with the database and one with the trusted store. In the Compose File you would then specify that the services with permission to access the trusted store are part of the same network as the trusted store [15]. This scenario is depicted in Figure 3.3.

Figure 3.3: An example with Docker Compose service definition. A is a network that connects all four services to a common database. B is a network that connects a subset of services to the trusted store.

In order to run the configurations in a Compose File you can deploy it to a Docker Swarm. The swarm will then try to deploy all the containers in a way that satisfies the constraints and requirements of the Compose File. The file can be updated and redeployed and the swarm will try to update the running services in the swarm. It is also possible to manually update parameters of the compose file, for example scaling the number of container replicas for a single service [14].
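As a sketch of such a programmatic update, the replica count of a running swarm service can be changed by invoking the Docker CLI from code. The service name mystack_rethink is a hypothetical example following the stack_service naming Swarm generates; a real deployment would first create the stack with docker stack deploy:

// Sketch: changing the replica count of a running swarm service by
// invoking the Docker CLI. "mystack_rethink" is a hypothetical name.
import { execSync } from "child_process";

function scaleService(service: string, replicas: number): void {
  // "docker service scale <service>=<n>" updates the service in place.
  execSync(`docker service scale ${service}=${replicas}`, { stdio: "inherit" });
}

scaleService("mystack_rethink", 3);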


Figure 3.4: Horizontal and vertical elasticity. Vertical means scaling individual capabilities of computing resources and horizontal means scaling the number of computing resources.

3.3 Elasticity

There are many definitions of elasticity. Here we will list key elements of elasticity and compare the literature in the area.

Horizontal elasticity: Increase or decrease the number of computing resources.

Vertical elasticity: Increase or decrease the capacity of available computing resources.

The definitions of horizontal and vertical elasticity are taken from Al-Dhuraibi et al. [4]. Horizontal elasticity and vertical elasticity are orthogonal concepts in nature and cloud providers can provide both at the same time. It is also possible to impose restrictions on how the scaling works, for example restricting vertical elasticity to only work between computing resources of the same pricing. Even though the verticality of the elasticity is constrained it would still be vertical elasticity. The concept of vertical and horizontal elasticity is shown in Figure 3.4.

Over-provisioning: auto-scaling that has resulted in having a higher supply of processing power than demand.

Under-provisioning: auto-scaling that has resulted in having a higher demand than available processing power.

These two definitions are given by Al-Dhuraibi et al. [4]. This is not the only definition of over-provisioning and under-provisioning. Ai et al. [16] define three states of the system instead of two. Apart from the above they define the Just-in-Need state [16]. This state is supposed to capture when the application runs at optimal scale: the supply matches the demand closely. However, an issue with the definitions of Ai et al. [16] is how the states are derived from the number of requests to the system and the number of VMs the system has available. The Just-in-Need state is defined as a ⋅ i < j ≤ b ⋅ i, where i is the number of requests, j is the number of available VMs and a and b are constants such that a < b. In the paper they specifically use the values 1 and 3 for a and b, respectively. These values are not justified but it is mentioned that they will need to be modified depending on the cloud platform and context.
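To make the definition concrete, a minimal sketch that classifies the three states, using the example values a = 1 and b = 3 from the paper:

// Classify the provisioning states of Ai et al. [16]: with i requests
// and j available VMs, the system is Just-in-Need when a*i < j <= b*i.
type State = "under-provisioned" | "just-in-need" | "over-provisioned";

function provisioningState(i: number, j: number, a = 1, b = 3): State {
  if (j <= a * i) return "under-provisioned";
  if (j <= b * i) return "just-in-need";
  return "over-provisioned";
}

// Example: with a = 1 and b = 3, ten requests are Just-in-Need with
// 11 to 30 VMs available.
console.log(provisioningState(10, 5));   // under-provisioned
console.log(provisioningState(10, 20));  // just-in-need
console.log(provisioningState(10, 40));  // over-provisioned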

Scalability: The ability of the system to sustain increasing workloads by increasing resources.


Automation: The degree to which the system is able to scale without interaction from something that is not part of the system.

Optimization: The degree of optimization for the application run by the system.

The above definitions are used by Al-Dhuraibi et al. to define and summarize elasticity [4]. Their paper defines elasticity as the combination of Scalability, Automation and Optimization. For elasticity you need auto-scaling, or you cannot handle an ever increasing workload, but an important part is also the optimization of the application itself. If the application is not built to scale then it is hard to achieve an elastic application with only auto-scaling. Even though there are limits to elastic provisioning with inelastic or unoptimized programs, there is research suggesting non-elastic programs can become elastic with software tools for elastic configuration during runtime [17].

A lot of research has focused on working implementations that utilize some elastic concepts and on exploring ways in which to achieve elasticity [1], [2], [16]–[30].

Elasticity Challenges

Elasticity is a large area, covering many different aspects of software engineering. The definition by Al-Dhuraibi et al. [4] shows that it covers not only the infrastructure that the system runs on, but the code that is running as well. Roy et al. [23] list three main challenges for elastic resource provisioning with component-based systems. These are:

Workload Forecasting: By forecasting and predicting the workload it is possible to commission computing resources just in time of need, instead of commissioning them at the moment they are needed. The challenge becomes predicting when they are needed, forecasting how much stress the servers will be put under.

Identify Resource Requirements for Incoming Load: The system needs to be able to accurately calculate the necessary resources for a load in order to avoid under-provisioning or over-provisioning.

Resource Allocation while Optimizing Multiple Cost Factors: A problem that occurs with the overhead of resource provisioning is that it is impossible to follow the load exactly, which often fluctuates and can change drastically. Optimizing over such uncertainty and with the overhead constraints makes for a difficult problem to solve.

There is data suggesting that CPU utilization is quite low for a lot of systems, with estimates around an average CPU utilization of 15-20% [31].

3.4 Measuring Elasticity

An approach to measuring elasticity is defining the three states of over-provisioning, under-provisioning and just-in-need, as previously mentioned. The merit of this approach is that it is simple to create a metric, a comparable value, between two different solutions. The drawback is that the states are complex to define. Ai et al. [16] used this definition and created an elasticity value, representing the time a system is kept within the Just-in-Need state. They also tested their definition using a queueing model with a modified continuous-time Markov chain (CTMC). The CTMC modelled an underlying system and they tweaked values of the CTMC in order to see how the elasticity would be affected.
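A minimal sketch of how such an elasticity value could be computed from a monitored trace, assuming equally spaced samples of request count and available VMs, and the state boundaries a = 1 and b = 3 from above:

// Sketch: elasticity as the fraction of time spent Just-in-Need, after
// Ai et al. [16]. Samples of [requests, VMs] are assumed equally spaced.
function elasticity(samples: Array<[number, number]>, a = 1, b = 3): number {
  const inNeed = samples.filter(([i, j]) => a * i < j && j <= b * i).length;
  return inNeed / samples.length;
}

const trace: Array<[number, number]> = [[10, 5], [10, 20], [12, 20], [20, 25], [20, 80]];
console.log(elasticity(trace)); // 0.6: three of five samples are Just-in-Need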

Ai et al. [16] prove mathematically that their model is plausible. They also analyze the computed elasticity for different values of input variables, for example varying VM startup time. They also simulate their model using a simulator built in C++ and find that the simulation differs from the CTMC model by less than 1%. Apart from this they also run experimental tests and find that the experiments differ from their model by at most 3.0%.

Working with the elasticity definition presented by Ai et al. [16] has both positive and negative aspects. The positives are that it produces a comparable value between different algorithms and that the model developed by Ai et al. [16] is sound and tested mathematically, in simulation and experimentally. However, it is not perfect. Part of the calculations made by Ai et al. [16] is how the three different states are defined. The definitions, even though they are agnostic to the auto-scaling algorithm, may yield different results for different values. This implies that anyone using the elasticity value needs to make a subjective choice for how the states are defined. The paper, unfortunately, provides no guidance on how best to define the states. Apart from the definition of the states they also assume that the system to be scaled is centered around lending VMs to service requests, not requests to an application. The arguments in the paper are made from the perspective of a cloud service provider, but can be modified to be made from the perspective of an application provider using a cloud service provider. The main issue then is that the states cannot be defined with the relatively low boundaries that Ai et al. [16] use, but would be defined with request sizes several orders of magnitude larger.

3.5 Taxonomy

For the purpose of this thesis we will use the taxonomy provided by Al-Dhuraibi et al. [4] They define seven different properties in order to classify an elastic system and in this section we briefly describe the taxonomy. It is divided into seven subsections, each describing a property found in the work of Al-Dhuraibi et al. [4] The taxonomy is summarized in Figure 3.5.

Configuration

Configuration is a property that looks at how computing resources are offered by a cloud provider. Two types of configuration are defined, rigid and configurable. A rigid configuration means that a customer can choose between different fixed types of computing resources. For example, DigitalOcean uses fixed instances where you buy nodes and each node has an hourly price. There are different types of nodes and each node has a specific number of virtual CPUs, a specific memory size and a specific bandwidth. A configurable configuration allows the customer to choose resources for a VM, for example the number of CPUs.

How resources are reserved is a property that falls under Configuration and Al-Dhuraibi et al. [4] define four standard ways of reserving resources as well as an “other” category. The categories are:

• On-demand reservation - Reservations are made as they are needed and result in either rejection of reservations or acceptance of reservations.

• In advance reservation - Reservations are made in advance of the necessary timeframe and are guaranteed to be available at the specified time.

• Best effort reservation - Reservations are queued and handled on a first-come-first-serve basis.

• Auction-based reservation - Reservations are made dynamically through bidding and will be made available as soon as the customer wins a bid.

Scope

Scope defines at what level, or which levels, scaling actions occur. For example scaling could occur inside an application, but it can also occur at infrastructure level. Al-Dhuraibi et al. [4] note that most of the elasticity solutions they examined provide solutions at the infrastructure level. An important concept to bring up before going into detail is the concept of sticky sessions. Sticky sessions mean that an application carries state, meaning that applications can be session based, and that sessions and their state need to be preserved when performing scaling actions. An example would be an application which allows for video calls. If the application can scale with sticky sessions then video calls would not be interrupted by the scaling mechanism; however, if the scaling mechanism assumed a stateless application then the video call might suddenly end, or require complex client code to continue.

Al-Dhuraibi et al. [4] define the infrastructure level to have two main branches. One branch where containers are scaled and one where VMs are scaled. They also define embedded elasticity solutions, which are aware of the application in some way. Within embedded elasticity they define two subcategories, application map and code embedded. Code embedded is when the elasticity solution is part of the application itself. The application uses internal metrics to decide on how to scale. The drawback of this approach is that it needs to be tailored for each application as it is an integral part of the application. Application map is defined as an elasticity controller that has a complete view of the application, including the components of the application. Each component is either static or dynamic. Static components are launched at the start of the application execution and dynamic components are launched as part of the scaling [4].

Purpose

Elasticity has different purposes depending on the goal of the application and its execution. For example improving performance and reducing costs can be interesting for the ones deploying an application while quality of service is important for the cloud provider. Al-Dhuraibi states that “elasticity solutions cannot fulfill the elasticity purposes from different perspectives at the same time”. For example one cannot seek to maximize performance while minimizing environmental footprint, as both cannot be achieved at the same time. However, purposes that are not contradictory can be combined, such as quality of service and performance [4].

The purposes for using an elastic system are many, but some of them are:

• Availability - Ensure that the application is available under increased/decreased workloads.

• Energy - Minimize the amount of energy used by an application.

• Performance - Increase/decrease performance of an application under certain conditions.

• Cost - Reduce the cost of the application.

• Capacity - Handle more users of the application simultaneously.

Mode

Mode defines how an application is scaled. An application can be scaled in three different ways: automatically, manually or programmatically. In order for an application to be elastic it needs to scale automatically; the manual modes are not considered elastic. Manual mode means that a user has an interface provided by the cloud provider that allows them to scale their application. Programmable mode means that the application is scaled through API calls. Al-Dhuraibi claims that programmable mode is incompatible with elasticity, because it is the same as manual mode [4]. The reasoning is not clear because the focus is on the different subtypes of automatic mode and not elaborations on programmable mode.

Automatic mode can be divided into three main areas, each with their own approaches. The three areas are reactive, proactive and hybrid mode. Reactive means that the algorithm monitors events and tries to match supply with the demand of resources. Proactive means that the algorithm tries to analyze previous usage and detect patterns in demand, essentially predicting the demand for resources. Hybrid means that the solution uses a combination of the reactive and proactive approach to scale.

Reactive mode can be divided into two approaches, static and dynamic thresholds. Static thresholds mean that elasticity actions are performed when measured metrics reach static thresholds, for example CPU utilization above or below 50%. A lot of commercially available solutions use this kind of metric, such as Amazon EC2 and Kubernetes [32], [33]. Dynamic thresholds are also called adaptive thresholds because they adapt to the state of the hosted application.
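Reduced to its core, a reactive policy with static thresholds is a small decision rule evaluated once per monitoring interval. The sketch below is a generic illustration with hypothetical threshold values, not the exact rules used by any of the systems above:

// Sketch of a static-threshold reactive decision rule; the thresholds
// are hypothetical examples, evaluated once per monitoring interval.
const SCALE_UP_AT = 0.7;    // scale up above 70% average CPU utilization
const SCALE_DOWN_AT = 0.3;  // scale down below 30%

function decide(cpuUtilization: number, replicas: number): number {
  if (cpuUtilization > SCALE_UP_AT) return replicas + 1;
  if (cpuUtilization < SCALE_DOWN_AT && replicas > 1) return replicas - 1;
  return replicas;          // within thresholds: no action
}

console.log(decide(0.85, 2)); // 3: above the upper threshold
console.log(decide(0.5, 3));  // 3: no action
console.log(decide(0.1, 3));  // 2: below the lower threshold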

Proactive mode can be divided into many approaches. Al-Dhuraibi et al. describe five different approaches, each of which is listed and briefly described below.

• Time series analysis - Use gathered metrics to try to predict future metrics. Perform elasticity actions based on comparing the predicted values to a set of thresholds and rules.

• Model solving mechanisms - Create a model of the application and find optimal rules for scaling using that model. An example is a Markov Decision Process.

• Reinforcement learning - Create an agent that is responsible for scaling and receives rewards and punishments depending on the performance of the application.

• Control theory - Use a mathematical model and a controller (P-, PI-, PD- or PID-controller) to determine when to scale.

• Queueing theory - Use a mathematical model with waiting time, arrival rate and service time to model the system and derive elasticity actions from that model.

It is important to note that some approaches can work both reactively and proactively. For example, Al-Shishtawy and Vlassov created an elasticity manager that uses control theory for both proactive and reactive decisions, resulting in a hybrid elasticity manager [29]. They reasoned that a reactive mode is good at handling unexpected workloads, such as unexpected flash crowds, while a proactive mode is good at handling long-term predictions, for example recurring flash crowds during midday.

Method

Method refers to which method is used to achieve elasticity. Al-Dhuraibi et al. [4] define two main approaches, horizontal and vertical elasticity, and discuss the advantages and drawbacks of each. For this category there is also the possibility of a hybrid approach, but it is not further discussed.

Horizontal elasticity allows for new computing resources to be added. The drawback of horizontal elasticity is that it often works with very large increments of resources, so fine-tuning the resources is not always possible. Vertical elasticity, while not as widely available as horizontal, allows for fine-grained tuning of resources [4]. It is also important to note that the way an application is built affects which method is most appropriate. If the application is highly distributed, it would likely work well with horizontal scaling, but if it has many tightly interconnected components that need to communicate, it is likely that vertical elasticity would work better.


Figure 3.5: A simplified image of the taxonomy of elastic systems described by Al-Dhuraibi et al.

Architecture

The elasticity management solution can be architected in one of two ways: either centralized or decentralized. A centralized architecture has one elasticity controller that makes elasticity scaling decisions, while a decentralized one has several independent controllers. A decentralized elasticity manager also needs an arbiter component that is responsible for allocating resources to the controllers and the different system components [4].

Provider

This property denotes whether the elasticity manager uses a single cloud provider or several, whether these are in different regions, different data centers, etc.

3.6 Cloud Computing

There have been several attempts at defining cloud computing. NIST has provided a definition, just like they have with virtualization. NIST defines five properties that cloud computing must have [34]. These terms are listed and explained briefly:

• On-demand self service - A user of the service can automatically ask for up/downgraded service without human interaction.

• Broad network access - The provided service is available over the network via standard protocols. This allows any internet-connected device to access it.

• Resource pooling - Customers of the service are pooled together on the same machine. Additionally, the physical location of the machine should not be apparent to the user, but it can be visible on a higher level, such as region or country.

• Rapid elasticity - The service should allow for scaling the computational power provided to the user.

• Measured service - The resource usage of a single user is both gathered and presented to the user.

Cloud computing is a necessary prerequisite for creating an auto-scaling mechanism and an elastic platform. The most important properties for elasticity and auto-scaling are on-demand self service and rapid elasticity.

For cloud service providers it is also important to look at their cost models and their price models. Genaud et al. [25] have shown that it is possible, with client-side provisioning, to optimise for different types of workloads depending on the pricing model. One of the key elements in their paper is that the cost model works in discrete increments of an hour; that is, you pay for each started hour you rent. That way it is possible to fit several workloads within the same hour, while only paying for a single hour. They find that different strategies have different benefits and that it is possible for a user to select a strategy based on their needs.

3.7 Control Theory

Control theory is an area in which input signals are analyzed and output is generated from them. A widely used controller is the Proportional-Integral-Derivative (PID) controller. This controller continuously calculates the deviation from a desired reference signal r(t), or a constant r(t) = c, and adjusts the system based on the following formula:

u(t) = K_P e(t) + K_I \int_0^t e(T) \, dT + K_D \frac{d}{dt} e(t).   (3.1)

The three terms correspond to the proportional, integral and derivative parts (P, I and D respectively), each capturing a different aspect of the deviation. All three terms use the error signal e(t) at time point t. This signal is calculated as e(t) = y(t) - r(t): the difference between the actual signal and the reference signal. The reference signal is predefined by the one constructing the controller. K_P, K_I and K_D are constants that are used to change the behaviour of the controller. Constants can be set to zero, which creates a controller that is easier to analyze and work with [35].
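To make the formula concrete, a minimal discrete-time PID controller could be sketched as below. The class name, the rectangle-rule integration and the backward-difference derivative are our own illustrative choices, not part of the cited work.

class PIDController:
    """A minimal discrete-time PID controller (illustrative sketch)."""

    def __init__(self, kp, ki, kd, reference):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.reference = reference  # r(t), here assumed constant
        self.integral = 0.0         # running approximation of the integral term
        self.prev_error = 0.0

    def update(self, measurement, dt):
        """Compute the control signal u(t) from a new measurement y(t)."""
        error = measurement - self.reference         # e(t) = y(t) - r(t)
        self.integral += error * dt                  # rectangle-rule integration
        derivative = (error - self.prev_error) / dt  # backward difference
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

Setting, for example, ki = kd = 0 yields the simpler P controller mentioned above.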

Additional properties of a dynamic system that is being controlled by a controller are:

• Stability - the system settles at a final value given a constant reference signal,

• Robustness - the properties of the system remain similar even if the underlying mathematical model is not correct, and

• Observability - all relevant state of the system can be observed.

On the Use of Control Theory for Auto-Scaling

A typical use of control theory is to control the flow of a liquid to ensure that it flows at a certain rate. The usage of a network application can be viewed as a flow of network requests, and the scaling of the cloud computing service can be viewed as a parameter that increases or decreases the flow capacity. The actual workings of the flow are complex, and thus control theory is a natural fit for adapting to an increase or decrease in flow. By acquiring measurements, such as average response time, an auto-scaling solution can aim at a reference response time and scale accordingly.
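As a sketch of this idea, the controller above could drive a replica count from a measured average response time. The get_average_response_time callable is a hypothetical metric source, not part of any implementation described here.

def desired_replicas(controller, current_replicas,
                     get_average_response_time, dt=60.0):
    """Map the control signal to a replica count (illustrative sketch)."""
    latency = get_average_response_time()  # measured y(t), in seconds
    u = controller.update(latency, dt)     # positive when above the reference
    # Round the control signal to whole replicas and never go below one.
    return max(1, current_replicas + round(u))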

Previous research in the field has established that using control theory for auto-scaling is a viable strategy. Serrano et al. [28] showed that Service Level Objectives (SLOs) could be mapped to a utility function which can be used as input to a controller. They also showed that their implementation would react to events and change the number of rented nodes in order to meet SLOs. Lim et al. [36] showed that it was possible to use a discrete version of an integral controller in order to achieve auto-scaling, with regard to several different QoS measurements and SLOs.

3.8 Related Work

There is a lot of work in the area of cloud computing, especially regarding auto-scaling. Al-Dhuraibi et al. [4] found that there are at least 15 different approaches to providing a solution for the problem. One of the most common approaches is a threshold-based policy in which the service reacts according to user-defined rules in an ”if this then that” configuration. Marshall et al. [26] created such a system and tested it at Indiana University, University of Chicago and Amazon EC2. They created a tool in Python that allows system administrators to define their own policies for up-scaling and down-scaling. They also made an evaluation of the different hosting options they had chosen. The evaluation was done by sending a number of jobs to the server, each job sleeping for a certain amount of time. The study found that their solution could scale to over 150 nodes, that bootup times varied significantly, and that termination of nodes was comparatively fast and consistent. Bootup times ranged between 74 and 205 seconds, while termination regularly took 3 to 4 seconds [26].

Figure 3.6: Feedback controller of ElastMan

Figure 3.7: Feedforward controller of ElastMan

Al-Shishtawy et al. [29] created a controller for auto-scaling a key-value store service. They called this system ElastMan. It consists of a feedback controller, a feedforward controller and an overarching program that decides which controller to use. They create their controlling model from an SLO they call R99p, the 99th-percentile read-operation latency. They test the auto-scaling by sending it workloads with different intensities and show that their system can adjust to rapid changes in the environment, for example a flash crowd. They also show that using this approach costs less than using a constant number of servers. The feedback controller is shown in Figure 3.6 and the feedforward controller is shown in Figure 3.7.

Vasić et al. [24] found that analyzing and categorizing the requests that come to a server allows for strategies that greatly increase adaptation speed, the time it takes for an elastic system to adapt to a new situation. They also show that a quick adaptation speed correlates with monetary savings with regard to service provisioning costs.

Sharma et al. [22] have investigated the effect of utilizing a cost model when defining the auto-scaling algorithm, taking into account the usage-based pricing models that are often used with resource provisioning. They managed to achieve a cost reduction of 24% in a private cloud setting and a 35% decrease on Amazon EC2. Their research suggests that in order to lower the cost of a cloud-deployed system that uses elastic resource provisioning, the auto-scaler should have knowledge about the cost of provisioning.

Ashraf et al. [21] have created an approach for efficient auto-scaling called CRAMP (Cost-efficient Resource Allocation for Multiple web applications with Proactive scaling). It monitors resource utilization metrics of the infrastructure that the system runs on and does not need measurements from the application in order to achieve elasticity. It also uses shared hosting of applications, running them on the same VM, lowering the number of VMs required. By decoupling the applications from running on a single VM, and scaling the underlying VM with information about resource utilization on the VMs, they achieve significantly improved QoS for average response time and resource utilization. Ashraf et al. also created a user-admission system that was able to reduce server overload while simultaneously reducing rejected sessions [19].

Iqbal et al. [20] have shown that a strategy that uses both proactive and reactive scaling, a hybrid approach, is viable. Their system utilizes a reactive approach for scaling up and a proactive (predictive) approach for scaling down. Scaling up is based on CPU usage, while scaling down is based on an analytical model.


4 Method

In this chapter we present the method, including all the experiments we perform.

4.1 Sending Requests

During the experiment we will use a program that generates requests to the service at a controlled rate. The program we will use is called Siege.1 It is a command line program that can be invoked to create several “users” that will connect to a given URL. Siege is used for testing the load capacity of a server. It also records important statistics when running, which is necessary for analyzing the performance of the application. Examples of these statistics are: minimum response time, maximum response time, average response time, number of failed requests and total number of connections. When running Siege it is possible to tell it to run with a certain number of users. Unfortunately, there are no guarantees for the request intensity generated by Siege.
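As an illustration, a test run can be launched from Python; the flags -c (concurrent users) and -t (duration) are standard Siege options, while the URL and user count below are placeholders.

import subprocess

# Run Siege with 25 concurrent users for 60 seconds against a placeholder URL.
subprocess.run(
    ["siege", "-c", "25", "-t", "60S", "http://example.com/image"],
    check=True,
)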

We define three different tests that each use a single workload. The workloads are described below.

Workloads

Exponential Ramp - A workload whose intensity rises exponentially, doubling at regular intervals after a set time, up to a threshold. The workload is described in Figure 4.1.

Linear Rise, Fast Drop - A workload that rises linearly, only to drop quickly. The workload is described in Figure 4.2.

Instant Traffic - A workload that increases in a single step by an order of magnitude, from 10 requests per second to 100. The workload is described in Figure 4.3. A sketch of all three profiles as functions of time follows below.
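Expressed as request-rate functions of time (in seconds), the three profiles could look as follows; the exact step times, slopes and caps are illustrative assumptions, not the experiment's exact values.

def exponential_ramp(t, start=10, period=120, cap=100):
    """Rate doubles every `period` seconds until it reaches `cap`."""
    return min(start * 2 ** (t // period), cap)

def linear_rise_fast_drop(t, start=10, slope=0.1, drop_at=900):
    """Rate rises linearly, then falls back to the starting rate."""
    return start + slope * t if t < drop_at else start

def instant_traffic(t, low=10, high=100, step_at=300, duration=900):
    """Rate jumps an order of magnitude in one step for `duration` seconds."""
    return high if step_at <= t < step_at + duration else low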

Choosing the exact values for the size of the workloads is not trivial. As described by Al-Dhuraibi et al. [4], there is no standard method for evaluating the elasticity of a system. These workloads were chosen because they offer a variety of access patterns. They are also synthetic and thus easy to use. Lastly, it was observed that the tested application reached a high level of CPU utilization at around 40-50 requests per second for the big VM size. Because of this, the workloads have a maximum request rate of double that.

1. https://github.com/JoeDog/siege

Figure 4.1: Exponentially rising in a few small steps.

Figure 4.2: Linearly increasing the rate of accesses until dropping off at the end to the starting intensity.


Figure 4.3: Traffic is increased by an order of magnitude in an instant. This continues for 15 minutes until it goes back to its starting value.

4.2 Image Server

In order to isolate the scaling we will only use a small part of the Briteback application. We will be using a modified image server, which retrieves images from the web, stores them locally and performs cropping and scaling on images. We will use it in a mode that is CPU intensive, where the number of scaling operations in a container has been artificially inflated to create a bottleneck in CPU usage.

Siege is used to test the application and will send requests to fetch, crop and scale the image shown in Figure 4.4. This image is served by Linköping University through the liu.se domain.

Figure 4.4: The image that was used during the auto-scaling experiment.

4.3 DigitalOcean Evaluation

In order to evaluate the startup and shutdown times of virtual machines we will run a performance evaluation of startup and shutdown times on DigitalOcean. These evaluations will use the small VM type.

In order to test startup time we have developed a program that uses the DigitalOcean API to provision a virtual machine and measure the time until the machine is listed through the API as being ready. A similar program has been developed that decommissions the virtual machine and measures the shutdown time by checking when the machine is no longer listed through the API. The two programs are described briefly in Figure 4.5 and Figure 4.6.
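A sketch of the startup-timing program is shown below, using the public DigitalOcean v2 droplet API. The token, droplet name and the region/size/image slugs are placeholder assumptions, not the values used in the thesis.

import time
import requests

API = "https://api.digitalocean.com/v2/droplets"
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}  # placeholder credential

def measure_startup():
    """Create a droplet and time how long until the API reports it active."""
    spec = {"name": "timing-test", "region": "ams3",
            "size": "s-1vcpu-1gb", "image": "ubuntu-18-04-x64"}
    start = time.time()
    droplet = requests.post(API, json=spec, headers=HEADERS).json()["droplet"]
    while True:
        status = requests.get(f"{API}/{droplet['id']}",
                              headers=HEADERS).json()["droplet"]["status"]
        if status == "active":
            return time.time() - start
        time.sleep(1)

The shutdown-timing program mirrors this: issue a DELETE for the droplet and poll until the API no longer lists it.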


Figure 4.5: The test program that tested startup time for VMs on DigitalOcean.

Figure 4.6: The test program that tested shutdown time for VMs on DigitalOcean.

The goal of the evaluation is to find the distribution of startup and shutdown times for commissioning and decommissioning VMs, respectively. In order to create a good scaling algorithm it is important to understand what impact different actions have, and issuing new VMs is one of the most important actions for the different policies.

Simply commissioning the VM is not the only event that needs to be measured. The application itself has a startup time that comes from a combination of things. Firstly, the Docker Image has to be downloaded to the newly commissioned VM. Secondly, Docker Swarm needs to decide that the container should run on the new VM. Thirdly, the application needs to start within the container. All of these contribute to a rather long startup time. We will measure the time it takes for the Docker Image to download, as this will be done once for any VM and takes a bit of time. The expectation, with regard to what Marshall et al. found, is that it may take up to 205 seconds [26].

In order to find a distribution of the download times we created a simple experiment. The experiment removes the image from the local Docker instance and then downloads it. It does so several times and measures the time it takes for the image to be downloaded. This creates a probability distribution for the time it takes to download the image, which we present in the result section. The experiment is done on the small VM type.
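The experiment amounts to repeatedly timing a cold docker pull; a sketch, where the image name is a placeholder:

import subprocess
import time

IMAGE = "registry.example.com/image-server:latest"  # placeholder image name

def time_image_download():
    """Remove the local copy of the image, then time a fresh pull."""
    subprocess.run(["docker", "rmi", IMAGE], check=False)  # ignore if absent
    start = time.time()
    subprocess.run(["docker", "pull", IMAGE], check=True)
    return time.time() - start

samples = [time_image_download() for _ in range(500)]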


4.4 Policy Experiments

In this section we explain the experiments that relate to the different policies.

Policy Comparison Experiment

The experiment will begin by deploying the image processing service together with the auto-scaler. It will continue with a workload being initiated by Siege. After Siege has completed, the application will be allowed to wind down until a steady state is reached. This procedure will be repeated with all workloads for the VM Only policy with the small configuration and the Mixed policy with the big configuration.

The experiment will measure several different metrics. They are:

• Active Containers - The number of containers active at any one time.

• Active Nodes - The number of nodes active at any one time.

• Total Requests - The total number of requests being made at any one time.

• Response Time - The response time for each request.

• Failure Rate - The ratio of requests that failed.

These metrics are aimed at answering a few questions. They are:

• How much did it cost to use the policy?

• What was the error rate of the policy?

• What was the response time for requests?

For the experiment we will use static thresholds to scale the application up and down. These thresholds are based on the non-idleness of the CPU across all the VMs. There are many ways to measure CPU utilization, mainly because a CPU can spend time in kernel mode and in user mode, which adds a dimension to the measurement. For this paper we use the non-idleness of the CPU because of the simplicity it offers.
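On Linux, non-idleness can be derived from the counters in /proc/stat; a sketch, where treating idle plus iowait as "idle time" is our own interpretation:

import time

def cpu_non_idleness(interval=1.0):
    """Fraction of time the CPU was not idle over `interval` seconds."""
    def counters():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        return fields[3] + fields[4], sum(fields)  # (idle + iowait, total)

    idle1, total1 = counters()
    time.sleep(interval)
    idle2, total2 = counters()
    return 1.0 - (idle2 - idle1) / (total2 - total1)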

The cost of the VMs is the sum of all the (rounded up) hours that the VMs were commissioned, times a constant: the per-hour cost of the VM. All of the workloads used for the experiments are shorter than one hour, so for our purposes we will only count the number of commissioned VMs, add one for the static node, and then multiply by the hourly cost to get the final price.
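Under this rule the price calculation reduces to a one-liner; for example, six commissioned small VMs at $0.007 per hour give (6 + 1) * 0.007 = $0.049.

def run_cost(commissioned_vms, hourly_price):
    """Each workload fits within one billed hour, so the cost is simply
    (dynamic VMs + 1 static node) * hourly price."""
    return (commissioned_vms + 1) * hourly_price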

Scaling Policies

The experiment will be run under two different scaling policies, labeled “VM Only” and “Mixed”. In the VM Only policy we will keep a single container for each VM; the number of containers will match the number of VMs. For the Mixed policy we will use a basic threshold on the number of containers, and once that is reached we will commission another VM. The policy uses at most 4 containers per VM, and only once the algorithm suggests that more containers should exist than the threshold allows will it scale with a new VM. Whenever the algorithm scales up it will scale with two VMs (and the appropriate number of containers). This provides reliability in case a VM is not deployed to properly. A sketch of the two scale-up rules is shown below.
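This sketch encodes our reading of the two policies; the exact increments are assumptions drawn from the description above, not verified implementation details.

MAX_CONTAINERS_PER_VM = 4

def scale_up(policy, vms, containers):
    """Return the new (vms, containers) target when the decider says 'up'."""
    if policy == "vm_only":
        # One container per VM; two VMs are added for redundancy.
        return vms + 2, containers + 2
    if policy == "mixed":
        if containers + 1 <= vms * MAX_CONTAINERS_PER_VM:
            return vms, containers + 1   # room left: add a container
        return vms + 2, containers + 1   # threshold hit: add two VMs
    raise ValueError(f"unknown policy: {policy}")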

Apart from the policies we have mentioned, it might be tempting to use a policy in which only containers are scaled, and not VMs. This policy is disregarded for the main experiment, as a policy which only scales containers cannot acquire more computing resources and thus cannot be classified as an elastic solution with regard to the classification created by Al-Dhuraibi et al. [4]. An elastic solution needs to be able to scale, and adding more containers will not increase the resources available to the application. The different policies and their configurations are listed in Table 4.1.

Table 4.1: Scaling configuration for different scaling policies.

Policy    Memory  vCPUs   SSD Disk  Transfer  Price per hour  VM name
VM Only   1 GB    1 vCPU  25 GB     1 TB      $0.007          Small
Mixed     4 GB    2 vCPU  80 GB     4 TB      $0.030          Big

Policy Experiments for Baseline and Comparison

In order to get measurements that make the different policies comparable, both between themselves and with a baseline, we run an experiment where each run uses the same workload but a different setup. The setups have different parameters, for which we test the meaningful combinations. The parameters are:

• Size - Either small or big

• Scaling Policy - VM Only, Mixed, Container Only or Constant

• Workload - Exponential Ramp

• Deployment Policy - Either early or late

For the “Size” parameter we use two different sizes; these correspond to the sizes listed in Table 4.1, where “small” is the smaller one and “big” is the bigger one. For the “Scaling Policy” parameter we use four different values: VM Only and Mixed are the same policies as used in the main experiments, Container Only is a policy for which only containers, not VMs, are scaled, and Constant always uses one VM and one container. For the “Workload” parameter we use the Exponential Ramp workload shown in Figure 4.1. Lastly, for “Deployment Policy” we use early and late. Early means that containers and VMs are issued at the same time, while late means that containers are issued after the issued VMs are available.

Some of the combinations, while valid, have no meaningful difference from one another. An example of this is comparing two runs with the “Constant” scaling policy of the same VM size. They will only differ in deployment policy, but because they run with one VM and one container they never utilize the deployment policy. A similar scenario occurs when comparing setups using the Container Only scaling policy.

Optimal Number of Containers for Mixed Policy

A problem encountered for the Mixed policy is that a choice must be made regarding how many containers are used. To investigate this we conduct an experiment where we try to find the request time for the application at a certain request rate and with a certain number of containers. By analysing the request time for different request rates we can find a value for which the request time is as low as possible. It might be useful to have some degree of replication, so that more requests can be handled at the same time, but a very large degree of replication might result in a lot of administrative work by the CPU. It is therefore reasonable that there is an optimal value for which increasing or decreasing the degree of replication will only worsen the request time.


In order to find a suitable value for the optimal number of containers for our Mixed policy we will conduct an evaluation where the number of containers is set to a constant value and only one VM is active, of the same type used for the Mixed policy. We will then use Siege to find the request time with a rate of 5, 10, 20, 40, 60, 80, 100 and 120 users/s. This test will be done with 1 through 10 containers, and in the end the experiment will yield a comparable value between different numbers of containers, from which we can choose a good candidate for the optimal number of containers per VM.

Startup and Shutdown Times for Containers

We will also be looking at the time it takes for individual containers to start and stop on a single virtual machine. These tests will use the same VM type as the one used in the Mixed category. We will run a test that, for every N of 1 through 10, will scale from 0 to N, N to 0, N - 1 to N and N to N - 1. For the tests we will measure the time it takes to complete these tasks. Each combination of container count and test type is run 250 times. The Docker Image we will use is the modified image server used in the policy experiments. During the tests we will apply no load to the application, so that the system focuses on deploying the containers.
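Each timing can be taken around a single docker service scale call; a sketch, assuming a Docker version where the command blocks until the service has converged, and where the service name is a placeholder:

import subprocess
import time

SERVICE = "image-server"  # placeholder Docker Swarm service name

def time_scale(replicas):
    """Time how long scaling the service to `replicas` takes to converge."""
    start = time.time()
    subprocess.run(["docker", "service", "scale", f"{SERVICE}={replicas}"],
                   check=True)
    return time.time() - start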

4.5 Auto-Scaling Implementation

In this section we describe the implementation of the auto-scaling software and its different parts.

Dynamic and Static Nodes

Nodes that are part of a Docker Swarm under an auto-scaling algorithm can either be static or dynamic. If a node is static it will not be removed when the application scales down. Static nodes either host special services that are not scaled or provide a minimum number of nodes available to the swarm. For example, Docker Swarm needs at least one manager node in order to function correctly, so it is important not to remove all manager nodes. Keeping one manager node static prevents that.

Dynamic nodes are the opposite of static nodes. They are nodes that can be removed when scaling down, implying that they should host services fit for scaling up and down. For the image server that this thesis aims to scale we have chosen to use both static nodes and dynamic nodes, as this provides application redundancy but still allows for flexibility and scaling. Refer to Figure 4.7 for an intuition behind the expected scaling behaviour.

In the experimental setup we will use 3 static nodes. One node is the reverse proxy, Nginx, running on a 4 GB RAM, 2 vCPU, 80 GB storage, 1 TB transfer node. It is responsible for routing requests and serving images. One node contains the auto-scaling application, running on a 1 GB RAM, 1 vCPU, 25 GB SSD, 1 TB transfer (bandwidth) node. One node is a static node with the image serving application, which also acts as the Docker Swarm manager. The specifications of the manager node depend on the policy being tested and mimic the Droplet type being used for that test. The dynamic nodes will all be of the same Droplet type as the manager node.

Figure 4.7: Expected scaling behaviour of the application

Scaling Decider

The auto-scaling algorithm has a software component that decides when the application should scale up or down. This component uses data from the deployed application to make decisions. There are five basic operations that the algorithm can perform: scale the number of VMs up, scale the number of VMs down, scale the number of containers up, scale the number of containers down, and leave the configuration unchanged. From these we are able to create different policies that use containers and/or VMs to scale.

The decider is built around the non-idleness of the CPU. Any node that is deployed regularly sends data about itself back to the server making scaling decisions. This data is used by the scaling decider to see if the application should scale up or scale down. In order to scale, the decider uses one of the policies in addition to simple static thresholds. The decider looks at the mean of the CPU non-idleness as a number between 0-1 (0%-100%); if all deployed nodes are above 80% CPU non-idleness it will scale up, and if all nodes are below 30% non-idleness it will scale down. Additionally, it will scale down if there is one node with less than 5% CPU utilization. In order to calculate the CPU non-idleness we use the average of the 5 latest measurements for each node, where each measurement for a single node is sent roughly every 5 to 10 seconds. All policies use the described algorithm, but they react differently to it. The algorithm for the scaling decider is shown in Figure 4.8. The scaling decider will never scale below 1 VM and never above 25 VMs. Additionally, it takes a decision once every 60 seconds. If the decider scales up to add another VM, or scales down to remove one, it will have a cooldown period of 210 seconds (3.5 minutes) before taking any other decisions.
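The core decision rule can be stated compactly; a sketch, where cpu is a list of per-node non-idleness values in [0, 1], each already averaged over that node's five latest measurements:

def decide(cpu):
    """Return 'up', 'down' or 'hold' from per-node CPU non-idleness."""
    if all(c > 0.80 for c in cpu):
        return "up"
    if all(c < 0.30 for c in cpu) or any(c < 0.05 for c in cpu):
        return "down"
    return "hold"

The bounds (1 to 25 VMs), the 60-second decision interval and the 210-second cooldown are then enforced around this rule by the surrounding loop.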

When scaling VMs and containers we also observe that scaling them at the same time results in the containers being placed on the existing VMs, not on the newly commissioned ones. This in turn causes the algorithm to behave suboptimally, commissioning VMs that run idle with no containers while at the same time running more containers than expected on the existing VMs. In Figure 4.8 we can see that such a scenario would lead to scaling down, because the VMs without containers would run idle. The solution to this particular problem is to scale the containers only once the VMs have been commissioned and deployed to. We refer to this property as the deployment strategy. Either a late or an early deployment strategy can be used, wherein the late deployment strategy deploys containers after VMs have been commissioned and the early deployment strategy deploys VMs and containers at the same time.
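A sketch of the late strategy; the three callables are injected placeholders for the cloud and orchestrator operations, not part of the thesis code:

import time

def scale_up_late(new_vms, target_containers,
                  create_vm, vm_is_active, scale_service):
    """Late deployment: wait until all new VMs are active before asking
    the orchestrator to place the extra containers."""
    vms = [create_vm() for _ in range(new_vms)]
    while not all(vm_is_active(v) for v in vms):
        time.sleep(5)                  # poll the cloud API
    scale_service(target_containers)   # only now add the containers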


Figure 4.8: The algorithm used by the scaling decider. How it scales up or down depends on the currently active policy.


Auto-Scaling Taxonomy

We use the taxonomy provided by Al-Dhuraibi et al. [4] in order to classify the elastic system. The full classification is summarized in Figure 4.9.

Configuration: Rigid. The system is rigid because DigitalOcean only offers a set of specific, inflexible configurations. They offer three overarching types of virtual machines, each type with a different purpose. The three types are “Standard Droplets”, “Optimized Droplets” and “Flexible Droplets”. For this thesis we focus on the Standard Droplets. These are specified in Table 4.2.

Table 4.2: Standard Droplets offered by DigitalOcean

Memory   vCPUs    SSD Disk  Transfer  Price per hour
1 GB     1 vCPU   25 GB     1 TB      $0.007
2 GB     1 vCPU   50 GB     2 TB      $0.015
4 GB     2 vCPU   80 GB     4 TB      $0.030
8 GB     4 vCPU   160 GB    5 TB      $0.060
16 GB    6 vCPU   320 GB    6 TB      $0.119
32 GB    8 vCPU   640 GB    7 TB      $0.238
48 GB    12 vCPU  960 GB    8 TB      $0.357
64 GB    16 vCPU  1280 GB   9 TB      $0.476
96 GB    20 vCPU  1920 GB   10 TB     $0.714
128 GB   24 vCPU  2560 GB   11 TB     $0.952
192 GB   32 vCPU  3840 GB   12 TB     $1.429

Scope: Infrastructure. The system scales using containers and virtual machines, and there are no other scaling mechanisms. The two tests, VM Only and Mixed, both rely on infrastructure scaling, although they use different ways to scale.

Purpose: Research. The main purpose, from the perspective of this thesis, is research, or knowledge and understanding. As this thesis is done at a company, it is also worth mentioning that the main purpose, from the perspective of the company, is to increase some other metric, such as availability.

Mode: Reactive. The auto-scaling implementation uses the latest CPU non-idleness readings in order to scale and does not make any predictions based on previous state.

Method: Horizontal. The application that will be scaled is an isolated microservice of the Briteback application. It will be scaled by replicating it over several different computational units. In all cases it will be scaled using containers.

Architecture: Centralized. The program is built using a centralized logger and decision center.

Provider: Single. The only provider we use is DigitalOcean.

Figure 4.9: The classification of the system, using the taxonomy developed by Al-Dhuraibi et al. The blue colored boxes represent the classification of the system.

Data Collection and Application Model

For the scaling decider we have to make a decision every minute. This is quite a long time and allows for a lot of flexibility in the algorithm chosen for scaling. We have focused on a simple algorithm, but investigating other algorithms that require more CPU time is entirely feasible. Mainly we have focused on allowing arbitrary data to be stored in a centralised location where it can be used to make auto-scaling decisions. There are a few general areas in which data collection is possible. Some examples are listed here:

• Containers - Information from inside individual containers, such as running processes

• Service - Information about a specific service, such as active containers

• Node - Information about a running node, such as CPU non-idleness

• Swarm - Information about the Docker Swarm, such as the number of nodes

• API - Information from different network APIs, such as the number of active nodes from the DigitalOcean API

Data about a decentralized application tends to come from a lot of different places, and as such the data collection itself is prone to being decentralized. However, a useful property for the component responsible for auto-scaling is that it has access to all the data, preferably in a single place. In order to achieve this centralization we created a network API that can store arbitrary information in a database. In order to access the API, a request needs to be authenticated, as we only want genuine data to exist in the database. Using a web API also has the benefit that we can very easily add a logging mechanism to any computation unit (container, node etc.) with access to the internet. We can also choose different logging mechanisms on different nodes. For example, we only need one place that accesses the DigitalOcean API, and thus we only need to have the secret access token on that node, not on every possible computation unit.
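From a node's perspective, logging a measurement is a single authenticated POST; a sketch, where the endpoint, token and payload shape are placeholder assumptions:

import requests

LOG_API = "https://autoscaler.example.com/metrics"  # placeholder endpoint
TOKEN = "secret-token"                              # placeholder credential

def log_metric(node_id, name, value):
    """Push one measurement to the central store."""
    requests.post(
        LOG_API,
        json={"node": node_id, "metric": name, "value": value},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=5,
    )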

For this thesis we limit ourselves to a few measurements and actions, as the goal is to analyze the scaling policies and not the collected data. Some metrics are necessary for the application to run, such as API information on newly created nodes, meaning IP address, status and more. Without the information from the API it would be impossible to deploy or commission more computing resources. As such, the API is necessary in order for the auto-scaling to work. To create our model of the application we use the following information:

• Node CPU Non-Idleness - The CPU utilization of individual nodes

• DigitalOcean API Information - Listing number of VMs and deployment status.


5 Results

In this section we present the results from the experiments.

5.1 DigitalOcean Evaluation

The first experiment to conclude was the evaluation of startup and shutdown times of virtual machines on DigitalOcean. The resulting data consists of 993 data points. This is presented using a Complementary Cumulative Distribution Function (CCDF) in Figure 5.1 and Figure 5.2. Both of these figures also show the same data on a logarithmic scale in their respective right sideplots.

Figure 5.1: On the left: The CCDF of startup times for VMs on DigitalOcean. On the right: The same data but plotted on a logarithmic scale.


Figure 5.2: On the left: The CCDF of shutdown times for VMs on DigitalOcean. On the right: The same data but plotted on a logarithmic scale.

Some key points in the figures are that 99% of all startup times are less than 40 seconds, that shutdown times are negligible in comparison to startup times, and that there is a large spread in startup times. These values match well with the times found by Marshall et al. [26].

5.2 Download Times for Docker Image

In this section we present the download times for the image server Docker Image. In total we measured 500 downloads and plot them. Figure 5.3 shows the CCDF for downloading the Docker Image. We can see that the fastest time was roughly 14 seconds and the slowest roughly 24 seconds. We can also see that close to 95% of download times are less than 18 seconds and 99% are less than 20 seconds. Combining this with the time it takes to commission a VM (99% under 40 seconds), roughly 98% (0.99 × 0.99 ≈ 0.9801) of combined commissioning-and-download sequences take less than about 1 minute. Looking at Figure 5.3 we see that the tail of the download-time distribution is approximately linear on the logarithmic plot.


Figure 5.3: On the left: the CCDF for download times when downloading the image server Docker Image. On the right: the same data plotted using a logarithmic scale on the Y-axis.

5.3 Optimal Number of Containers for Mixed Policy

In this section we show the results of the container evaluation.

Figure 5.4 shows the average response time for request rates of 5, 10, 20, 40, 60, 80, 100 and 120 requests per second for 1, 2, 3, ..., 10 containers. Figure 5.5 shows the relative response time, where each data point for a given rate has been reduced by the minimal value for that rate. The figures show that adding containers alone adds very little in terms of reduced request time, and it is not clear what the optimal choice is.

Figure 5.4: The actual response time for each container setup.


Figure 5.5: The response time for each container setup relative to the minimum response time for that column in the graph. The data point for 1 container is not visible for 100 and 120 requests per second. The data point for 2 containers is not visible for 120 requests per second.

The exact values illustrated in Figure 5.4 are shown in Table 5.1.

Table 5.1: Average request times for 1, 2, 3, ..., 10 containers with request rates between 5 and 120 per second. Each row is a separate container count and each column is a separate requests-per-second count.

Containers   5     10    20    40    60    80    100   120
1            0.24  0.28  0.60  1.66  2.80  3.93  5.82  8.35
2            0.23  0.28  0.63  1.73  2.88  4.0   5.09  6.43
3            0.24  0.29  0.62  1.72  2.82  3.97  5.02  6.19
4            0.24  0.29  0.59  1.65  2.77  3.88  4.90  5.99
5            0.23  0.27  0.55  1.61  2.72  3.87  4.99  6.12
6            0.23  0.27  0.58  1.65  2.73  3.86  4.88  5.98
7            0.23  0.27  0.56  1.64  2.70  3.82  4.93  6.02
8            0.22  0.27  0.57  1.65  2.75  3.83  4.9   5.93
9            0.23  0.29  0.58  1.66  2.79  3.90  4.96  6.13
10           0.22  0.27  0.55  1.61  2.69  3.84  4.89  6.18

5.4 Startup and Shutdown Times for Containers

In this section we show the results of the scaling evaluation. All of the figures shown are box diagrams where the orange line is the median, the box contains 95% of all data points, the whiskers contain 99% of all data points and the remaining 1% are shown as individual black circles.

For the 0 to N scaling we find that the time increases linearly with the number of containers that are requested to start. This is shown in Figure 5.6. We can also see that starting the containers takes roughly 14 to 21 seconds, depending on the number of containers started. Looking at the reverse operation in Figure 5.7, scaling from N to 0, we see two interesting properties. Firstly, it is constant and does not rise linearly with the number of containers stopped. Secondly, it happens at roughly 1/10th of the time it takes to scale up, around 1.5 seconds. Next, looking at the time it takes to start one container in Figure 5.8, we see that starting a single container, no matter the number of already started containers, takes roughly 14 seconds. For the reverse operation in Figure 5.9 we see a pattern similar to the one between scaling from 0 to N and N to 0. It takes roughly 1/10th of the time to scale down, roughly between 1.4 and 1.8 seconds. What should be noted here, however, is that scaling down gives the closing container a bit of time to free all resources that it is using. This particular Docker Image uses very few resources and can thus close quickly. An example of a container that might close slower is one with a database that has a lot of updates queued up.

Figure 5.6: The time it takes for the containers to scale from 0 replicas to N replicas on a single node. The orange line is the median, the box shows the 95% range, the whiskers show the 99% range and the black circles show the remaining 1%.


Figure 5.7: The time it takes for the containers to scale from N replicas to 0 replicas on a single node. The orange line is the median, the box shows the 95% range, the whiskers show the 99% range and the black circles show the remaining 1%.

Figure 5.8: The time it takes for the containers to scale from N-1 replicas to N replicas on a single node. The orange line is the median, the box shows the 95% range, the whiskers show the 99% range and the black circles show the remaining 1%.


Figure 5.9: The time it takes for the containers to scale from N replicas to N-1 replicas on a single node. The orange line is the median, the box shows the 95% range, the whiskers show the 99% range and the black circles show the remaining 1%.


5.5 Policy Evaluation

This section presents the auto-scaling experiment, where the whole system was set up and the different policies were used to scale the system under load. Table 5.2 shows the cost and error rate for the different policies under different workloads. We can see that the VM Only setup had problems with the error rate, especially in the Instant Traffic case, where for some periods the error rate was around 5%. The Mixed setup handled errors much better, with an error rate below 1% for all measured workloads.

Table 5.2: Error rate and price for VM Only and Mixed policy.

                 Exponential Ramp  Linear Rise, Fast Drop  Instant Traffic
VM Only
Error Rate (%)   0.06%             0.28%                   0.79%
Price            3.5 cents         4.9 cents               4.9 cents
Mixed
Error Rate (%)   0.10%             0.07%                   0.10%
Price            9 cents           9 cents                 9 cents

VM Only

In this section we discuss the results from the VM Only setup with the three different workloads.

In Figure 5.10 we can see the results from running the Exponential Ramp workload with the VM Only policy. In the top graph we see the number of VMs and the error rate as the green and red line, respectively. The VMs are measured as the active number of VMs, that is, the VMs which can be reached with SSH and on which commands can run. In this test there are at most 5 different VMs. In the bottom graph we see that the distribution of the containers, once deployed, matches the policy of “1 container per VM”. Close to the 18 minute mark in the bottom subplot we see a small bump where one VM has two containers. The reason for this is that the containers are scaled down after the cooldown period for scaling down a VM. We can also see that the total time to deploy new VMs is roughly three and a half minutes, by looking at the difference between when the VM is active and when the containers run. The three and a half minutes also contain hidden margins: the auto-scaler operates minute by minute and container information is sent roughly every 5-10 seconds. The result is that, in a worst case scenario, the VMs could have been deployed a minute earlier and the containers would show up a minute and ten seconds earlier in the graph. The middle graph shows the expected requests per second, and the three dashed lines show the maximum, average and minimum request times in red, green and blue, respectively. Overall, we see that the auto-scaler adapts properly to the increased number of requests per second, and the average and maximum response times are kept reasonably low throughout the test.


Figure 5.10: The results of the VM Only policy with the Exponential Ramp workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.


In Figure 5.11 we can see the result from running the Linear Rise, Fast Drop workload. For this workload we reach a maximum of 5 simultaneous VMs, but commission a total of 7 VMs. The early bump in the top and bottom graphs shows that the VMs were commissioned and available, but not properly deployed to. The reason for this is probably that the VMs were commissioned but not ready until very late, resulting in a deployment window of a little less than 2 minutes (the time difference between the increase in VMs in the top graph and the increase in containers in the bottom graph). The result is that the maximum, average and minimum request times all increase by a lot. They only start to decrease once the unused VMs are gone and new VMs have been commissioned. Around the 15 minute mark we see that both the average and minimum request time drop sharply, because of the new containers that are available on new VMs. The last set of VMs is commissioned at peak request intensity, but unfortunately they only appear after the intensity has dropped, a necessary negative of using a reactive auto-scaler. This graph shows that the auto-scaling solution is not perfect. It did not manage to properly deploy and balance containers with rising request intensity, resulting in a time loss of about 10 minutes during which it commissioned VMs, decommissioned them and then commissioned new VMs. It does, however, show that the auto-scaling solution is resilient enough to handle a scenario in which it does not deploy correctly.

Figure 5.11: The results of the VM Only policy with the Linear Rise, Fast Drop workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.


In Figure 5.12 we can see the result from running the Instant Traffic workload with the VM Only policy. For this workload the policy reached a maximum of 7 simultaneous VMs, the most of any VM Only workload. A sharp increase in the error rate is seen between the 6 and 7 minute marks. The reason is that during that period (minute 5 to 7) the workload starts at 100 requests per second at the same time as the number of VMs scales up. There is a small period of time during which a container has started (and requests are routed to it) but the application in the container has not yet started. We see that the average request time decreases over the course of the test as more VMs and containers are added to the swarm. We also see that the error rate declines, although it remains quite high throughout the later parts of the test.

Figure 5.12: The results of the VM Only policy with the Instant Traffic workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

Mixed

In this section we discuss the results found with the Mixed setup.

In Figure 5.13 we see the result of the Mixed policy with the Exponential Ramp workload. We see that a maximum of 3 different VMs were available at peak intensity. We also see that a maximum of 9 different containers were available at maximum request intensity, spread over the different VMs. For the majority of the test there is only one VM, and the containers do not have a definite impact on the response times; however, adding more VMs with new containers does lower the response time. At the end of the bottom graph we see a bump similar to those in Figure 5.10 and Figure 5.12. This bump is the result of a VM being decommissioned and all of its containers moving to another VM. We also see a staircase around minute 6, where the intensity starts to rise. This is an increase in containers on the original VM, but it is not evident what the effect on the request time is.

Figure 5.13: The results of the Mixed policy with the Exponential Ramp workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.


In Figure 5.14 we see the result of the Mixed policy with the Instant Traffic workload. We see that the auto-scaler scales late; only after 5 minutes of high-intensity workload does it start. When it scales the containers, this seems to have an effect on the average response time, lowering it. The biggest effect on response time happens around the 18 minute mark, when the VMs are properly set up. At minute 10 we see a decline in average and minimum response time. This could be because of a second container starting; there is a small window between when a container starts and when the application inside it has started, during which the container cannot handle requests.

Figure 5.14: The results of the Mixed policy with the Instant Traffic workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.


In Figure 5.15 we see the result of the Mixed policy with the Linear Rise, Fast Drop workload. For this workload the auto-scaler scaled to a maximum of 3 different VMs. Compared to the Mixed policy with the Instant Traffic workload it reacted quickly, within a few minutes. It is hard to discern whether the response time changed because of the increased number of containers. However, we clearly see the average and minimum response times dropping after more VMs are added. Throughout the whole test we see a very low error rate.

Figure 5.15: The results of the Mixed policy with the Linear Rise, Fast Drop workload. The top graph shows the number of VMs and the error rate. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

5.6 Policy Experiments for Baseline and Comparison

In this section we show the results of the baseline and comparison experiments. The section is split into four subsections, one for each scaling policy.

Constant

For the Constant scaling policy there were two meaningful combinations: one for the big VM size and one for the small VM size. In Figure 5.16 we can see the result of using the small VM, and in Figure 5.17 the result of using the big VM. These two tests establish a baseline for all the other tests to compare against.


Figure 5.16: The result of the Constant policy on a small VM with the Exponential Ramp workload. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

Because of the lack of a deployment policy, the number of VMs and the number of containers are always 1 throughout the whole test. This makes the minimum, average and maximum response times the only interesting metrics. For the small VM we see that the metrics all follow the shape of the workload. During peak intensity the response times are roughly 4, 5 and 10 seconds for minimum, average and maximum, respectively. For the big VM we can see a similar pattern, except that the response time is in general much lower. During peak intensity the response times are roughly 2, 2.5 and 5 seconds for minimum, average and maximum, respectively. Around the 8th minute there is a peak in the maximum response time; we believe the reason for this peak is the volatility of the measurement. Also of note is the error rate of 0% for both runs.

VM Only

For the VM Only policy there are four different meaningful combinations: two different sizes and two different deployment policies. In Figure 5.18 we see the result of running the workload against the small VM size with the late deployment policy. In Figure 5.19 we see the result of running the workload against the big VM size with the late deployment policy.

Figure 5.17: The result of the Constant policy on a big VM with the Exponential Ramp workload. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

For the VM Only policy on a small VM we can see that the response time is similar to that of a constant policy of a similar size. For a few measurements there is a difference in maximum and minimum response time, but the average is almost identical to the constant policy. As shown in the bottom graph of Figure 5.18, there is a missed deployment, where the containers are deployed to the wrong VM, and the system recuperates from it. While there is only one active VM, the average response time is similar to the results for the constant policy. However, when new VMs are added, the average response time immediately drops; for this particular test it more than halves, from 5 seconds to roughly 2 seconds. Unfortunately we also see that the policy produces some errors. The errors peak around the 7th minute mark; an explanation would be that the VM closing around the 6th minute mark causes some errors, making the error rate spike. Overall the error rate stays around 0.2 to 0.4%.

In Figure 5.19 we see the results from the VM Only policy with the big VM. Similar to the results with the small VM, the minimum, average and maximum response times are almost identical to the results of the constant policy. In this figure we also see the reduction in response time when adding more VMs. The error rate seems to stay around 0.1% for the run, starting around the 9th minute mark. It is around that time that the extra VMs are added. Comparing it to the Mixed policy of a similar size, we see that the error rate is much lower, as is the maximum response time. It also seems that the VM Only policy manages to scale up faster, lowering the average response time earlier. The levels of average response time are, however, similar for both policies.

Figure 5.18: The result of the VM Only policy on a small VM with the Exponential Ramp workload and late deployment policy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

Figure 5.19: The result of the VM Only policy on a big VM with the Exponential Ramp workload and late deployment policy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

Mixed

For the Mixed policy there are four different meaningful combinations: two different sizes and two different deployment policies. In Figure 5.20 we see the result of running the workload against the small VM size with the late deployment policy. In Figure 5.21 we see the result of running the workload against the big VM size with the late deployment policy.

For the Mixed policy with the small VM, shown in Figure 5.20, we can see that it performs similarly to the VM Only policy of a similar size. It reaches an average response time of roughly 3 seconds when scaled up, a bit higher than the VM Only policy. The error rates are similar and peak close to each other, although this is coincidental, as the VM Only policy scales down while the Mixed policy scales up. The maximum response time is, however, a lot higher for the Mixed policy: roughly 4 seconds longer for most of the run. An interesting observation is that the policy scales from 9 containers to 10 containers, but then stops at 10. In other tests it has continued until a new VM has been issued.

Figure 5.20: The result of the Mixed policy on a small VM with the Exponential Ramp workload and late deployment policy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

In Figure 5.21 we can see the results of the Mixed policy with the big VM. The error rate was not very good for this policy: it peaks at roughly 2% after the number of VMs is increased and then slowly falls to roughly 0.6% at the end of the run. The VM Only policy with the same VM size achieved an error rate that peaked around 0.1%. The Mixed policy also had worse response times than the VM Only policy; there is only one datapoint for which the response time is better for the Mixed policy, the minimum response time at minute 6. It shows a similar, just slightly higher, level of average response time compared to the VM Only policy. The only improvement over the Constant policy is seen in the average and minimum response time, although the average response time is only lower once there are more VMs.

Figure 5.21: The result of the Mixed policy on a big VM with the Exponential Ramp workload and late deployment policy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

Container Only

For the Container Only policy there are two different meaningful combinations. The deployment policies only make a difference when VMs are added, but for the Container Only policy there is always exactly one VM. In Figure 5.22 we see the result of running the workload against the small VM size. In Figure 5.23 we see the result of running the workload against the big VM size.

For the Container Only policy the scaler often entered a state in which it kept increasing the number of containers indefinitely. The small VM could not handle such a load and crashed, while the big VM could. For the small VM, around minute 9 the health reporting mechanism crashed because of a lack of available memory on the VM. The mechanism is built so that it restarts if it crashes, which explains why there are reports after the 10-minute mark. However, inspecting the logs shows that they are incomplete or faulty, leading to the sporadic pattern seen in the bottom subplot of Figure 5.22.

Figure 5.22: The result of the Container Only policy on a small VM with the Exponential Ramp workload. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

In Figure 5.23 we see the result of the Container Only policy on the big VM. After reaching the second stage of the workload, the ramp around minute 5, the number of containers increases over the rest of the run in steps of two containers per increase. The response time during peak intensity is roughly 0.1, 2.5 and 8 seconds for the minimum, average and maximum, respectively. This is an improvement over the baseline with regard to response time. However, after the workload rises in intensity the error rate lies around 0.15–0.25%. Even though this is a rather low error rate, it is worse than the baseline.

Early Deployment

Half of the policies examined in the experiment were run with both a late and an early deployment strategy. In this section we show the results of two of the scenarios and argue that late deployment works better with Docker and produces a better run overall. The results of the two scenarios not shown in this section are presented in Appendix A.

In Figure 5.24 we see the result of running the Mixed policy on a small VM with the early deployment strategy. Two important observations are that the error rate is very high and that most containers are concentrated on a single VM. The error rate peaks at 2.5%. Around the error peak we can also see that there are 3 VMs, one of which is being closed. The VM that is closed never runs any containers; it only costs money while not contributing to the performance of the system. The reason the containers are distributed this way is Docker Swarm: Swarm is designed not to redistribute running containers, and only when a container crashes or a new container is added will a container be assigned to the new VM.

Figure 5.23: The result of the Container Only policy on a big VM with the Exponential Ramp workload. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

In Figure 5.25 we see the results of running the VM Only policy on a big VM with the early deployment strategy. The VMs are scaled up twice: once around the 7th minute mark and once again around the 9th minute mark. At most there are 5 VMs, the most for any of the runs with the big VM. However, while there are five available VMs, only two of them have containers. Similar to Figure 5.24, we see that the early deployment strategy causes VMs to stay empty, or rather that other VMs run more containers than they are supposed to. The performance of the run is comparable to the VM Only policy on the big VM with the late deployment strategy, except that there are 3 unused VMs that add no performance.

The results of the other runs can be found in Appendix A; they are similar to these results and indicate that the early deployment strategy leads to unused VMs.

Figure 5.24: The result of the Mixed policy on a small VM with the Exponential Ramp workload using the early deployment strategy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

Figure 5.25: The result of the VM Only policy on a big VM with the Exponential Ramp workload using the early deployment strategy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

6 Discussion

In this chapter we will discuss the results, the method and the sources.

6.1 Results

In this section we discuss the results with respect to the theory and attempt to explain why we obtained the results we did.

Error Rates and Prices

The cost of the Mixed setup was around twice that of the VM Only setup. Even so, it did not always outperform the VM Only policy in terms of error rate. For the Exponential Ramp workload the VM Only policy cost 3.5 cents and had an error rate of 0.06%, while the Mixed policy had an error rate of 0.10% and cost 9 cents. However, across all the workloads we see that the Mixed policy generally performed better in terms of error rate. Especially revealing is the Instant Traffic error rate, where the Mixed policy got an error rate of 0.10% and the VM Only policy got an error rate of 0.79%, almost eight times higher. While the VM Only policy performed better in terms of cost, it has shown a very high variance in error rate, whereas the Mixed policy has been very stable. However, these tests were also run with different VM sizes, and looking at the baseline tests we see that the difference between the policies is in fact very small. The number of containers used has a small impact compared to the number of VMs used.

An unfortunate unfairness in the tests is the fact that the workloads run for roughly 20 minutes while the billing is hourly. The auto-scaler cannot effectively reduce costs by turning off VMs. This is an area in which the method could be improved: the validity of the results would improve if the workloads ran for a few hours or a few days, and the price comparison would be fairer. An interesting comparison could be made between a workload that is semi-intense all the time and a varying workload with a day-and-night cycle. It would also be interesting to use traces to see how the system would react and how the different policies would fare. The main reason the thesis does not use such workloads is the limited scope.
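
To make the hourly-billing effect concrete, the sketch below computes the billed cost of a single VM when every started hour is charged in full. It is a minimal sketch; the price constant is a placeholder, not an actual provider rate.

```python
import math

HOURLY_PRICE = 0.015  # placeholder price per VM-hour in dollars

def billed_cost(vm_hours_used: float) -> float:
    """Hourly billing for one VM: every started hour is charged in full."""
    return math.ceil(vm_hours_used) * HOURLY_PRICE

# A 20-minute workload is billed as a full hour...
short_run = billed_cost(20 / 60)      # ceil(0.33) = 1 billed hour
# ...while over a 6-hour-and-20-minute run the rounding overhead is small.
long_run = billed_cost(6 + 20 / 60)   # ceil(6.33) = 7 billed hours
```

With 20-minute runs, shutting a VM down early saves nothing, which is why the auto-scaler cannot differentiate itself on cost here.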

6.2 Method

In this section we discuss the method, its replicability, validity and reliability.

DigitalOcean Evaluation

We begin with the evaluation of the startup and shutdown times for VMs issued on DigitalOcean. The evaluation was created with heavy inspiration from earlier studies. The most important data to find was the startup time, as the shutdown time very rarely affects the performance of the application as a whole. When the application needs to shut down a VM, it is usually in a state of low utilization, and the main deadline for shutdown comes from the hourly pricing model, meaning the window in which to shut down a VM is quite long. The startup time, however, is very important to model well, as it is the basis for acquiring more computational resources, and it is often necessary to get more computational resources within a few minutes.

The evaluation of DigitalOcean was designed to be replicable and valid. The method is described in Section 4.3. The evaluation mimics how one would use the API to determine active and inactive VMs in a real-world setting, and is thus replicable and valid. The results, however, are not necessarily reliable. Their reliability depends on the supply and demand of DigitalOcean VMs, and in a competitive market such as cloud hosting it is not certain that supply or demand will stay the same. The results therefore become less reliable as time passes. It should also be mentioned that it is in DigitalOcean's interest to lower startup times, as this improves their product.
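
As a rough illustration of how such startup times can be measured against the DigitalOcean API, the sketch below creates a droplet and polls its status until the API reports it active. It is a minimal sketch, not the exact evaluation script used in the thesis; the name, region, size and image slugs are placeholder values, and the API token is assumed to be available in the DO_TOKEN environment variable.

```python
import os
import time
import requests

API = "https://api.digitalocean.com/v2"
HEADERS = {"Authorization": f"Bearer {os.environ['DO_TOKEN']}"}

def measure_startup_time() -> float:
    """Create a droplet and measure the time until it is reported active."""
    spec = {  # placeholder droplet specification
        "name": "startup-probe",
        "region": "ams3",
        "size": "s-1vcpu-1gb",
        "image": "ubuntu-20-04-x64",
    }
    start = time.time()
    resp = requests.post(f"{API}/droplets", headers=HEADERS, json=spec)
    resp.raise_for_status()
    droplet_id = resp.json()["droplet"]["id"]

    # Poll until the droplet leaves the "new" state and becomes "active".
    while True:
        droplet = requests.get(f"{API}/droplets/{droplet_id}", headers=HEADERS)
        droplet.raise_for_status()
        if droplet.json()["droplet"]["status"] == "active":
            return time.time() - start
        time.sleep(5)
```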

Auto-Scaling Implementation

The auto-scaling implementation was created with the aim of allowing part of the Briteback application to achieve auto-scaling while at the same time allowing a useful research contribution to be made. One of the big problems with the solution is that it was designed and implemented in a way that tried to satisfy both end goals in a satisfactory manner. This meant that a lot of extra functionality had to be implemented and a lot of time was spent on implementation details that do not help reach the research goal faster. Much of this extra functionality was aimed at improving the reliability of the system and adding features necessary for the Briteback application to run. An example is the pluggable logging system, which allowed one to write plugins that could be added and changed through configuration files. Implementing such functionality introduces complexity that does not necessarily benefit the scientific contribution. From an engineering and product perspective, however, it is very useful to be able to specify what type of logging should be used where with just a few changes to a configuration file. Further studies in similar settings would most likely benefit from establishing the boundaries of the method and limiting it so that there is one clear focus centered on the scientific contribution.

The scaling decider uses CPU as the main metric for making scaling decisions. It should be noted that this is a quite simple metric. Other metrics that could have been used, and that would have been interesting to use, include the average completion time of a request as measured by the image server. It would also have been interesting to use the 95th percentile request time, similar to the 99th percentile read latency used by Al-Shishtawy et al. [29]. These would have provided a more direct measurement of what a user experiences and would thus be more user-centered metrics. The drawback is that more time would have been spent on creating a satisfactory solution for getting this data from the nodes to the scaling decider.
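
To make such a tail-latency metric concrete, the following sketch computes a percentile from a window of recorded request times. It is an illustration rather than part of the thesis implementation, and the nearest-rank method used here is only one of several common percentile definitions.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples in window")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Example: request times (seconds) from one monitoring window.
window = [0.12, 0.35, 0.08, 2.10, 0.40, 0.22, 5.30, 0.31, 0.27, 0.19]
p95 = percentile(window, 95)  # a decider could compare this against an SLO
```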

Another problem that arose with the implementation was the sheer number of variables that exist for auto-scaling. The application that is hosted, in this case the image server, can have many different properties: it can be CPU-intensive, IO-intensive towards disk, IO-intensive towards the network, or a combination of the three. The kind of application used changes how the experiments should be constructed. This is one of the variables that must be accounted for when creating the experiments, but one must also consider the choice of provider, the VM specifications, and the logging and deployment method. These variables make it hard to design and create experiments in this domain. The thesis would have benefited from narrowing the scope even further in order to arrive at more reliable results.

Docker and Docker Swarm

Docker and Docker Swarm have played a major part in this thesis. The application that is scaled uses Docker images so it can be easily replicated. It uses Docker containers to run several instances at the same time, as well as to run other services such as a web proxy and a database. It also uses Docker Swarm to create services that can communicate with each other and to easily scale those services. In order to scale to new machines we again rely on Docker Swarm and its concept of managers, which decide where and when new containers are run. This system has had a large effect on the reproducibility of the study, since the managers decide how to spread containers over the different VMs. The method used in this thesis conducts only one experiment per policy per workload. A clear improvement to the process, with regard to the uncertainty introduced by Docker Swarm managers, would be to run more experiments, or longer experiments, preferably both. It would also be beneficial to somehow make Docker Swarm more deterministic and reproducible in how it handles container allocation.
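
For reference, scaling a replicated Swarm service and inspecting where its tasks landed can be done with the standard Docker CLI; the sketch below simply wraps the two commands. It is a minimal illustration assuming a swarm with an existing service, here given the hypothetical name "image-server"; it is not the thesis' auto-scaler.

```python
import subprocess

def scale_service(service: str, replicas: int) -> None:
    """Ask the Swarm manager to run `replicas` tasks of the service."""
    subprocess.run(["docker", "service", "scale",
                    f"{service}={replicas}"], check=True)

def task_placement(service: str) -> str:
    """List which node each task of the service was scheduled on."""
    out = subprocess.run(["docker", "service", "ps", service,
                          "--format", "{{.Name}} -> {{.Node}}"],
                         check=True, capture_output=True, text=True)
    return out.stdout

scale_service("image-server", 4)       # hypothetical service name
print(task_placement("image-server"))  # where did the manager put the tasks?
```

Note that the placement printed by the second command is decided entirely by the Swarm managers, which is exactly the source of non-determinism discussed above.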

A part of the method that directly links to how Docker Swarm is used is the timing of the workloads: all of them are shorter than one hour. As far as we know, it is standard in this area of research to use longer workloads. For this thesis we managed to construct a solution that spread the containers efficiently over the different VMs, but the first prototypes did not. Future studies in this area should take great care in choosing which tools to use.

Evaluation of Containers

The evaluation of the containers is used to strengthen the validity of the report by showing that containers provide a virtualization benefit that works in tandem with virtual machines. The evaluation was constructed so that it could be used with other types of applications and replicated with minor changes, which provides a high level of validity. However, since the evaluation uses proprietary code owned by Briteback, it does not achieve a high level of replicability. A similar evaluation with a different application is simple to do, but using the same application is not.

Traces and Workloads

A big decision when constructing the method of this thesis was to use workloads instead of traces. Traces use collected, and thus "real", network access data, while workloads use artificially generated access data. As far as we can tell, it is customary in the literature to use workloads, but traces are used often enough to be called common. A lot of time went into investigating how to use different traces for the main experiment of the thesis. The argument was that one could take a more human approach and look at the system as something that would be accessed by humans. In the end, however, we decided to use workloads, mostly because of their ease of use compared to traces, but also because traces can be complicated to adapt between domains and, as far as we know, there are no widely trusted traces of accesses to image servers. An interesting approach for future studies would be to look at how traces can be incorporated in a study with a setting similar to this one.

6.3 Source Criticism

In this section we discuss some of the sources used for this thesis. A problem with sources in this area is that when working on an actual system, you will inevitably work with a system that cannot reasonably be reproduced or reconstructed perfectly by someone else. In order to communicate your ideas to others you need to fit your system into some form of model, stripping away parts of it. For cloud computing this process can strip away a lot of important contextual information that does not fit the model, which in turn makes it hard to find sources that excel. Below we discuss, in depth, three of the sources that were used for this thesis.

Just-In-Need State

In order to create a good measure of elasticity we looked at Ai et al. [16] and their model using a CTMC. The model relies on three different states: the Under-Provisioned state, the Over-Provisioned state and the Just-In-Need state. These states are quite intuitive, but they do not have useful formal definitions. The Just-In-Need state is defined as a function of the number of active virtual machines, the number of active users and an upper and lower threshold. That is, we are in the Just-In-Need state for a certain number of users if we have between "a" and "b" virtual machines. While simple to define and simple to understand, this lacks guidance on how to construct or find the values for "a" and "b", and Ai et al. [16] choose, somewhat arbitrarily, "a" as 1 and "b" as 3.

The decision to scale in the auto-scaling solution created for this thesis uses ideas from the Ai et al. [16] paper, with three different states. However, it does not use the same metrics; it uses CPU utilization, the number of containers and the number of active VMs. In order to decide which state the application is in, it looks at the mean CPU utilization of the different virtual machines. If the CPU utilization is very low, we are in the Over-Provisioned state; if it is very high, we are in the Under-Provisioned state. If we are in neither the Over-Provisioned state nor the Under-Provisioned state, we are in the Just-In-Need state. A suggestion is that the Over-Provisioned and Under-Provisioned states can be defined separately, with any state that falls under neither being a Just-In-Need state. Such a definition would have worked better in the context of this thesis.
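
As a sketch of this three-state decision, the snippet below classifies the mean CPU utilization against two thresholds and maps each state to a scaling action. The threshold values and function names are illustrative assumptions, not the exact values used in the implementation.

```python
from enum import Enum

class State(Enum):
    OVER_PROVISIONED = "over-provisioned"
    JUST_IN_NEED = "just-in-need"
    UNDER_PROVISIONED = "under-provisioned"

# Illustrative thresholds; the thesis implementation may use other values.
LOW_CPU, HIGH_CPU = 0.25, 0.75

def classify(cpu_per_vm: list[float]) -> State:
    """Classify the system from the mean CPU utilization across all VMs."""
    mean_cpu = sum(cpu_per_vm) / len(cpu_per_vm)
    if mean_cpu < LOW_CPU:
        return State.OVER_PROVISIONED
    if mean_cpu > HIGH_CPU:
        return State.UNDER_PROVISIONED
    return State.JUST_IN_NEED  # everything in between the thresholds

def decide(cpu_per_vm: list[float]) -> int:
    """Return a capacity delta: +1 scale up, -1 scale down, 0 hold."""
    state = classify(cpu_per_vm)
    if state is State.UNDER_PROVISIONED:
        return +1
    if state is State.OVER_PROVISIONED:
        return -1
    return 0
```

Defining only the two extreme states and letting Just-In-Need be the complement, as suggested above, is exactly what the final return statement expresses.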

Elastic Measurement

For this thesis we used a simple way of determining when to scale up and down. There are more advanced ways, as demonstrated by Al-Shishtawy et al. [29]. They created a metric called R99p, which measures the 99th percentile read latency, and used that measurement to create a solution that can scale a database. We chose to include this to show the reader that, even though scaling is done in a simple way in our implementation, it can be done in a much more advanced way. A measurement can also be constructed in a way that is tailored to the auto-scaled application, which Al-Shishtawy et al. very clearly show with their R99p measurement.

System Model

When working on this thesis and developing the auto-scaling implementation, it became clear that comparing different solutions would be hard and carry little scientific weight without a model of the whole system. There are a lot of variables to consider when building an auto-scaling solution, and for this thesis we chose to use the model created by Al-Dhuraibi et al. [4]. While they discuss the concept of containers, they do not discuss the elasticity of containers, the main reason being that there are not many studies on the elasticity of containers [4]. For this thesis we therefore had to construct the experiment measuring request time for a constant number of containers ourselves, simply because a paper discussing the properties of containers, specifically their elastic properties, could not be found.

Another reason for choosing the paper by Al-Dhuraibi et al. [4], other than it being a well-researched paper, is that it is one of the few papers discussing taxonomies and classifications of elastic cloud computing systems. Another taxonomy is created by Galante et al. [37]. Their taxonomy is similar in that it defines "scope", "policy", "purpose" and "method" as parts of the taxonomy, but they do not include "configuration", "provider" or "architecture". Comparing the two papers suggests that the taxonomy created by Al-Dhuraibi et al. [4] is more encompassing than the one created by Galante et al. [37].

6.4 The Work in a Wider Context

In a wider context, this thesis suggests that using both containers and virtual machines can provide a more resource-efficient auto-scaling solution than using just virtual machines or just containers. This has an impact on economic issues, both monetary and environmental. If companies can better utilize the resources they have through different software, they will not only save money but also consume less energy. The impact of this research, however, is very small and highly unlikely to create major trend changes. These are the societal aspects that the work impacts; it is hard to make an argument about the ethics of the research itself, as it is focused on an area that allows other software to run. The application that is run may be actively harmful, and for such an application it is easier to make an ethical argument. The auto-scaling solution that was developed as a part of this thesis can be used to scale software, but it does not in itself pose any danger to humans or other machines.

7 Conclusion

The purpose of this work was to investigate how different auto-scaling policies affect price and SLO violations, in this case the error rate. The policies differed in how they utilized the two virtualization technologies, virtual machines and containers. An experiment was conducted in which the different policies were tested against a set of workloads. Additionally, an experiment was conducted in which the same workload was used across a number of different configurations, such as a Constant policy and a Container Only policy.

The results reveal that the choice of deployment strategy matters. All tests run with the early deployment strategy perform worse than those run with the late deployment strategy. The results also suggest that increasing the number of replicated containers in a Docker Swarm may result in errors. The explanation is that there is a small time period between the container starting and the web server inside the container being ready; Docker Swarm will send requests to the container before any web server is running. The results do not show any meaningful impact of containers on response time. Additionally, the results suggest that using a late deployment strategy is important for Docker Swarm to better utilize the available resources. This is especially true for an auto-scaling implementation that uses the non-idleness of a CPU to make scaling decisions.
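
One common way to close this readiness gap, sketched below under the assumption that the application exposes an HTTP endpoint, is to poll the web server until it answers before treating the container as ready; Docker can run such a probe as a container healthcheck. This is an illustrative mitigation, not something evaluated in the thesis, and the port and path are hypothetical.

```python
import time
import urllib.request
import urllib.error

def wait_until_ready(url: str, timeout: float = 120.0) -> bool:
    """Poll `url` until the web server answers, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True  # a 2xx/3xx response: the server is up
        except urllib.error.HTTPError:
            return True      # an HTTP error status still proves the server answers
        except (urllib.error.URLError, OSError):
            time.sleep(1)    # connection refused: the app is not listening yet
    return False

# Example: only report the container healthy once the app answers.
# ready = wait_until_ready("http://localhost:8080/")  # hypothetical port/path
```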

The results also show how Docker performs in certain areas, such as Docker image download time and Docker container startup and stop, suggesting that automated deployment of a service, including commissioning a VM, takes roughly 2 to 3 minutes.

7.1 Discussion of Future Work

Interesting areas of future study would be further investigation of how the coordination of deployment and scaling of services can be improved with respect to certain metrics, such as error rate and price. In this thesis we have shown that the deployment strategy matters; future work could explore why it matters and find criteria for an optimal deployment strategy. Further studies should also keep in mind that the performance of cloud servers depends heavily on the configuration of the servers [38]. It would also be interesting to investigate how cloud service providers could allow customers to fine-tune server configurations for better performance.

A Early Deployment Figures

Figure A.1: The result of the VM Only policy on a small VM with the Exponential Ramp workload using the early deployment strategy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

Figure A.2: The result of the Mixed policy on a big VM with the Exponential Ramp workload using the early deployment strategy. The middle graph shows the request rate as well as the maximum, average and minimum request time. The bottom graph shows the distribution of containers over different nodes.

Bibliography

[1] T. C. Chieu, A. Mohindra, A. A. Karve, and A. Segal, "Dynamic scaling of web applications in a virtualized cloud computing environment," in Proceedings of the IEEE International Conference on e-Business Engineering, pp. 281–286. doi: 10.1109/ICEBE.2009.45.

[2] M. Mao and M. Humphrey, "Auto-scaling to minimize cost and meet application deadlines in cloud workflows," in Proceedings of the IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2011, pp. 1–12.

[3] Q. Zhang, L. Cheng, and R. Boutaba, "Cloud computing: State-of-the-art and research challenges," Journal of Internet Services and Applications, vol. 1, no. 1, pp. 7–18, 2010.

[4] Y. Al-Dhuraibi, F. Paraiso, N. Djarallah, and P. Merle, "Elasticity in cloud computing: State of the art and research challenges," IEEE Transactions on Services Computing, 2017. doi: 10.1109/tsc.2017.2711009.

[5] A. Karmel, R. Chandramouli, and M. Iorga, NIST Definition of Microservices, Application Containers and System Virtual Machines (Draft). National Institute of Standards and Technology, 2016.

[6] N. Dragoni, S. Giallorenzo, A. L. Lafuente, M. Mazzara, F. Montesi, R. Mustafin, and L. Safina, "Microservices: Yesterday, today, and tomorrow," in Present and Ulterior Software Engineering, M. Mazzara and B. Meyer, Eds. Cham, 2017, pp. 195–216. isbn: 978-3-319-67425-4. doi: 10.1007/978-3-319-67425-4_12. [Online]. Available: https://doi.org/10.1007/978-3-319-67425-4_12.

[7] M. Villamizar, O. Garces, L. Ochoa, H. Castro, L. Salamanca, M. Verano, R. Casallas, S. Gil, C. Valencia, A. Zambrano, and M. Lang, "Infrastructure cost comparison of running web applications in the cloud using AWS Lambda and monolithic and microservice architectures," in Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 2016. doi: 10.1109/ccgrid.2016.37.

[8] (Nov. 2017). Node.js landing page, Node.js Foundation, [Online]. Available: https://nodejs.org/en/.

[9] (Nov. 2017). About Node.js, Node.js Foundation, [Online]. Available: https://nodejs.org/en/about/.

[10] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in Proceedings of the ACM Symposium on Operating Systems Principles, 2003, pp. 164–177.

[11] R. Dua, A. R. Raja, and D. Kakadia, "Virtualization vs containerization to support PaaS," in Proceedings of the IEEE International Conference on Cloud Engineering (IC2E), 2014.

[12] P. D. Tommaso, E. Palumbo, M. Chatzou, P. Prieto, M. L. Heuer, and C. Notredame, "The impact of Docker containers on the performance of genomic pipelines," PeerJ, vol. 3, e1273, Sep. 2015. doi: 10.7717/peerj.1273.

[13] (Nov. 2017). Docker about page, Docker Inc., [Online]. Available: https://www.docker.com/what-docker.

[14] (Nov. 2017). Docker swarm mode overview, Docker Inc., [Online]. Available: https://docs.docker.com/engine/swarm/.

[15] (May 2018). Docker compose file reference, [Online]. Available: https://docs.docker.com/compose/compose-file/.

[16] W. Ai, K. Li, S. Lan, F. Zhang, J. Mei, K. Li, and R. Buyya, "On elasticity measurement in cloud computing," Scientific Programming, vol. 2016, pp. 1–13, 2016. doi: 10.1155/2016/7519507.

[17] I. Neamtiu, "Elastic executions from inelastic programs," in Proceedings of the International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), 2011. doi: 10.1145/1988008.1988033.

[18] R. Moreno-Vozmediano, R. S. Montero, and I. M. Llorente, "Elastic management of cluster-based services in the cloud," in Proceedings of the 1st Workshop on Automated Control for Datacenters and Clouds, ACM, 2009, pp. 19–24.

[19] A. Ashraf, B. Byholm, and I. Porres, "A session-based adaptive admission control approach for virtualized application servers," in Proceedings of the IEEE International Conference on Utility and Cloud Computing, Nov. 2012. doi: 10.1109/ucc.2012.22.

[20] W. Iqbal, M. N. Dailey, D. Carrera, and P. Janecek, "Adaptive resource provisioning for read intensive multi-tier applications in the cloud," Future Generation Computer Systems, vol. 27, no. 6, pp. 871–879, Jun. 2011. doi: 10.1016/j.future.2010.10.016.

[21] A. Ashraf, B. Byholm, and I. Porres, "CRAMP: Cost-efficient resource allocation for multiple web applications with proactive scaling," in Proceedings of the IEEE International Conference on Cloud Computing Technology and Science, Dec. 2012. doi: 10.1109/cloudcom.2012.6427605.

[22] U. Sharma, P. Shenoy, S. Sahu, and A. Shaikh, "A cost-aware elasticity provisioning system for the cloud," in Proceedings of the International Conference on Distributed Computing Systems, Jun. 2011. doi: 10.1109/icdcs.2011.59.

[23] N. Roy, A. Dubey, and A. Gokhale, "Efficient autoscaling in the cloud using predictive models for workload forecasting," in Proceedings of the IEEE International Conference on Cloud Computing, Jul. 2011. doi: 10.1109/cloud.2011.42.

[24] N. Vasić, D. Novaković, S. Miučin, D. Kostić, and R. Bianchini, "DejaVu," ACM SIGARCH Computer Architecture News, vol. 40, no. 1, p. 423, Apr. 2012. doi: 10.1145/2189750.2151021.

[25] S. Genaud and J. Gossa, "Cost-wait trade-offs in client-side resource provisioning with elastic clouds," in Proceedings of the IEEE 4th International Conference on Cloud Computing, Jul. 2011. doi: 10.1109/cloud.2011.23.

[26] P. Marshall, K. Keahey, and T. Freeman, "Elastic site: Using clouds to elastically extend site resources," in Proceedings of the IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), 2010. doi: 10.1109/ccgrid.2010.80.

[27] R. Han, M. M. Ghanem, L. Guo, Y. Guo, and M. Osmond, "Enabling cost-aware and adaptive elasticity of multi-tier cloud applications," Future Generation Computer Systems, vol. 32, pp. 82–98, Mar. 2014. doi: 10.1016/j.future.2012.05.018.

[28] D. Serrano, S. Bouchenak, Y. Kouki, T. Ledoux, J. Lejeune, J. Sopena, L. Arantes, and P. Sens, "Towards QoS-oriented SLA guarantees for online cloud services," in Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid), May 2013. doi: 10.1109/ccgrid.2013.66.

[29] A. Al-Shishtawy and V. Vlassov, "ElastMan," in Proceedings of the ACM Conference on Cloud and Autonomic Computing (CAC), 2013. doi: 10.1145/2494621.2494630.

[30] B. Urgaonkar, P. Shenoy, A. Chandra, P. Goyal, and T. Wood, "Agile dynamic provisioning of multi-tier internet applications," ACM Transactions on Autonomous and Adaptive Systems (TAAS), 2008.

[31] W. Vogels, "Beyond server consolidation," Queue, vol. 6, no. 1, pp. 20–26, Jan. 2008. issn: 1542-7730. doi: 10.1145/1348583.1348590. [Online]. Available: http://doi.acm.org/10.1145/1348583.1348590.

[32] (May 2018). Amazon EC2 auto scaling, [Online]. Available: https://aws.amazon.com/ec2/autoscaling/.

[33] (May 2018). Kubernetes autoscaling, [Online]. Available: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/.

[34] P. Mell and T. Grance, The NIST Definition of Cloud Computing. NIST, 2011.

[35] T. Glad and L. Ljung, Reglerteknik - Grundläggande Teori. Studentlitteratur AB, 1981, 2006.

[36] H. Lim, S. Babu, and J. Chase, "Automated control for elastic storage," in Proceedings of the 7th International Conference on Autonomic Computing (ICAC), 2010. Control theory implementation for cloud computing storage.

[37] G. Galante and L. C. E. d. Bona, "A survey on cloud computing elasticity," Nov. 2012, pp. 263–270. doi: 10.1109/UCC.2012.30.

[38] R. Hashemian, D. Krishnamurthy, M. Arlitt, and N. Carlsson, "Improving the scalability of a multi-core web server," in Proc. ACM/SPEC International Conference on Performance Engineering (ACM/SPEC ICPE), Apr. 2013, pp. 161–172.