
Using Burstable Instances in the Public Cloud: When and How?

Neda Nasiriani, Cheng Wang, George Kesidis, Bhuvan Urgaonkar
School of EECS, Penn State University
University Park, PA
{nun129,cxw967,gik2,buu1}@psu.edu

School of EECS Technical Report No. CSE-16-005, May 23, 2016
Revised: Sep 12, 2016, Oct. 22, 2016 ∗

ABSTRACT

Amazon EC2 and Google Compute Engine (GCE) have recently introduced a new class of virtual machines called “burstable” instances that are cheaper than even the smallest traditional/regular instances. These lower prices come with reduced average capacity and increased variance. Using measurements from both EC2 and GCE, we identify key idiosyncrasies of resource capacity dynamism for burstable instances that set them apart from other instance types. Most importantly, certain resources for these instances appear to be regulated by deterministic, though in one case unorthodox, token-bucket-like mechanisms. We find widely different types of disclosures by providers of the parameters governing these regulation mechanisms: full disclosure (e.g., CPU capacity for EC2 t2 instances), partial disclosure (e.g., CPU capacity and remote disk IO bandwidth for GCE shared-core instances), or no disclosure (network bandwidth for EC2 t2 instances). A tenant modeling these variations as random phenomena (as some recent work suggests) might make sub-optimal procurement and operation decisions. We present modeling techniques for a tenant to infer the properties of these regulation mechanisms via simple offline measurements. We also present two case studies of how certain memcached workloads might benefit from our modeling when operating on EC2 by: (i) temporal multiplexing of multiple burstable instances to achieve the CPU or network bandwidth (and thereby throughput) equivalent of a more expensive regular EC2 instance, and (ii) augmenting the cheap but low-availability in-memory storage offered by spot instances with backup of popular content on burstable instances.

1. INTRODUCTION

To attract more customers (tenants), public cloud providers offer virtual machine (instance) types that, in different ways, trade off lower prices for poorer capacities. Broadly speaking, providers employ two approaches for this. The first approach is based on aggressive revocation of cheaper instances in favor of more expensive instances when overall demand increases (resulting in tenants perceiving poorer availability for such revocable instances). Examples of this are Amazon EC2’s well-known spot instances [5] and the more recent GCE preemptible instances [38]. The second approach employs aggressive statistical multiplexing of multiple cheaper instances on a single physical server (resulting in tenants experiencing higher dynamism in the resource capacity of these instances; cf. Appendix 8.2 on the cloud’s point of view). Examples of this are EC2’s “t type” instances (t1 introduced in 2010 and t2 in 2014) and the “shared-core” instance types introduced by GCE in 2013 [42, 37]. We collectively refer to these as burstable instances for their ability to dynamically “burst” (i.e., increase the capacity of) their CPU and, possibly, some additional resources. Whereas revocable instances (especially EC2’s spot instances) have been studied extensively, burstable instances have remained largely unexplored.

∗This research was supported in part by NSF CNS grant 1526133, a DARPA XD3 grant, and an Amazon AWS Cloud Credits gift.

Burstable instances are significantly cheaper¹ than even the smallest “regular” instances [10, 39]. For example, EC2’s smallest burstable instance (t2.nano, with 1 shared core and 0.5 GB RAM) costs a mere $4.75 per month compared to $49 per month for its smallest regular instance (m3.medium, with 1 core and 3.75 GB RAM). Burstable instances are intended for tenants with very small resource needs (and budgets), whether absolute or incremental (the latter for dynamically scaling out). Unlike regular instances, burstable instances offer time-varying CPU capacity comprising a minimum guaranteed base capacity/rate, which is much smaller than a short-lived peak capacity (for bursting) that becomes available upon operating at lower than the base rate for a sufficient duration. For example, a t2.nano instance offers a base capacity equivalent to a mere 5% of a regular core and a peak capacity of 100%, the latter becoming available for one minute only upon remaining idle for 20 minutes.
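The base-versus-peak arithmetic in the t2.nano example can be sketched directly; a minimal calculation, assuming (per the example above) that burst capacity accrues at the base rate while the instance idles and is spent at the full-core rate while bursting:

```python
def burst_minutes(idle_minutes, base_fraction):
    """Minutes of full-core (100%) bursting earned by idling.

    Capacity accrues at `base_fraction` of a core per idle minute
    and is spent at the rate of one full core while bursting.
    """
    return idle_minutes * base_fraction / 1.0

# t2.nano: a 5% base rate, so 20 idle minutes buy ~1 minute at 100%.
print(burst_minutes(20, 0.05))  # → 1.0
```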

Whereas EC2 and GCE are promoting burstable instances as cheap options for testing/development, small web servers with intermittent traffic bursts, etc. [1], we believe they can go further and may prove cost-effective resource procurement options (while still offering acceptable performance levels) for a larger class of workloads. To achieve this potential of burstable instances, however, a tenant would need to carefully understand the significant additional complexity these instances possess beyond that disclosed by the providers.

How are Burstable Instances Different? To clarify this, we have developed the classification, shown in Table 1, of (i) capacity dynamism for an instance resource, and (ii) the nature of the disclosure made by the provider about such dynamism.

¹The per-unit resource prices are higher than those for regular instances.

Type | Resource capacity       | Disclosure      | Examples from EC2 and GCE instance offerings
1    | Fixed                   | Full            | main memory capacity and CPU cycles (except for burstables); most resources for larger instances
2A   | Random w/ low variance  | Partial or full | network and memory b/w for larger regular instances
2B   | Random w/ high variance | Partial or full | network and memory b/w for smaller instances (including for burstables)
3A   | Deterministic           | Full            | token-bucket regulated CPU for EC2’s burstables; remote disk b/w for both regular and burstable instances from both EC2 and GCE
3B   | Deterministic           | None or partial | token-bucket regulated CPU for GCE burstable instances; network b/w for EC2’s burstable instances

Table 1: Our classification of resource capacity dynamism for GCE and EC2 instances, along with the nature of disclosure made by the provider.

• Type 1: It is well known from existing measurement studies [46, 19, 13, 8, 16] that, generally, larger and more expensive instances offer close-to-fixed resource capacities, effectively appearing as dedicated machines.

• Type 2: The resources of cheaper/smaller instances tend to exhibit non-negligible temporal variations. From a tenant’s point of view, much existing work [46, 19] suggests these Type 2 variations are best modeled as externally controlled random phenomena (whose statistical properties the tenant may need to infer through its own measurements).

Burstable instances represent a departure from these conventional types: whereas some key resources (one or more of CPU capacity, network bandwidth, and remote disk bandwidth²) exhibit high variations similar to Type 2B, their capacities are in fact regulated via deterministic token-bucket mechanisms. That is, the capacity variation for these resources depends very much on the tenant’s own actions. We depict these as new Type 3 variations in Table 1.

A further aspect of the complexity of burstable instances relates to the nature of the disclosure made by the cloud provider about its regulation mechanism. In some cases, the provider reveals its regulation mechanism (Type 3A). E.g., recall the earlier example of CPU capacity for t2 instances, for which EC2 reveals the parameters governing the underlying token-bucket regulator. EC2 also offers online information about the regulator state via its CloudWatch monitoring system [7] (see Section 3.2). More generally, however, we find important quantitative details lacking in the information offered by providers (Type 3B). E.g., GCE suggests that its f1-micro (and also g1-small) CPU capacity is token-bucket regulated but, to our knowledge, neither reveals its parameters nor offers an API to help a tenant application track its state as EC2 does - in our view, a partial disclosure which may require careful characterization and online inference by tenants (see Section 3.3). A case of even more limited disclosure arises for EC2’s regulation of network bandwidth for its burstable instances (see Section 3.1), which it only describes as “low to moderate” (which for an EC2 instance can span the range 31-508 Mbps).

²Remote disk bandwidth may be token-bucket regulated even for some regular instances [2, 18], although, again, not necessarily with full disclosure of the underlying regulation.

Research Contributions: The discussion above motivates why a tenant using a burstable instance might find useful (i) models that let it infer (if not disclosed by the provider) or confirm (if disclosed) the regulation mechanisms employed by the provider for the instance’s resources, and (ii) precise situational awareness of the state of these regulated resources. Together, (i) and (ii) could help inform what impact the tenant’s own resource usage activities might have on near-term capacities (rather than mistaking them for purely externally controlled random phenomena). Our goal in this paper is to develop such an understanding and demonstrate its efficacy via case studies involving realistic workloads. Towards this, we make the following contributions.

• We present our own measurements of capacity dynamism for a variety of EC2 and GCE instance types. Generally, our measurements agree with observations in recent studies that burstable instances exhibit higher dynamism in network bandwidth and CPU capacity. However, we also find some (surprising) examples where this is not the case; we present conjectures for why this may be so.

• We then investigate the high capacity dynamism in CPU and network bandwidth observed for burstable instances and demonstrate that it results from deterministic regulation mechanisms employed by the cloud provider. Specifically, EC2 appears to employ an unconventional dual token-bucket algorithm for regulating its t2 instances’ network bandwidth. GCE appears to employ a classical dual token-bucket regulation for the CPU capacity of its burstable instances. We present analytical models for these Type 3B regulation mechanisms and show their efficacy for predicting capacity. We also present a methodology to verify the Type 3A token-bucket regulation used by EC2 for CPU capacity regulation, with novel insights about an unconventional token expiration mechanism that EC2 appears to employ.

• We present principled approaches for tenants to use these models for exploiting burstable instances as effective replacements of, or supplements to, regular instances. We demonstrate the efficacy of these ideas via two case studies, both based on memcached. (i) Temporal multiplexing of multiple burstable instances may allow us to achieve the CPU or network bandwidth (and thereby throughput, modulo low congestion levels) equivalent of a regular EC2 instance at 25% lower cost. (ii) Augmenting cheap but low-availability EC2 spot instances with passive backup of popular content on burstable instances may offer improved failure recovery (with lower performance degradation) than using regular instances for such backup.

Outline: The rest of this paper is organized as follows. In Section 2, we present salient findings from our own measurements of capacity dynamism in EC2 and GCE instances. In Section 3, we focus on validating or inferring token-bucket regulation for burstable instances. In Section 4, we present our case studies of memcached workloads that might find burstable instances useful. In Section 5, we discuss related work. Finally, we present our conclusions in Section 6.

2. BACKGROUND

We have carried out extensive measurements of temporal variations in raw resource capacities for a variety of EC2 and GCE instance types. Several similar studies have also been carried out in other recent work [8, 16, 21, 15, 26, 14, 31, 32, 35, 36, 49, 24]. In this section, we highlight a representative subset of our findings with a focus on results that are relevant to our current interest in burstable instances; this includes comparing the capacity variations for burstable instances against those for regular instances.

Current Burstable Offerings: EC2 offers 5 different types of burstable instances (t2 type) having 1-2 virtual cores (vCPUs), with base capacities in the range 5-60% of a regular EC2 vCPU and RAM capacity in the range 0.5-8 GB. GCE offers 2 shared-core types with bursting capability: f1-micro with 0.6 GB RAM and 1 vCPU, and g1-small with 1.7 GB RAM and 1 vCPU. GCE only discloses that these vCPUs are shared (i.e., lower in capacity than a full core, with the ability to burst opportunistically) but does not offer a quantification of their base rates and bursting capabilities. It appears reasonable to expect GCE to offer more types soon and other providers (such as Microsoft Azure and IBM Bluemix) to also offer similar instances.

We list the instance types considered in our measurement study in Table 2. We restrict ourselves to instances with 8 or fewer vCPUs; larger instances generally show low capacity variations and are not of interest to this study. We cover all burstable instances currently offered. For regular instances, we have representatives from different classes, including: (i) general-purpose, (ii) compute-optimized, and (iii) memory-optimized instances. For ease of comparison across EC2 and GCE offerings, each instance is listed along with its closest “iso-capacity” counterpart (specifically, in terms of advertised capacities for CPU and main memory) from the other cloud provider³.

Group     | EC2 Instance | vCPU | RAM (GB) | Price ($/hr) | Price ($/hr) | RAM (GB) | vCPU | GCE Instance
Burstable | t2.nano      | 1    | 0.5      | 0.0065       | 0.006        | 0.6      | 1    | f1-micro
Burstable | t2.micro     | 1    | 1        | 0.013        | -            | -        | -    | -
Burstable | t2.small     | 1    | 2        | 0.026        | 0.021        | 1.7      | 1    | g1-small
Burstable | t2.medium    | 2    | 4        | 0.052        | -            | -        | -    | -
Burstable | t2.large     | 2    | 8        | 0.104        | -            | -        | -    | -
General   | m3.medium    | 1    | 3.75     | 0.067        | 0.038        | 3.75     | 1    | n1-sd-1
General   | m3.large     | 2    | 7.5      | 0.133        | 0.076        | 7.5      | 2    | n1-sd-2
General   | m3.xlarge    | 4    | 15       | 0.266        | 0.152        | 15       | 4    | n1-sd-4
General   | m3.2xlarge   | 8    | 30       | 0.532        | 0.304        | 30       | 8    | n1-sd-8
Comp-Opt  | c3.large     | 2    | 3.75     | 0.105        | 0.058        | 1.8      | 2    | n1-hcpu-2
Comp-Opt  | c3.xlarge    | 4    | 7.5      | 0.21         | 0.116        | 3.6      | 4    | n1-hcpu-4
Comp-Opt  | c3.2xlarge   | 8    | 15       | 0.42         | 0.232        | 7.2      | 8    | n1-hcpu-8
Mem-Opt   | r3.large     | 2    | 15       | 0.166        | 0.096        | 13       | 2    | n1-hm-2
Mem-Opt   | r3.xlarge    | 4    | 30.5     | 0.333        | 0.192        | 26       | 4    | n1-hm-4
Mem-Opt   | r3.2xlarge   | 8    | 61       | 0.665        | 0.384        | 52       | 8    | n1-hm-8

Table 2: Nearly iso-capacity instance types offered by EC2 and GCE for instances with ≤ 8 vCPUs. Here the suffixes {sd, hcpu, hm} denote {standard, high CPU, high memory}, respectively. Burstable, general-purpose, compute-optimized, and memory-optimized instances are grouped under the labels {Burstable, General, Comp-Opt, Mem-Opt}, respectively.

Next, we describe the details of our measurement methodology and our main findings.

³The presented prices for EC2 instances are current listings for its us-east-1 region. Also note that capacity configurations in co-listed instance types are close but not exactly identical. This must be kept in mind when comparing their performance. For details of resource configurations, see [4, 28].

Network Bandwidth: We use the iperf benchmark [22], which is designed to probe available bandwidth over time. We use iperf with its default TCP connection window size of 325 KByte. TCP congestion control may affect measured throughput depending on other network traffic (and this would appear to be a random effect to the tenant). We conduct our experiments at different times of day and on different days to ensure that what we observe is indeed due to EC2’s regulation mechanism and not due to TCP congestion control. We make sure that our measurements are not affected by TCP flow control mechanisms by checking the number of retries for each transmission and ensuring they are negligible. We choose all instances from the same availability region⁴, us-east-1d. We place one end of iperf on the target instance and the other on a well-provisioned instance⁵ to measure the uplink bandwidth available to the target instance. We record measurements over 1-second intervals for an hour.

Findings: We show our network bandwidth measurements in Figures 1(a) and (d) for EC2 and GCE, respectively. We show the 25th, 50th, and 75th percentiles, as well as the minimum and maximum (shown by horizontal bars) and outliers (shown by cross markers). We find that the network bandwidth allocations of EC2 burstable (t2) instances have much higher variations than those for regular instances - see region R1, Figure 1(a). Surprisingly, we find that GCE burstable instances show very low variation. GCE claims to offer 2 Gbps of network bandwidth per vCPU (including for its burstable instances), and we find this to hold for most instance types except for instances with 8 vCPUs, where we find noticeable variation - see region R4, Figure 1(d). We postulate that this could be due to one or both of the following: (a) GCE manages its network more carefully and efficiently [40]; (b) having access currently to a smaller share of the public cloud marketplace than EC2, GCE is offering higher network bandwidth at lower cost to attract more customers.

CPU/Memory Resources: We discuss two resources here: bandwidth to main memory (a hardware-managed resource) and CPU cycles per unit time (which we refer to simply as CPU capacity); we have also studied cache bandwidth, which we omit for space. For recording memory bandwidth, we run the STREAM benchmark [41] copy operations using all vCPUs available to the target instance. For measuring CPU capacity, we use SysBench [23] to calculate prime numbers less than 1000 and use the recorded execution time as our proxy for CPU capacity. In the case of instances with multiple vCPUs, we run one copy of the benchmark per vCPU and record each execution time.

Findings: We present our measurements for main memory bandwidth and CPU capacity in Figures 1(b),(e) for EC2 and (c),(f) for GCE, respectively. We show histograms as well as mean ± standard deviation using horizontal bars. For regular instances, we find that, generally, an EC2 instance and its GCE counterpart (Table 2) indeed possess nearly equal mean capacities. However, there may be significant differences in their capacity variances. Somewhat surprisingly, we find higher variation in memory bandwidth for GCE regular instances than for its burstable instances, whereas this is not so for EC2 - compare regions R2 and R5. Combined with our earlier observation about how the network bandwidth offerings of EC2 and GCE compare, this suggests significantly different consolidation strategies being used by these two providers. We are tempted to postulate that GCE’s burstable instances, being relatively recent, are not yet heavily used and hence experience plentiful memory bandwidth allocation with low variability. Finally, per Figure 1(c), EC2 burstable instances (particularly t2.nano and t2.micro) show high variation in CPU capacity whereas GCE’s burstable instances do not - compare R6 vs. R3.

Type 2 vs. Type 3 variations: The regions R1-R6 in Figure 1 represent examples of capacity dynamism that is high enough that a performance-sensitive tenant might need to be aware of it (and model it well). There is a crucial difference, however, between the nature of the dynamism in regions R1, R3, and R6 versus the rest. To clarify this, we compare the temporal evolution of the memory bandwidth of a GCE n1-sd-8 instance (Figure 2(a)) with the network bandwidth of an EC2 t2.medium instance (Figure 2(b)). Clearly, the former is Type 2 while the latter is Type 3.

⁴Amazon has 10 completely isolated data centers (“regions”) around the globe, with each region divided into multiple isolated availability zones which are partly connected through low-latency links. Geo-distributed regions offer tenants the flexibility to place their instances closer to their end customers, or closer to one another. Similarly, GCE has 5 regions and 15 availability zones.

⁵We choose m4.10xlarge for EC2 and n1-standard-16 for GCE, which have guaranteed 10 and 32 Gbps, respectively.

Figure 1: A subset of our measurements depicting capacity dynamism for several instance types and three instance resources: network bandwidth, main memory bandwidth, and CPU cycles (as captured via execution time of the sysbench benchmark [23]). Panels: (a) network b/w, EC2 (region R1); (b) memory b/w, EC2 (R2); (c) CPU capacity (reciprocal), EC2 (R3); (d) network b/w, GCE (R4); (e) memory b/w, GCE (R5); (f) CPU capacity (reciprocal), GCE (R6). The top row shows our measurements for a large number of EC2 instances, the bottom row for GCE instances. Instances grow in size from left to right, with the smallest ones given a single shared core and less than 1 GB RAM. We use boxplots for network bandwidth, and histograms for memory bandwidth and CPU capacity with horizontal bars indicating mean ± standard deviation. The abbreviations {n, u, S, M, L, XL, 2XL, sd, hcpu, hm} denote {nano, micro, small, medium, large, xlarge, 2xlarge, standard, high CPU, high memory}.

Figure 2: Comparison of Type 2 and Type 3 resources. (a) Type 2 variation for GCE n1-sd-8: memory bandwidth (Mbps, ×10⁵) over time (sec). (b) Type 3 variation for EC2 t2.medium: network bandwidth (Mbps) over time (sec).

To appreciate the significance of identifying and understanding Type 3 variations, we offer an example involving the CPU capacity of EC2’s t2.micro instance, a Type 3A resource. Using information about the regulation revealed by EC2, in Figure 3(a), we let the instance idle for 20 minutes, resulting in the accumulation of 20·(6/60) = 2 “credits”. In this case the tenant is able to burst at 100% CPU utilization for 2 minutes. In Figure 3(b), we let the instance idle for 50 minutes, thereby accumulating 50·(6/60) = 5 tokens. In this case, it is able to burst for 5 minutes. That is, a workload using a burstable resource would benefit from an understanding of the regulation mechanism as well as relevant “situational awareness” (e.g., an estimate of the pending number of credits in our example) and the impact of its own resource usage on this state.

Figure 3: Illustrative experiments using an EC2 t2.micro instance to explain the complexity of CPU capacity dynamism and the tenant’s own role in it. Each panel plots CPU utilization (%) and the number of tokens over time (minutes).
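The credit accounting in this example can be captured by a small token-bucket simulation; a minimal sketch for t2.micro, assuming EC2’s disclosed earn rate of 6 credits/hour, that one credit buys one minute of a full core, and a balance cap of 24 hours of accruals (144 credits; the cap value is our reading of EC2’s documentation, not stated in this paper):

```python
def simulate_credits(utilization_per_minute, earn_per_min=6 / 60, cap=144):
    """Return the CPU-credit balance after a per-minute schedule.

    Each minute the instance earns `earn_per_min` credits and spends
    `util` credits, where util is the fraction of a full core used
    that minute. The balance is clamped to [0, cap].
    """
    credits = 0.0
    for util in utilization_per_minute:
        credits = min(max(credits + earn_per_min - util, 0.0), cap)
    return credits

# Idling 20 minutes accrues 20 * (6/60) = 2 credits (2 minutes at 100%);
# idling 50 minutes accrues 5 credits, matching Figures 3(a) and 3(b).
print(round(simulate_credits([0.0] * 20), 3))  # → 2.0
print(round(simulate_credits([0.0] * 50), 3))  # → 5.0
```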

3. VALIDATION AND INFERENCE OF TYPE 3 CAPACITY DYNAMISM

In this section, we present techniques for validating (when disclosed by the provider, i.e., Type 3A) or inferring (Type 3B) the regulation of network bandwidth and CPU capacity for burstable instances. We consider similar validation or inference problems for remote disk IO bandwidth regulation in Appendix 8.1.

3.1 Network Bandwidth of EC2 Burstables

To further understand EC2’s network bandwidth regulation mechanism, we conduct a set of experiments wherein, rather than letting our t2.micro instance always transmit data at the maximum rate available to it, we stop data transmission at t = 180 seconds, the duration after which the instance appears to reach a fixed transmission rate in Figure 2(b). We then let the instance idle for differing amounts of time before resuming transmission at the available capacity. In Figure 4, we show the outcome of one such execution. We see that after idling for 360 seconds, the instance is able to accomplish transmission behavior identical to that during the first 180 seconds. Together, Figures 2(b) and 4 suggest the following:

• There appears to be a token-bucket-like policing mechanism for the network bandwidth of t2 instances.

• There are some initial tokens in the bucket, which are fully used up after 180 seconds of transmission at the available rate.

• Tokens are replenished after idling for some time; this replenishment mechanism needs to be investigated in depth.

• The regulation mechanism cannot be fully modeled using a “conventional” token bucket - compare our observation of EC2’s regulation mechanism vs. a classical token bucket in Figure 4 (the latter in dotted line).

The classical dual token-bucket mechanism (e.g., [20], as a shaper, not a marker) has three parameters: (i) a peak rate (Π bytes/s), (ii) a sustainable (or committed) rate (M bytes/s), and (iii) a bucket size (B bytes) associated with the sustainable rate⁶. Ideally, one would expect to see a piece-wise constant throughput profile θ corresponding to an initially full token bucket (b(0) = B), i.e., θ(t) = Π for t < T := B/(Π − M), else θ(t) = M. What we instead observe is a gradual reduction in throughput from the peak Π to M over a period of 180 seconds for all 5 types of EC2 burstable instances.

⁶The token bucket associated with the peak rate has a small size (typically corresponding to the maximum size of a single packet, ≈ 1500 bytes) ≪ B.

Figure 4: An illustration of the difference between the observed regulation and what one would expect from a classical token bucket (shown in dotted line) for the network bandwidth of an EC2 t2.medium instance. The measured bandwidth (Mbps) of the t2.medium instance declines gradually over time (sec), whereas the classic token bucket would sustain the peak and then drop sharply.
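The classical piece-wise constant profile just described can be written down directly; a minimal sketch with hypothetical parameters (the Π, M, and B values below are illustrative, not measured):

```python
def classical_theta(t, peak, sustain, bucket):
    """Throughput theta(t) of a greedy sender under a classical dual
    token bucket with an initially full bucket b(0) = B.

    The bucket drains at rate (peak - sustain), so theta stays at the
    peak until T = B / (peak - sustain), then drops sharply to the
    sustainable rate.
    """
    T = bucket / (peak - sustain)
    return peak if t < T else sustain

# Illustrative numbers: peak 1000 Mbps, sustainable 250 Mbps, and B
# sized so that T = 180 s (the horizon observed for t2 instances).
B = 180 * (1000 - 250)
print(classical_theta(60, 1000, 250, B))   # → 1000
print(classical_theta(300, 1000, 250, B))  # → 250
```

The sharp knee at T is exactly what Figure 4’s dotted line depicts, and what the measured t2 profile does not show.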

3.1.1 A Dynamic-Peak Dual Token-Bucket

A reasonable hypothesis is that the peak-rate allocation (π)⁷ dynamically depends on the token-bucket occupancy (b), i.e., π = φ(b) for some increasing function φ. In the following, we call this mechanism “Dynamic-Peak Dual Token-Bucket Regulation”. The classical token-bucket dynamics would be db/dt = M − θ, where throughput θ ≡ π in our discussion. See Figure 4 for an illustration of this difference between the observed regulation and that which a classical token bucket would offer for a t2.medium instance.

In order to characterize such dynamic-peak dual token-bucket regulation for network bandwidth, we need to model the following two aspects of its operation: (i) the mechanism for the diminishing peak-rate (π) and (ii) the rate of token replenishment as a function of idling time (s), where idling starts from the time when the bucket is empty (b = 0).

3.1.2 Diminishing Peak Rate

We try to characterize the observed diminishing peak-rate for uplink bandwidth (π), based on elapsed time from when π = Π, as a function of the state of the bucket (b(t)) and hence an indirect function of time (φ(b(t))). We have similarly checked the downlink bandwidth of t2 instances and observed the same policing; hence we just describe the uplink bandwidth hereafter.

First, we model the observed decrease in peak-rate allocation π using a deterministic concave function, for when the peak rate (π) is higher than the sustainable rate (M) (hence the token bucket occupancy (b) is nonzero). If we model θ = π = φ(b) = M + w b^z = M − db/dt, for parameters z, w > 0 so that φ is increasing, then we can solve this first-order differential equation by separating variables, with the initial conditions b(0) = B and b(T) = 0, to get

b(t) = B (1 − t/T)^(1/(1−z)),   (1)

7 That is, π is the dynamic rate at which tokens arrive to the (small-sized) token bucket that performs peak-rate shaping.


Figure 5: An illustration for a t2.medium instance of agreement between our model and observations of peak rate evolution; the fitted curve is π(t) = 745.8 (1 − t/180)^0.3 + 254.

where w = B^(1−z)/T = (Π − M)/B^z. The observed concavity of π (d²π/dt² < 0) requires z/(1 − z) < 1, i.e.,

π(t) = M + (Π − M)(b(t)/B)^z, for 0 < z < 1/2.   (2)

Substituting (1) and B = T(Π − M) into (2) gives

π(t) = M + (Π − M)(1 − t/T)^β,   0 < β = z/(1 − z) < 1.   (3)

This model accurately characterizes EC2's Dynamic-Peak Dual Token-Bucket mechanism governing network IO. As an example, for instance type t2.medium we find β = 0.3, knowing that Π = 1000 Mbps and M = 254 Mbps. As Figure 5 shows, our model and measurements agree well. We observe the same parameter β for all 5 types of t2 instances, which suggests a more rapid drop in π(t) for smaller instances.
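The closed form in (3) can be evaluated directly. The following minimal Python sketch (illustrative only, not part of the paper's measurement tooling) uses the measured t2.medium parameters and reproduces the fitted curve of Figure 5:

```python
# Diminishing peak-rate model of Eq. (3): pi(t) = M + (Pi - M)(1 - t/T)^beta.
# Parameters below are the measured t2.medium values.
def peak_rate(t, Pi=1000.0, M=254.0, T=180.0, beta=0.3):
    """Peak-rate allocation (Mbps) t seconds into a burst from a full bucket."""
    if t >= T:
        return M  # bucket empty: throttled to the sustainable rate
    return M + (Pi - M) * (1.0 - t / T) ** beta

print(peak_rate(0))    # full peak: 1000.0 Mbps
print(peak_rate(180))  # sustainable rate: 254.0 Mbps
```

The fitted curve 745.8 (1 − t/180)^0.3 + 254 of Figure 5 is exactly this form, with (Π − M) ≈ 745.8.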

Instance type | EC2's net. bandwidth SLA | Π (Mbps) | M (Mbps) | S (sec)
------------- | ------------------------ | -------- | -------- | -------
t2.nano       | Low to moderate          | 521      | 32       | 1700
t2.micro      | Low to moderate          | 1000     | 63       | 1900
t2.small      | Low to moderate          | 1000     | 127      | 900
t2.medium     | Low to moderate          | 1000     | 254      | 350
t2.large      | Low to moderate          | 1000     | 508      | 120

Table 3: Measured peak and sustainable rates and inferred token-bucket parameters for EC2 burstable instances. For all instances β = 0.3, α = 0.25. S is an upper-bound on s, the idling time needed to replenish the peak-rate completely (π = Π).

3.1.3 Token Replenishment

We conduct another set of experiments to help us characterize the token replenishment mechanism. In each experiment, we pair a freshly started target burstable instance with an m4.10xlarge instance and transmit from the burstable until the rate has reduced to the sustainable rate (after 180 seconds for all instance types). We interpret this event as a sign that the token bucket has been emptied. We report network bandwidth measured over 200 second duty cycles (diminishing peak rate and then sustainable rate) spaced by increasing idle periods of length

Figure 6: Understanding the peak rate replenishment mechanism for t2 instances' network bandwidth. (a) How the restored peak rate grows with the preceding idle period. (b) For a t2.medium instance, agreement between our model and observations of how the restored peak rate depends on the preceding idle period; the fitted curve is f(s) = 172.4 s^0.25 + 254 (R² = 0.99).

s ∈ {0, 20, 40, ..., 400} seconds. We show a snippet of the result for a t2.medium instance in Figure 6(a). We see that the absolute peak in a duty cycle increases with the length of the previous idle period, and thereafter the same diminishing-peak profile follows.

We plot the replenished peak-rate versus different idling times s for a t2.medium instance type in Figure 6(b). It is evident that the replenished peak-rate is a non-linear function of idling time (s ≤ b/M). We also observe in Figure 6(a) that after achieving the restored peak-rate (π > M), the network bandwidth completely follows the deterministic profile of π(t). Based on the observed concavity of the restored peak-rate, we choose the following general form for the model,

f(s) = (Π − M)(s/S)^α + M,   0 < α < 1,   (4)

where S is the upper-bound on s. We found this model, too, to be accurate for EC2 burstables.

For instance type t2.medium, we find that S = 350 seconds and α = 0.25, as plotted in Figure 6(b). Again, we observe the same parameter α for all t2 instances, resulting in larger S for smaller instances. We present the measured classical token bucket parameters of all five t2 instance types (with T ≤ 180 seconds) and associated S values in Table 3.
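Equation (4) similarly gives a directly computable replenishment curve. A minimal illustrative Python sketch (not from the paper) with the measured t2.medium values S = 350 s and α = 0.25:

```python
# Peak-rate replenishment model of Eq. (4): f(s) = M + (Pi - M)(s/S)^alpha.
# Measured t2.medium values: S = 350 s, alpha = 0.25 (Table 3).
def restored_peak(s, Pi=1000.0, M=254.0, S=350.0, alpha=0.25):
    """Peak rate (Mbps) available after idling s seconds from an empty bucket."""
    s = min(s, S)  # idling beyond S cannot restore more than the full peak
    return M + (Pi - M) * (s / S) ** alpha

print(restored_peak(0))    # no idling: stuck at M = 254.0 Mbps
print(restored_peak(350))  # fully replenished: 1000.0 Mbps
```

The fitted curve 172.4 s^0.25 + 254 of Figure 6(b) matches, since (Π − M)/S^0.25 = 746/350^0.25 ≈ 172.5.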


The maximum idling time (S) for larger instances is smaller. We postulate this is a result of the same bucket size across all t2 instance types: since larger instances have a higher sustainable rate, their replenishing time is lower.

3.2 CPU Capacity of EC2 Burstable Instances

CPU capacity of EC2 burstable instances represents an example of Type 3A. These instances are given a baseline level of CPU capacity with the ability to burst up to a peak level based on their CPU "credits". EC2 refers to tokens as credits given to the instance; henceforth, we will simply use the term token. CPU tokens are credited continuously at a given rate (which is different for different instances). The parameters for this CPU token bucket mechanism - as revealed by EC2 - are presented in Table 4. As in a classical token bucket, tokens are accumulated when the instance CPU usage is below the baseline. One CPU token provides the ability to burst at full capacity of one or two cores (if applicable) for one minute. EC2 offers two metrics related to CPU token-bucket regulation of t2 instances via its CloudWatch API [7]: (i) CPUCreditUsage, which tracks the expenditure of tokens over time, and (ii) CPUCreditBalance, which tracks the accumulation of tokens over time. Additionally, each instance is given some initial CPU tokens.

instance type | c_i (tokens/min) | initial tokens | bucket size (tokens) | CPU capacity base M (%) | CPU capacity peak Π (%)
------------- | ---------------- | -------------- | -------------------- | ----------------------- | -----------------------
t2.nano       | 360              | 30             | 72                   | 5                       | 100
t2.micro      | 660              | 30             | 144                  | 10                      | 100
t2.small      | 1260             | 30             | 288                  | 20                      | 100
t2.medium     | 2460             | 60             | 576                  | 40                      | 200
t2.large      | 3660             | 60             | 864                  | 60                      | 200

Table 4: EC2 burstable instances' CPU token-bucket parameters.

Token Expiration Mechanism: One way in which EC2 regulation deviates from the classical token bucket is the following (also revealed by EC2): accumulated CPU tokens expire after 24 hours. We verify this through our experiments. Take the illustrative example of a t2.micro instance: the bucket size for this instance type is 144 tokens, filled at 6 tokens/hour. We conduct an experiment wherein a full bucket is achieved by letting the instance CPU idle for 24 hours. Afterwards, we drive the CPU at exactly the base rate (which is 10% of a regular core) using the lookbusy benchmark [27]; if a conventional token bucket were being used, we would expect to see a constant CPUCreditBalance. However, we observe that the CPUCreditBalance depletes at 6 tokens/hour, matching the rate of token accumulation. We postulate that there is a timer for the bucket which is reset at each underflow of the bucket, and that expired (older than 24 hours) tokens are discarded according to a First In First Out (FIFO) mechanism. We further postulate that tokens are consumed by the instance according to a Last In First Out (LIFO) mechanism. Token expiration does not impact utilizing the CPU at the base/sustainable rate. This time-based expiration appears peculiar to t2 CPU; we do not observe it for the network bandwidth regulation.
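Our postulated expiration behavior (continuous accrual, LIFO consumption, FIFO expiry after 24 hours) can be sketched as an hourly simulation. The code below is an illustration of our conjecture, not a disclosed EC2 mechanism; it uses the t2.micro numbers (144-token bucket, 6 tokens/hour):

```python
from collections import deque

ACCRUAL = 6   # tokens accrued per hour (t2.micro)
TTL = 24      # conjectured token lifetime in hours

def simulate(hours, demand):
    """Hourly simulation of the conjectured expiring token bucket.
    demand = tokens spent per hour; returns the final CPUCreditBalance."""
    # Start full: 24 hourly batches of 6 tokens accrued while idling.
    bucket = deque((t, ACCRUAL) for t in range(-TTL, 0))  # (birth_hour, amount)
    for now in range(hours):
        bucket.append((now, ACCRUAL))        # accrue this hour's tokens
        need = demand                        # spend newest tokens first (LIFO)
        while need > 0 and bucket:
            birth, amount = bucket.pop()
            spent = min(amount, need)
            need -= spent
            if amount > spent:
                bucket.append((birth, amount - spent))
        while bucket and bucket[0][0] <= now - TTL:
            bucket.popleft()                 # expire oldest batches (FIFO)
        # (bucket-size cap omitted for brevity: expiry keeps the idle balance at 144 here)
    return sum(amount for _, amount in bucket)

# Spending at exactly the accrual rate from a full bucket: the balance still
# drains at 6 tokens/hour, because 24-hour-old batches keep expiring.
print(simulate(10, demand=6))  # 84
print(simulate(10, demand=0))  # idle: stays at 144
```

This reproduces the CloudWatch observation above: with a full bucket and CPU usage pinned at the base rate, the balance falls at exactly the accrual rate.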

3.3 CPU Capacity of GCE Burstable Instances

GCE offers two instance types under its "shared-core" category, namely "f1-micro" and "g1-small". In contrast with EC2 (cf., Section 3.2), GCE does not reveal the specifics of the CPU bursting capacity for these instance types, instead just choosing to state that "GCE offers bursting capabilities that allow instances to use additional physical CPU for short periods of time" [28]. We design experiments to help us understand the CPU capacity regulation of these instances.

We consider the temporal profile of CPU utilization obtained upon running the SysBench benchmark on GCE's burstable instances. As seen in Figure 7(a) for f1-micro, there is a sudden jump in execution time after ≈ 30 seconds. Similar to the observations we made for Amazon EC2 t2 offerings in Section 3.2, we suspect that GCE burstable instances might have their CPU capacities regulated by a classical dual token-bucket mechanism with parameters (Π, M, B) which need to be inferred.

For this, we need to overcome additional complications. First, unlike EC2 t2 instances, standard utilization tools (e.g., the top command) in GCE burstable instances do not reveal absolute CPU capacity, but rather the amount of utilization in terms of virtualized CPU capacity. So running a CPU-intensive workload within these instances will cause the CPU to be close to 100% utilized even if the physical CPU capacity varies considerably over time. Second, unlike network bandwidth regulation, we are not dealing with packets of data whose sizes are precisely known; instead we must find a reasonable way to account for CPU cycles. Towards this, we need to define a basic unit of work that corresponds to a relatively stable number of cycles of a CPU core. We choose as this unit of work the determination of all prime numbers less than 500 using the SysBench CPU benchmark. Recall that a classical token-bucket algorithm allows for a bursting rate Π up to Tmax = B/(Π − M), following which the sustainable rate M may be achieved indefinitely.

First, to establish the existence of token bucket regulation for CPU capacity, we look at the temporal profile of consecutive runs of units of work (as identified above) on burstable instances. We can identify the rate parameters (e.g., Π as bursting rate and M as sustainable rate) in terms of our unit of work, and consequently the maximum allowed bursting time (Tmax) for the token bucket algorithm imposed by GCE for that instance. We use a freshly started f1-micro instance and measure the performance of consecutive runs of the unit of process (resulting profile shown in Figure 7(a)). We make the following observations: (i) there is a possible token-bucket regulation for CPU; (ii) there is a non-empty bucket when an instance is started; (iii) there is an abrupt jump in execution time for the same unit of process after about 170 executions, which we track using a threshold γ = 0.5 sec; and finally (iv) there is much more variation in execution time when t > γ. This could be a result of more throttling by the hypervisor when the instance is out of tokens (the cost of imposing the regulation). Based on these observations in Figure 7(a), we interpret execution time exceeding γ as a sign of the token bucket being empty. In this figure, just one round of execution is shown, but the same behavior is observed in more extensive runs of this experiment.

We consider two types of experiments to help us understand the underlying model for GCE bursting instances.

Figure 7: Measurement results for inferring f1-micro instance token bucket parameters. (a) Measured performance of SysBench for f1-micro. (b) SysBench measurements with different idling times for f1-micro. (c) Measured vs. theoretical replenishment rate for f1-micro; fitted C = 1.08s with R² = 0.9983.

Experiment A: We look at the performance profile of consecutive runs of a unit of work for multiple instances to help us derive the rate parameters and maximum bursting time for the token-bucket regulation. Experiment B: This experiment helps us verify that our inferred parameters are correct, by comparing the replenishing rate of the bucket against the sustainable rate calculated from our observations in Experiment A.

In order to get the rate parameters in terms of our basic unit of work, we run 8 instances (all freshly started) and measure the execution time for consecutive runs of the unit of work. Using γ to indicate an empty bucket, we calculate the rates M and Π using the following equations:

Π = ( Σ_{i: t_i < γ} 1 ) / ( Σ_{i: t_i < γ} t_i ),   M = ( Σ_{i: t_i > γ} 1 ) / ( Σ_{i: t_i > γ} t_i ),   (5)

where t_i is the execution time of the i-th unit of process. We get the following values for the f1-micro instance type:

M = 1.08 tokens/sec, Π = 5.42 tokens/sec, Tmax = 30.85 sec.

Thus, the capacity of the bucket is

B = (Π − M) Tmax = 133.89 tokens.
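The inference in (5) amounts to a simple threshold split of the per-run execution times. The sketch below is illustrative Python (not the paper's code), and its trace is synthetic, generated from the rates we ultimately inferred, since the real input would be the measured SysBench timings:

```python
GAMMA = 0.5  # sec; execution times above gamma indicate an empty bucket

def infer_parameters(times):
    """Infer (Pi, M, Tmax, B) from per-unit-of-work execution times via Eq. (5)."""
    fast = [t for t in times if t < GAMMA]  # runs served at the bursting rate
    slow = [t for t in times if t > GAMMA]  # runs throttled to the sustainable rate
    Pi = len(fast) / sum(fast)              # units of work per second while bursting
    M = len(slow) / sum(slow)               # units of work per second once drained
    Tmax = sum(fast)                        # burst duration until the bucket empties
    B = (Pi - M) * Tmax                     # inferred bucket capacity in tokens
    return Pi, M, Tmax, B

# Synthetic trace shaped like Figure 7(a): ~170 fast runs, then throttled runs.
trace = [1 / 5.42] * 170 + [1 / 1.08] * 300
Pi, M, Tmax, B = infer_parameters(trace)
```

On real traces, averaging over multiple instances (as we do with 8 here) smooths out the per-run variation visible beyond γ.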

To verify, we run Experiment B to measure the replenishing rate for the f1-micro instance type. Each execution cycle consists of 200 consecutive units of process spaced by increasing idle periods of length s ∈ {0, 20, 40, ..., 140} seconds. A part of the temporal profile of the Experiment B measurements is shown in Figure 7(b). The amount of bursting (running with execution time ≈ 0.2 sec) increases as idling time increases, suggesting more token accumulation with higher idling time.

The periodic state of the bucket b(s) as a function of idling time s after it has drained (b = 0) can be observed in Figure 7(c): if b(0) = 0 then b(s) = Ms = (Π − M) Tmax(s) (classical token bucket), where Tmax(s) is the maximum burst time after idling for s seconds, so that b(s + Tmax(s)) = 0 again.

The results of our measurements vs. the standard replenishing rate of the token bucket can be seen in Figure 7(c), which suggests acceptable agreement between empirical observations and our modeling. We use the same methodology for the g1-small instance type to derive the following CPU token bucket parameters for it: M = 2.7 tokens/sec, Π = 5.44 tokens/sec, and Tmax = 120.43 sec with γ = 0.25 sec.

4. EXPLOITING BURSTABLE INSTANCES

We present two case studies wherein using EC2 burstable instances as replacements for regular instances, or to carefully supplement them, might yield cost or performance benefits. Both our case studies are based on memcached, a popular distributed in-memory key-value store that is used to construct the caching tier of large-scale storage applications [29]. For both case studies, we present: (i) arguments based on our earlier characterization for cost or performance improvements, (ii) the specific type of memcached workload that we expect to benefit, (iii) necessary enhancements to memcached to realize these gains and any online regulator state that might be useful, and (iv) measurements from experiments on our EC2 based prototypes and how they match up against our expectations.

At the heart of our case studies are the following two properties of burstable instances:

• Property A: Whereas burstable instances are more expensive than regular instances when one considers their base/sustainable network bandwidth and CPU capacities (as expected), they are significantly cheaper when one considers their peak capacities.

• Property B: For every dollar invested, burstable instances offer much higher network bandwidth or CPU capacity per unit RAM than regular instances.

4.1 Case Study 1: Temporal Multiplexing of Burstable Instances

Figure 8: Staggering 3 t2.medium instances to achieve 1 vCPU, >900 Mbps network IO, and 12 GB RAM on standby.


4.1.1 When?

Many memcached workloads exhibit highly skewed popularity distributions; hence popular (i.e., "hot") content usually represents a small fraction of overall cache size. A common workload pattern for memcached is to have a small hot set (i.e., small memory needs) but a high request rate requiring high CPU and network bandwidth. Additionally, if the workload happens to be read-intensive, then it becomes feasible to replicate the hot content and maintain consistency among these replicas at low cost/performance overheads. Given Property A, these replicas could be located on multiple identical burstable instances that could be "staggered" in time to offer the CPU and network bandwidth equivalents of a more expensive regular instance (with allocation corresponding to the peak of the burstables). Figure 8 offers an illustration for three t2.medium burstable instances staggered in this manner, whereby a fully token-provisioned instance is on standby indefinitely.

4.1.2 How?

We describe a simple methodology to translate the above basic idea into a practically useful technique. Consider a memcached workload with a small, highly popular hot set. Such a workload has low RAM needs but requires high CPU capacity and network bandwidth to meet low latency goals. If using EC2 burstable instances, two token-bucket regulators (for CPU and network bandwidth) are involved, and usually the limiting one will dictate the staggered instance scheduling plan. For example, a tenant could stagger three t2.medium burstable instances so that a fully token-provisioned instance is on standby at the required frequency. The current total cost of three t2.medium instances is $0.156/hr, whereas that of an approximately equivalent regular instance 8 is $0.201/hr. That is, the use of burstable instances is about 22% cheaper and also offers higher network bandwidth.

We present a way to stagger multiple identical burstable instances to achieve the equivalent of 1 vCPU and 2 vCPU capacity "on demand" in Table 5. The number n_i of burstable type-i instances needed to get δΠ_i CPU capacity, for any fraction 0 < δ ≤ 1, can be calculated as follows:

(n_i − 1) t c_i ≥ t δΠ_i  ⇒  n_i ≥ δΠ_i / c_i + 1,   (6)

where t is the burst-time for each instance. Note that (n_i − 1)t (< 24 × 60 minutes) corresponds to the size of the token bucket for EC2 burstable instances; recall Table 4.

Now consider a network-intensive workload with an average network bandwidth requirement of d Mbps, where M < d < Π. We can operate each burstable in a tighter range (d_min, d_max) with M_i < d_min < d_max < Π_i:

d = ( ∫ from π_i^{-1}(d_max) to π_i^{-1}(d_min) of π_i(s) ds ) / ( π_i^{-1}(d_min) − π_i^{-1}(d_max) ) ≥ (d_min + d_max)/2.   (7)

For a staggered arrangement of m_i type-i burstable instances, to achieve d Mbps network bandwidth on average, each instance is allowed to burst for ω_i seconds (reducing to d_min) and

8 The cheapest EC2 instance to offer equivalent CPU capacity is m3.medium, which has 296 Mbps network IO, 1 vCPU, and 3.75 GB RAM, whereas three burstable t2.medium instances provide much higher network IO and a total of 1 vCPU and 12 GB RAM.

Instance type | c_i (tokens/min) | # of instances for 1 vCPU | # of instances for 2 vCPU
------------- | ---------------- | ------------------------- | -------------------------
t2.nano       | 360              | 21                        | NA
t2.micro      | 660              | 11                        | NA
t2.small      | 1260             | 6                         | NA
t2.medium     | 2460             | 3                         | 4
t2.large      | 3660             | 2                         | 3

Table 5: Number of instances required to achieve 1 vCPU or 2 vCPU (if applicable) by staggering burstable instances.

needs to idle for τ_i seconds to replenish its tokens (increasing to d_max). These values are calculated as follows:

ω_i(d_min, d_max) = π_i^{-1}(d_min) − π_i^{-1}(d_max),
τ_i(d_min, d_max) = (S_i − f_i^{-1}(d_min)) − (S_i − f_i^{-1}(d_max)),

where π(·) and f(·) are defined in equations (3) and (4), respectively. Note that a staggering plan using m_i burstable instances of type i to achieve d Mbps network bandwidth is feasible if and only if

(m_i − 1) ω_i ≥ τ_i.

We find the best scheduling plan by minimizing m_i over d_min, d_max. In Figure 9, we show an example:

(ω_t2.medium, τ_t2.medium) = (31, 56.5),
(ω_t2.large, τ_t2.large) = (24, 21.1),

for network bandwidth d ≈ 750 Mbps with d_min = 700 Mbps and d_max = 800 Mbps. The optimal scheduling plan (ω_i, τ_i, m_i) for network bandwidth requirement d Mbps is:

m_i = ⌈ min over d_min, d_max of ( τ_i(d_min, d_max) / ω_i(d_min, d_max) ) ⌉ + 1,  s.t. (7).   (8)
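Putting (3), (4), and (8) together, a staggering plan reduces to inverting π and f plus a small search. The sketch below is illustrative Python (not the paper's tooling), hard-coding the t2.large parameters from Table 3 and evaluating a single operating point (d_min, d_max) = (700, 900) Mbps; a full plan would grid-search over (d_min, d_max) subject to (7):

```python
from math import ceil

# t2.large network parameters from Table 3; T = 180 s, beta = 0.3, and
# alpha = 0.25 hold for all t2 types.
PI, M, T, BETA = 1000.0, 508.0, 180.0, 0.3
S, ALPHA = 120.0, 0.25

def pi_inv(d):
    """Burst time at which the diminishing peak rate (3) has fallen to d Mbps."""
    return T * (1.0 - ((d - M) / (PI - M)) ** (1.0 / BETA))

def f_inv(d):
    """Idle time (4) needed, from an empty bucket, to restore the peak to d Mbps."""
    return S * ((d - M) / (PI - M)) ** (1.0 / ALPHA)

def plan(d_min, d_max):
    """Burst window, idle requirement, and instance count for one operating point."""
    omega = pi_inv(d_min) - pi_inv(d_max)  # seconds each instance may burst
    tau = f_inv(d_max) - f_inv(d_min)      # idle seconds to climb back to d_max
    m = ceil(tau / omega) + 1              # Eq. (8) restricted to this point
    return omega, tau, m

omega, tau, m = plan(700.0, 900.0)  # d around 800 Mbps on t2.large
```

With these parameters the plan gives (ω, τ, m) ≈ (77, 46, 2), consistent up to measurement rounding with the (79, 45.6) operating point used in the memcached example of Section 4.1.3.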

We show the result of this optimization problem in Figure 9(b), considering instance types i ∈ {t2.micro, t2.small, t2.medium, t2.large} and network bandwidth requirements d ∈ {100, 200, ..., 900}. As expected, fewer higher-capacity instances are needed to achieve the same network bandwidth d; however, lower-capacity instances are naturally cheaper 9.

To schedule a CPU and network IO intensive workload with resource requirements (δΠ_i CPU, d Mbps network bandwidth), we need max{n_i, m_i} identical instances of type i, where n_i and m_i are calculated from (6) and (8), respectively. These instances are allowed to burst for ω_i sec and will be idle for (max{n_i, m_i} − 1) ω_i sec.

4.1.3 Example application: Staggered burstables operating memcached

Consider a memcached server caching 7 GB of data. Assume that this server needs to service queries arriving at a constant rate λ = 64 Kops/sec that are 100% read (get()) operations, with an average latency requirement < 700 µsec and a 99th percentile latency requirement < 2 msec. Assume that the key popularity follows a uniform distribution and that values are 1600 B in size, resulting in a network bandwidth intensive workload (d ≈ 800 Mbps). The CPU requirement to

9 Note that for solving this optimization problem we do not consider any constraints on ω_i; however, some constraints might be imposed on ω_i based on workload requirements.


Figure 9: Optimal scheduling of staggered burstable instances. (a) An example of a feasible scheduling plan of staggered instances using t2.medium (denoted t2.M) and t2.large (denoted t2.L). (b) Optimal number of instances of each type for d ∈ {100, 200, ..., 900} Mbps.

serve this workload, based on our workload characterization, is 100% out of the 2 vCPUs of the t2.large instance type. The resource demand of this server is (1 vCPU, 7 GB RAM, 800 Mbps).

We use the Yahoo! Cloud Serving Benchmark (YCSB) [48] to generate 64 Kops/sec of requests. The client is procured on a well-provisioned instance (c4.4xlarge) with (16 vCPUs, 30 GB RAM, >4900 Mbps) to avoid any bottlenecks at the client end.

We study the efficacy of our proposed methodology (Prop) and compare it against the following two baseline approaches (Baseline I and Baseline II).

Prop: In this case we use staggered burstable instances to achieve 1 vCPU. Based on our static analysis (Table 5), the number of burstables needed is n_t2.large = 2 instances 10. Next, in order to achieve the required network bandwidth

10 Note that the maximum bursting/idling time is B/24 × 60, corresponding to a full token bucket (B).

Figure 10: memcached case study schematic: (a) Prop: staggering two t2.large instances. (b) Baseline I: using two t2.large instances in parallel, each serving λ/2 behind a load balancer (LB).

(d ≈ 800 Mbps), we choose 11 as follows:

(d_max, d_min) = (900, 700) Mbps ⇒ (ω_i, τ_i) = (79, 45.6) sec.

Considering this scheduling plan, instances will burst for 79 sec followed by idling for 79 sec, which replenishes the peak rate to π_i(79) > d_max since ω_i > τ_i, resulting in m_t2.large = 2 based on (8); hence max{m_t2.large, n_t2.large} = 2. We present the schematic of this case study in Figure 10(a). The cost of this case is 2 × 0.104, i.e., $0.208/hr.

Baseline I: As our first baseline, we run two t2.large instances in parallel and service half of the workload (λ/2 = 32 Kops/sec) with each instance. In Baseline I, the CPU tokens are not replenished since the instances are never idle. Given the arrival and service rates for Prop as λ and µ respectively, according to a simple M/M/1 model (which does not capture common networking overhead), Baseline I has an average response time of about twice that of Prop: 1/(µ/2 − λ/2) = 2 · 1/(µ − λ). Here the cost is obviously the same as Prop: $0.208/hr.
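The M/M/1 comparison can be checked numerically. In the sketch below (illustrative Python), µ is a hypothetical per-server service rate chosen so that Prop's mean response time equals the 700 µsec latency target at λ = 64 Kops/sec; splitting both arrivals and service capacity in half then doubles the response time:

```python
LAM = 64_000.0            # arrival rate (ops/sec)
MU = LAM + 1.0 / 700e-6   # hypothetical service rate giving a 700 us mean latency

def mm1_response(lam, mu):
    """Mean response time of an M/M/1 queue; requires mu > lam for stability."""
    assert mu > lam
    return 1.0 / (mu - lam)

prop = mm1_response(LAM, MU)               # one bursting server, full load
baseline1 = mm1_response(LAM / 2, MU / 2)  # half the load on half the capacity
print(baseline1 / prop)  # ratio of mean response times, approximately 2
```

The factor of 2 is independent of the particular µ chosen, as the algebra above shows.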

Baseline II: In this case, to serve the workload, we use the cheapest regular instance with the required network bandwidth and > 1 vCPU, which is m3.xlarge at a cost of $0.266/hr (so Prop results in 22% savings).

We show the experimental results in Figure 11. We see comparable average latency for Prop and Baseline II, and worse latency for Baseline I, as expected. We also observe that the 99th percentile latency is 1 msec for Prop and Baseline II, while it is 8 msec for Baseline I.

4.2 Case Study 2: Passive Backup for Spot Instances

4.2.1 When?

Spot instances may be up to 10x cheaper (on average) than their on-demand counterparts [5]. Since spot instances may be revoked by the provider 12, their lower cost comes with the risk of degraded performance. Although spot instance revocation does not affect application correctness (since all

11 Note there are infinitely many possible pairs; here we choose d_max − d_min = 100 Mbps to avoid a very small multiplexing time window.

12 This happens when the tenant's "bid" falls below a dynamic spot price that may exhibit hard-to-predict spikes.


Figure 11: Empirical CDF of average latency for Prop with staggered burstable instances, Baseline I, and Baseline II.

cached data is persisted in a back-end), it does result in an increased number of cache misses that are then serviced from the slower back-end.

Our idea for improving this trade-off is inspired by the classic primary-backup fault-tolerance technique [9]. In our scheme, we keep the "primary" copy of the entire cache on the requisite number of spot instances, leveraging extensive prior work on bid placement. Our passive "backup" resides on instances that have high availability. To keep the costs of this backup small, we only replicate (small) hot content. When a spot instance (say s_old) is about to be revoked (EC2 offers a 2 minute warning prior to revocation), we begin copying the hot content it was caching, using replicas on the passive backup, into a freshly started instance (say s_new) meant to replace s_old. Once this copying finishes, memcached's load balancer can redirect requests to s_new, which has already been "warmed up" via this copying. If our copying of hot content can be accomplished within the 2 minute warning period, it lets the primary spot instance s_old operate at full speed past the warning, thereby further helping keep any performance degradation in check.

4.2.2 How?

Clearly, in this scheme, the extent of performance degradation upon a spot revocation crucially depends on the time it takes for the replacement s_new of the revoked spot instance s_old to become operational. This in turn depends on the time required to copy hot content from the backup to s_new. Property B described above suggests that burstable instances might be ideal candidates for this. Our calculations suggest that, for every dollar invested, t2 instances, if operating at their peak rates, can offer 1.6-12.5 times higher network bandwidth and 1.9-15.3 times higher CPU capacity per unit RAM compared to memory optimized instances. Based on this, we expect a backup constructed using burstable instances to outperform and/or be less expensive than one based on on-demand instances.

Figure 12 illustrates the key ideas, including the small modifications to mcrouter that our scheme requires. A read (get()) request is served from the spot instance holding the primary copy of the requested key. A write (put()) request is additionally sent to the burstable instance holding the replica to

Figure 12: Illustration of our scheme with a passive memcached backup: requests arrive at a load balancer in front of the spot pool; upon a spot (bid) failure, the burstable passive backup warms up the cold cache of the replacement instance.

keep the copies consistent. If the workload is read-heavy, the burstable instances making up the backup will be rich in CPU and network bandwidth tokens (being "idle" most of the time) when called upon for a recovery. Even for workloads containing a non-negligible fraction of writes, the situational awareness of token expenditure enabled by our techniques in Section 3 can be used to ensure enough tokens are kept available to assist in speedy recovery.

4.2.3 Evaluation

We offer results from an experiment that validates the expected performance gains due to a backup based on burstable instances. We use the YCSB benchmark to generate a read-only workload at an arrival rate of 40 Kops/sec, with the key popularity following a Zipfian distribution (Zipfian constant 0.99). We consider a size of 10 GB for the cached content, out of which we deem the most popular 3 GB to be hot (i.e., worthy of backup). We use an m4.2xlarge spot instance as our primary memcached server, which has 15 GB RAM and enough CPU capacity and network bandwidth to offer an average latency of 800 µsec (which we deem acceptable).

We emulate a spot revocation and observe how performance is affected depending on which of the following instance types is used for our passive backup: (i) m3.medium, (ii) c3.large, and (iii) t2.medium. While (i) and (ii) have the smallest RAM capacity (3.75 GB) among all regular on-demand instances, t2.medium has the closest RAM capacity (4 GB) among all burstable instances. In Figure 13 we

Figure 13: Realtime latency during recovery from a spot instance revocation with different types of backup options: t2.medium ($0.052/h), m3.medium ($0.067/h), c3.large ($0.105/h), and no backup.

plot latency during the recovery period (recorded once every second) with options (i)-(iii) and also without any backup.


t = 0 corresponds to the beginning of the period when a newly launched instance (the replacement for the spot instance that is to fail) is ready for use. We then start to warm up the empty cache of the new instance using the backup. As expected, owing to its ability to burst its CPU and network bandwidth when needed, the t2.medium instance is able to match the performance behavior otherwise obtained using the c3.large instance, which is twice as expensive. Option (i) (m3.medium), which also costs more than using the burstable instance, offers poorer performance.

5. RELATED WORK

Measurement-based studies of effective resource capacities (and other properties such as availability, dynamic spot prices, etc.) constitute an area of significant recent research [8, 16, 21, 15, 26, 14, 34, 44, 45, 31, 32, 35, 36, 49, 24]. These studies have been undertaken in a variety of different contexts, e.g., for (i) private vs. public settings (with differences in what measurement-related facilities may be assumed), (ii) informing different types of optimization goals (e.g., performance anomaly detection, capacity planning and SLA selection over coarse timescales, or fine timescale measurements that can inform autoscaling or other resource management decisions), and (iii) different types of resources within instances (network bandwidth or others). We discuss a representative subset here.

In a private setting, the designer of a measurement study may have access to significantly more detailed information than in a public cloud. For instance, in [49] from Google, a method called CPI2 is suggested that relies on real-time measurements of cycles-per-instruction using hardware performance counters to diagnose tasks whose resource allocations should be throttled to reduce the resource interference they may be causing for other co-located tasks. Since our measurements are from the perspective of a tenant of EC2, a public cloud, we cannot assume access to such information and must work with information available to a virtual machine.

A first set of capacity measurements for public cloud offerings is made up of studies from the providers themselves (e.g., [6]). Similar studies have also been offered by some third-party companies, e.g., CloudLook [12] and CloudHarmony [33]. CloudHarmony offers its own score called the CloudHarmony Compute Unit (CCU) for CPU capacity across different public cloud providers including EC2. A key shortcoming of these studies, in our opinion, is that they tend to lack details of their measurement methodology, making it difficult to reproduce them. Research papers tend to offer more detailed descriptions of their methodology and have looked both at spatial (across instances of the same type) and temporal variations in capacity. Some papers use macro-benchmarks or realistic applications to study the performance dynamism of specific public cloud-hosted workloads such as high performance computing [15], map-reduce jobs [50], or search of academic articles [43]. Others have studied low-level resource capacity dynamics using micro-benchmarks [16, 15, 46, 31].

Many papers have reported/inferred high variations (both temporal for a fixed instance and across instances of the same type) in resource capacities offered by public clouds. E.g., [50, 51] show that the application performance of map-reduce type jobs can be very different across instances with identical advertised CPU and RAM capacities; [45] studies the effect of CPU virtualization on network throughput through measurements; [19, 16] study the variation in network bandwidth for the Amazon m1.small instance type; [17] measures network performance (upload/download times and bandwidth) between instances across different regions of Amazon EC2; and [11] studies how to use network performance measurements, including latency and bandwidth, to perform efficient task-mapping for the application.

All of these studies have focused on regular instance types. Generally, they find higher capacity dynamism than seen in comparable in-house environments and higher variance for smaller/cheaper instance types. From the tenant's perspective, resource capacity is best modeled as a random process (controlled by the provider). We find regulated resources in burstable instances to behave very differently.

Burstable instances have received relatively scant attention (likely due to their recency). A Cloudability blog post [30] presents a simple cost comparison of EC2 burstable instances against others but does not take into account complications due to their higher capacity dynamism (whether disclosed or otherwise). In [46], the authors report high variability for t1.micro, the only burstable instance type at that time (EC2's first generation of burstables), and try to improve application performance by injecting different delays into the workload being serviced by that instance in a reactive fashion.

Most closely related to our work, [25] presents a formal model to capture CPU and disk IO dynamism for 3 EC2 t2 instance types. Our work is complementary to it and goes beyond it significantly in its scope (in its consideration of network bandwidth regulation, our classification of capacity variation types, and in our case studies). Finally, recent research has explored other ways of using EC2 spot instances for reducing the costs of hosting memcached in the public cloud, e.g., [47]. Our work is novel in its use of burstable instances to improve the cost/performance of such solutions.

6. CONCLUDING REMARKS

Using measurements from both Amazon EC2 and Google Compute Engine (GCE), we identified key idiosyncrasies of resource capacity dynamism for burstable instances that set them apart from other instance types. The network bandwidth and CPU capacity of these instances were found to be regulated by deterministic, token bucket like mechanisms. We found widely different types of disclosures by providers of the parameters governing these regulation mechanisms: full disclosure (e.g., CPU capacity for EC2 t2 instances), partial disclosure (e.g., CPU capacity and remote disk IO bandwidth for GCE shared-core instances), or no disclosure (network bandwidth for EC2 t2 instances). A tenant modeling these resource capacities as random phenomena (as some recent work suggests) might make sub-optimal procurement and operation decisions. We presented modeling techniques for a tenant to infer the properties of these regulation mechanisms via simple offline measurements. We also presented two case studies of how certain memcached workloads might utilize our modeling for cost-efficacy on EC2 based on: (i) temporal multiplexing of multiple burstable instances to achieve the CPU or network bandwidth (and thereby throughput) equivalent of a more expensive regular EC2 instance, and (ii) augmenting cheap but low-availability in-memory storage offered by spot instances with backup of popular content on burstable instances.


7. REFERENCES

[1] B. Adler. Amazon t2 use cases. http://www.rightscale.com/blog/cloud-cost-analysis/will-aws-t2-replace-30-percent-instances-not-so-fast.

[2] Amazon EC2 EBS Volume Types. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html.

[3] Amazon Elastic Block Store (EBS). https://aws.amazon.com/ebs/.

[4] Amazon instance types. https://aws.amazon.com/ec2/instance-types/.

[5] Amazon spot instances. https://aws.amazon.com/ec2/spot/, 2016.

[6] Amazon EC2 FAQs: What is an EC2 Compute Unit and why did you introduce it? https://aws.amazon.com/ec2/faqs/#hardware-information.

[7] Amazon CloudWatch - cloud and network monitoring services. https://aws.amazon.com/cloudwatch/.

[8] S. K. Barker and P. Shenoy. Empirical evaluation of latency-sensitive application performance in the cloud. In Proc. ACM Multimedia, 2010.

[9] N. Budhiraja, K. Marzullo, F. B. Schneider, and S. Toueg. The primary-backup approach. In Distributed Systems (2nd Ed.), pages 199-216. ACM Press/Addison-Wesley Publishing Co., 1993.

[10] Amazon burstable instances. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/t2-instances.html.

[11] E. D. Carreno, M. Diener, E. Cruz, and P. Navaux. Automatic communication optimization of parallel applications in public clouds. In Proc. IEEE/ACM CCGRID, 2016.

[12] CloudLook. http://www.cloudlook.com.

[13] C. Delimitrou and C. Kozyrakis. HCloud: Resource-efficient provisioning in shared cloud systems. In Proc. ACM ASPLOS, 2016.

[14] EC2 variability: The numbers revealed. http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed.

[15] Y. El-Khamra, H. Kim, S. Jha, and M. Parashar. Exploring the performance fluctuations of HPC workloads on clouds. In Proc. IEEE CloudCom, 2010.

[16] B. Farley, A. Juels, V. Varadarajan, T. Ristenpart, K. D. Bowers, and M. M. Swift. More for your money: exploiting performance heterogeneity in public clouds. In Proc. ACM SOCC, 2012.

[17] A. Gandhi and J. Chan. Analyzing the Network for AWS Distributed Cloud Computing. Proc. ACM SIGMETRICS PER, 2015.

[18] GCE Disk Performance. https://cloud.google.com/compute/docs/disks/performance.

[19] M. Hajjat, R. Liu, Y. Chang, T. E. Ng, and S. Rao. Application-specific configuration selection in the cloud: impact of provider policy and potential of systematic testing. In Proc. IEEE INFOCOM, 2015.

[20] J. Heinanen and R. Guerin. A two rate three color marker. RFC 2698, available at www.ietf.org, 1999.

[21] A. Iosup, N. Yigitbasi, and D. Epema. On the performance variability of production cloud services. In Proc. IEEE/ACM CCGrid, 2011.

[22] iPerf. https://iperf.fr/iperf-download.php.

[23] A. Kopytov. Sysbench manual. MySQL AB, 2012.

[24] P. Leitner and J. Cito. Patterns in the Chaos - A Study of Performance Variation and Predictability in Public IaaS Clouds. ACM Transactions on Internet Technology (TOIT), 2016.

[25] P. Leitner and J. Scheuner. Bursting with Possibilities - An Empirical Study of Credit-Based Bursting Cloud Instance Types. In Proc. IEEE/ACM UCC, 2015.

[26] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: comparing public cloud providers. In Proc. ACM SIGCOMM IMC, 2010.

[27] lookbusy - a synthetic load generator. https://www.devin.com/lookbusy/.

[28] GCE Machine Types. https://cloud.google.com/compute/docs/machine-types.

[29] Memcached. http://memcached.org/.

[30] A. Nhem. Cloudability. https://blog.cloudability.com/how-cost-efficient-is-the-new-burstable-aws-t2-large/, 2016.

[31] S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema. A performance analysis of EC2 cloud computing services for scientific computing. Cloud Computing, 2009.

[32] Z. Ou, H. Zhuang, J. K. Nurminen, A. Yla-Jaaski, and P. Hui. Exploiting hardware heterogeneity within the same instance type of Amazon EC2. In Proc. USENIX HotCloud, 2012.

[33] J. Read. CloudHarmony. http://blog.cloudharmony.com/2010/05/what-is-ecu-cpu-benchmarking-in-cloud.html.

[34] M. S. Rehman and M. F. Sakr. Initial findings for provisioning variation in cloud computing. In Proc. IEEE CloudCom, 2010.

[35] C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch. Heterogeneity and dynamicity of clouds at scale: Google trace analysis. In Proc. ACM SOCC, 2012.

[36] J. Schad, J. Dittrich, and J. Quiane-Ruiz. Runtime measurements in the cloud: observing, analyzing, and reducing variance. Proc. VLDB Endowment, 2010.

[37] GCE Micro History. https://en.scio.pw/Google_Compute_Engine.

[38] GCE Preemptible Instances. https://cloud.google.com/compute/docs/instances/preemptible.

[39] GCE Shared-Core Instances. https://cloud.google.com/compute/docs/machine-types#sharedcore.

[40] A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano, et al. Jupiter rising: A decade of Clos topologies and centralized control in Google's datacenter network. ACM SIGCOMM Computer Communication Review, 2015.

[41] STREAM, 2016. http://www.cs.virginia.edu/stream/.

[42] EC2 t2 Instances History. https://aws.amazon.com/about-aws/whats-new/2014/07/01/introducing-t2-the-new-low-cost-general-purpose-instance-type-for-amazon-ec2/, 2014.

[43] P. B. Teregowda and C. L. Giles. Scaling SeerSuite in the cloud. In Proc. IEEE Int'l Conf. on Cloud Engineering, 2013.

[44] C. Wang, B. Urgaonkar, A. Gupta, L. Chen, R. Birke, and G. Kesidis. Effective capacity modulation as an explicit control knob for public cloud profitability. In Proc. IEEE ICAC, 2016.

[45] G. Wang and T. E. Ng. The impact of virtualization on network performance of Amazon EC2 data center. In Proc. IEEE INFOCOM, 2010.

[46] J. Wen, L. Lu, G. Casale, and E. Smirni. Less can be more: Micro-managing VMs in Amazon EC2. In Proc. IEEE CLOUD, 2015.

[47] Z. Xu, C. Stewart, N. Deng, and X. Wang. Blending on-demand and spot instances to lower costs for in-memory storage. In Proc. IEEE INFOCOM, 2016.

[48] Yahoo Cloud Serving Benchmark (YCSB). https://research.yahoo.com/news/yahoo-cloud-serving-benchmark.

[49] X. Zhang, E. Tune, R. Hagmann, R. Jnagal, V. Gokhale, and J. Wilkes. CPI2: CPU performance isolation for shared compute clusters. In Proc. EuroSys, 2013.

[50] Z. Zhang, L. Cherkasova, and B. T. Loo. Exploiting cloud heterogeneity for optimized cost/performance mapreduce processing. In Proc. International Workshop on Cloud Data and Platforms, 2014.

[51] Z. Zhang, L. Cherkasova, and B. T. Loo. Optimizing cost and performance trade-offs for mapreduce job processing in the cloud. In Proc. IEEE NOMS, 2014.


8. APPENDIX

8.1 Disk (EBS volume) IO Regulation

Figure 14: Disk IO of EBS volumes (for t2.large, m3.medium, m3.large, c4.large, and m4.large instances) when running fio with (a) sequential read and (b) sequential write, where IOPS is IO blocks per second with block size equal to 16KB.

In addition to the traditional disk storage on the server hosting the instance, EC2 also provides remote EBS (elastic block storage) volumes that can be networked to a running instance. According to [3], EC2 enforces a credit-based token-bucket mechanism to limit the disk IOPS of the EBS volumes of an instance. From Figure 14, we observe using fio that the sequential reads13 of EBS volumes of various instance types exhibit an ideal token-bucket mechanism (though m3.medium has a different peak IOPS), which has a different profile compared to that of burstable instances' network bandwidth (Figure 4). Also, we do not observe noticeable performance degradation for either network or disk IO when we run fio and iperf simultaneously in our limited experiments. So we conclude that EC2 uses different token-bucket regulation for EBS disk IO and regular network traffic, although both likely involve using common networking infrastructure.
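The IOPS profiles in Figure 14 (an initial burst at peak rate followed by a drop to a sustained rate once the credit pool empties) can be mimicked with a simple discrete-time token-bucket simulation. The bucket depth, refill rate, and demand below are illustrative values, not EC2's actual EBS parameters:

```python
def simulate_token_bucket(depth, refill, demand, seconds):
    """Per-second achieved IOPS under a credit-based token bucket.

    depth  -- bucket capacity in IO credits (burst budget)
    refill -- credits added per second (sustainable IOPS)
    demand -- IOPS the workload attempts each second
    """
    tokens = depth  # bucket starts full
    achieved = []
    for _ in range(seconds):
        tokens = min(depth, tokens + refill)
        served = min(demand, tokens)  # serve as many IOs as credits allow
        tokens -= served
        achieved.append(served)
    return achieved

# Illustrative parameters for a saturating fio-like sequential workload.
iops = simulate_token_bucket(depth=5400, refill=100, demand=3000, seconds=60)
print(iops[0], iops[-1])  # prints "3000 100": peak burst, then sustained rate
```

The same shape appears for every instance type in Figure 14; only the peak and sustained parameters differ.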

Also, when running the same fio benchmark on our lab-based physical machine and on EC2 instances, we observe higher variance of fio sequential read IOPS on the physical machine (the baseline/sustainable IOPS of EC2 instances has little variation). However, after we employ the Linux cgroups technique to limit the IOPS of the physical machine to be lower than the maximum IOPS, the variance becomes comparable with that observed on EC2 instances. This leads us to postulate that EC2 applies cgroups-like techniques to provide guaranteed disk IO isolation and reduce performance variation.
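For reference, this kind of IOPS throttling can be reproduced on a lab machine with the cgroup v1 blkio controller. The device number (8:0), the 1000-IOPS limit, and the group name `iolimit` below are placeholders for one's own setup, not the values used in our experiments:

```shell
# Cap read IOPS on block device 8:0 (placeholder major:minor) via
# cgroup v1 blkio throttling; paths and limits are illustrative.
sudo mkdir -p /sys/fs/cgroup/blkio/iolimit
echo "8:0 1000" | sudo tee /sys/fs/cgroup/blkio/iolimit/blkio.throttle.read_iops_device
echo $$ | sudo tee /sys/fs/cgroup/blkio/iolimit/cgroup.procs  # move this shell in
```

Processes in the group (e.g., a subsequent fio run) are then limited to the configured read IOPS on that device.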

8.2 Exploiting Burstables: Tenant with Intermittent Workload

Intermittent Network Bandwidth Requirement: In this section, we look at the implications of dynamic peak-rate token-bucket regulation on a tenant's workload and how this model can be used in the tenant's favor. If the application has an intermittent network bandwidth requirement, it can periodically idle for s seconds to achieve a higher long-term average throughput:

θ(s) := ( ∫_{π^{-1}(f(s))}^{T} π(t) dt ) / ( T − π^{-1}(f(s)) + s ),   (9)

where π and f are plotted for t2.medium in Figure 5(a)

13 We run fio with sequential read jobs (accessing contiguous blocks on disk) to exclude the additional random variation from disk seek times.

and (b), respectively, and the models are presented in equations (1) and (4). Recall that f(0) = M, f(S) = Π (where S ≈ 350s for t2.medium), π(0) = Π and π(T) = M (where T ≈ 180s from Figure 5(a)). Fitting (3) and (4) for the t2.medium gives

α = 0.25 and β = 0.3.

The resulting mean throughput is plotted as θ in Figure 5(c), which is an increasing function of s. Note that this result is different from the classical dual token-bucket, where the mean throughput is a constant function of s ≤ B/M, i.e.,

θ = Π · ( Ms/(Π−M) ) / ( s + Ms/(Π−M) ) = M.

Here, the mean throughput is increasing (to a maximum 30% improvement) and, for s > 90 seconds, surprisingly higher than the sustainable rate M.

Figure 15: Mean throughput for different idle times until time T = B/(Π−M).
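Equation (9) can be evaluated numerically once functional forms for π and f are fixed. Since the fitted models (1)-(4) are not reproduced in this appendix, the sketch below substitutes simple linear profiles, with Π, M chosen as illustrative values and T, S taken from the Figure 5 time scales; it shows only the qualitative trend that θ(s) grows with the idle time s, not the fitted curve of Figure 5(c):

```python
# Numerical evaluation of equation (9) under ASSUMED linear models:
#   pi(t) = Pi - (Pi - M) * t / T         (peak-rate decay while bursting)
#   f(s)  = M + (Pi - M) * min(s, S) / S  (recovered burst rate after idling s)
# Pi and M below are hypothetical rates, not the paper's fitted values.

PI, M = 300.0, 230.0   # peak / sustainable rate (Mbps), hypothetical
T, S = 180.0, 350.0    # drain / recovery times (sec), Figure 5 scales

def f(s):
    return M + (PI - M) * min(s, S) / S

def pi_inv(rate):
    # Invert the linear pi(t): the time at which the peak rate equals `rate`.
    return T * (PI - rate) / (PI - M)

def theta(s):
    t0 = pi_inv(f(s))
    # Integral of the linear pi(t) over [t0, T], in closed form
    # (average height times width of the trapezoid).
    area = (f(s) + M) / 2.0 * (T - t0)
    return area / (T - t0 + s)

for s in (50, 100, 200, 350):
    print(s, round(theta(s), 1))
```

Under these linear stand-ins θ(s) is already monotonically increasing in s; the concave fitted profiles of the paper amplify the effect enough to push θ above M for s > 90 seconds.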

Discussion of Stat Mux Gain from the Public Cloud's Point of View: Based on the dynamic-peak dual-token bucket mechanism introduced in Section 3.1, we now consider possible gains that a cloud provider can achieve by consolidating multiple burstable instances on a single physical machine. Let var(θ) be the variance of the (extremal) throughput process, corresponding to periodic transmission, with mean θ.

Figure 16: Worst-case stat-mux gain based on (10) for the cloud provider, by consolidating N instances of t2.medium.

On one server, consider the superposition (consolidation) of N such independent processes, each with assumed uniformly random phase and operating in identical burstable instances, so that their aggregate mean and variance are Nθ and N·var(θ), respectively. If the cloud takes Nθ + 2√(N·var(θ)) (mean plus two standard deviations) as a "statistical proxy" for the aggregate peak, then the statistical multiplexing gain can be interpreted as

NΠ − (Nθ + 2√(N·var(θ))) > 0.   (10)
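The gain in (10) is straightforward to tabulate. The per-instance peak Π, mean θ, and variance var(θ) below are illustrative stand-ins (the fitted t2.medium values behind Figure 16 are not reproduced here):

```python
import math

# Worst-case stat-mux gain of equation (10):
#   gain(N) = N*Pi - (N*theta + 2*sqrt(N*var))
# using ILLUSTRATIVE values for peak, mean, and variance.
PI = 1000.0      # per-instance peak bandwidth (Mbps), hypothetical
THETA = 300.0    # long-run mean throughput (Mbps), hypothetical
VAR = 10000.0    # variance of the throughput process, hypothetical

def stat_mux_gain(n):
    return n * PI - (n * THETA + 2.0 * math.sqrt(n * VAR))

for n in (1, 5, 10, 20):
    print(n, round(stat_mux_gain(n)))
```

Because the mean grows linearly in N while the two-sigma term grows only as √N, the gain increases superlinearly relative to the proxy peak, which matches the shape of Figure 16.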

This gain is plotted in Figure 16. In future burstable VM offerings, Amazon may make the z (or β) parameter in (2) selectable by the tenant/customer, with more expensive instances corresponding to smaller values of z. Finally, note that increased gain obviously results when burstables of the same tenant are consolidated on the same server and that tenant operates them in a staggered arrangement (so that only one burstable is active at any given time).