

Disciplined Inconsistency

Brandon Holt, James Bornholt, Irene Zhang, Dan Ports, Mark Oskin, Luis Ceze
{bholt,bornholt,iyzhang,drkp,oskin,luisceze}@cs.uw.edu

Technical Report UW-CSE-16-06-01

Abstract

Distributed applications and web services, such as online stores or social networks, are expected to be scalable, available, responsive, and fault-tolerant. To meet these steep requirements in the face of high round-trip latencies, network partitions, server failures, and load spikes, applications use eventually consistent datastores that allow them to weaken the consistency of some data. However, making this transition is highly error-prone because relaxed consistency models are notoriously difficult to understand and test.

In this work, we propose a new programming model for distributed data that makes consistency properties explicit and uses a type system to enforce consistency safety. With the Inconsistent, Performance-bound, Approximate (IPA) storage system, programmers specify performance targets and correctness requirements as constraints on persistent data structures and handle uncertainty about the result of datastore reads using new consistency types. We implement a prototype of this model in Scala on top of an existing datastore, Cassandra, and use it to make performance/correctness tradeoffs in two applications: a ticket sales service and a Twitter clone. Our evaluation shows that IPA prevents consistency-based programming errors and adapts consistency automatically in response to changing network conditions, performing comparably to weak consistency and 2-10× faster than strong consistency.

1. Introduction

To provide good user experiences, modern datacenter applications and web services must balance the competing requirements of application correctness and responsiveness. For example, a web store double-charging for purchases or keeping users waiting too long (each additional millisecond of latency matters [26, 36]) can translate to a loss in traffic and revenue. Worse, programmers must maintain this balance in an unpredictable environment where a black and blue dress [42] or Justin Bieber [38] can change application performance in the blink of an eye.

Recognizing the trade-off between consistency and performance, many existing storage systems support configurable consistency levels that allow programmers to set the consistency of individual operations [4, 11, 34, 58]. These allow programmers to weaken consistency guarantees only for data that is not critical to application correctness, retaining strong consistency for vital data. Some systems further allow adaptable consistency levels at runtime, where guarantees are only weakened when necessary to meet availability or performance requirements (e.g., during a spike in traffic or datacenter failure) [59, 61]. Unfortunately, using these systems correctly is challenging. Programmers can inadvertently update strongly consistent data in the storage system using values read from weakly consistent operations, propagating inconsistency and corrupting stored data. Over time, this undisciplined use of data from weakly consistent operations lowers the consistency of the storage system to its weakest level.

In this paper, we propose a more disciplined approach to inconsistency in the Inconsistent, Performance-bound, Approximate (IPA) storage system. IPA introduces the following concepts:

• Consistency Safety, a new property that ensures that values from weakly consistent operations cannot flow into stronger consistency operations without explicit endorsement from the programmer. IPA is the first storage system to provide consistency safety.

• Consistency Types, a new type system in which type safety implies consistency safety. Consistency types define the consistency and correctness of the returned value from every storage operation, allowing programmers to reason about their use of different consistency levels. IPA's type checker enforces the disciplined use of IPA consistency types statically at compile time.

• Error-bounded Consistency. IPA is a data structure store, like Redis [54] or Riak [11], allowing it to provide a new type of weak consistency that places numeric error bounds on the returned values. Within these bounds, IPA automatically adapts to return the strongest IPA consistency type possible under the current system load.

We implement an IPA prototype based on Scala and Cassandra and show that IPA allows the programmer to trade off performance and consistency, safe in the knowledge that the type system has checked the program for consistency safety.


[Figure 1 mockup: a theater listing screen ("Showings — Grand Theater") showing "Star Wars 7pm — Remaining: 5", "Star Wars 9pm", and "Spectre 6:30pm" with a Purchase button, followed by a checkout screen: "STAR WARS 7pm — Enter payment info. Price: $15".]

    // adjust price based on number of tickets left
    def computePrice(ticketsRemaining: Int): Float

    // called from purchaseTicket & displayEvent
    def getTicketCount(event: UUID): Int =
      // use weak consistency for performance
      readWeak(event + "ticket_count")

    def purchaseTicket(event: UUID) = {
      val ticket = reserveTicket(event)
      val remaining = getTicketCount(event)
      // compute price based on inconsistent read
      val price = computePrice(remaining)
      display("Enter payment info. Price: ", price)
    }

Figure 1. Ticket sales service. To meet a performance target in displayEvent, the developer switches to a weak read for getTicketCount, not realizing that this inconsistent read will be used elsewhere to compute the ticket price.

We demonstrate experimentally that these mechanisms allow applications to dynamically adapt correctness and performance to changing conditions with three applications: a simple counter, a Twitter clone based on Retwis [55], and a ticket sales service modeled after FusionTicket [1].

2. The Case for Consistency Safety

Unpredictable Internet traffic and unexpected failures force modern datacenter applications to trade off consistency for performance. In this section, we demonstrate the pitfalls of doing so in an undisciplined way. As an example, we describe a movie ticketing service, similar to AMC or Fandango. Because ticketing services process financial transactions, they must ensure correctness, which they can do by storing data in a strongly consistent storage system. Unfortunately, providing strong consistency for every storage operation can cause the storage system and application to collapse under high load, as several ticketing services did in October 2015, when tickets became available for the new Star Wars movie [21].

To allow the application to scale more gracefully and handle traffic spikes, the programmer may choose to weaken the consistency of some operations. As shown in Figure 1, the ticket application displays each showing of the movie along with the number of tickets remaining. For better performance, the programmer may want to weaken the consistency of the read operation that fetches the remaining ticket count to give users an estimate, instead of the most up-to-date value. Under normal load, even with weak consistency, this count would often still be correct because propagation is typically fast compared to updates. However, eventual consistency makes no guarantees, so under heavier traffic spikes the values could be significantly incorrect, and the application has no way of knowing by how much.

While this solves the programmer's performance problem, it introduces a data consistency problem. Suppose that, like Uber's surge pricing, the ticket sales application wants to raise the price of the last 100 tickets for each showing to $15. If the application uses a strongly consistent read to fetch the remaining ticket count, then it can use that value to compute the price of the ticket on the last screen in Figure 1. However, if the programmer reuses getTicketCount, which used a weak read, to calculate the price, then this count could be arbitrarily wrong. The application could then over- or under-charge some users depending on the consistency of the returned value. Worse, the theater expects to make $1500 for those tickets with the new pricing model, which may not happen with the new, weaker read operation. Thus, programmers need to be careful in their use of values returned from storage operations with weak consistency. Simply weakening the consistency of an operation may lead to unexpected consequences for the programmer (e.g., the theater not selling as many tickets at the higher surge price as expected).

In this work, we propose a programming model that can prevent using inconsistent values where they were not intended, as well as introduce mechanisms that allow the storage system to dynamically adapt consistency within predetermined performance and correctness bounds.

3. Programming Model

We propose a programming model for distributed data that uses types to control the consistency–performance trade-off. The Inconsistent, Performance-bound, Approximate (IPA) type system helps developers trade consistency for performance in a disciplined manner. This section presents the IPA programming model, including the available consistency policies and the semantics of operations performed under the policies. §4 will explain how the type system's guarantees are enforced.

3.1. Overview

The IPA programming model consists of three parts:

• Abstract data types (ADTs) implement common data structures (such as Set[T]) on distributed storage.

• Consistency policies on ADTs specify the desired consistency level for an object in application-specific terms (such as latency or accuracy bounds).

• Consistency types track the consistency of operation results and enforce consistency safety by requiring developers to consider weak outcomes.

Programmers annotate ADTs with consistency policies to choose their desired level of consistency. The consistency policy on the ADT operation determines the consistency type of the result. Table 1 shows some examples; the next few sections will introduce each of the policies and types in detail.


ADT / Method         Consistency(Strong)   Consistency(Weak)       LatencyBound(_)       ErrorTolerance(_)
Counter.read()       Consistent[Int]       Inconsistent[Int]       Rushed[Int]           Interval[Int]
Set.size()           Consistent[Int]       Inconsistent[Int]       Rushed[Int]           Interval[Int]
Set.contains(x)      Consistent[Bool]      Inconsistent[Bool]      Rushed[Bool]          N/A
List[T].range(x,y)   Consistent[List[T]]   Inconsistent[List[T]]   Rushed[List[T]]       N/A
UUIDPool.take()      Consistent[UUID]      Inconsistent[UUID]      Rushed[UUID]          N/A
UUIDPool.remain()    Consistent[Int]       Inconsistent[Int]       Rushed[Int]           Interval[Int]

Table 1. Example ADT operations; consistency policies determine the consistency type of the result.

Together, these three components provide two key benefits for developers. First, the IPA type system enforces consistency safety, tracking the consistency level of each result and preventing inconsistent values from flowing into consistent values. Second, the programming interface enables performance–correctness trade-offs, because consistency policies on ADTs allow the runtime to select a consistency level for each individual operation that maximizes performance in a constantly changing environment. Together, these systems allow applications to adapt to changing conditions with the assurance that the programmer has expressed how they should handle varying consistency.

3.2. Abstract Data Types

The base of the IPA type system is a set of abstract data types (ADTs) for distributed data structures. ADTs present a clear abstract model through a set of operations that query and update state, allowing users and systems alike to reason about their logical, algebraic properties rather than the low-level operations used to implement them. Though the simplest key-value stores only support primitive types like strings for values, many popular datastores have built-in support for more complex data structures such as sets, lists, and maps. However, the interface to these datatypes differs: from explicit sets of operations for each type in Redis, Riak, and Hyperdex [11, 25, 31, 54] to the pseudo-relational model of Cassandra [32]. IPA's extensible library of ADTs allows it to decouple the semantics of the type system from any particular datastore, though our reference implementation is on top of Cassandra, similar to [57].

Besides abstracting over storage systems, ADTs are an ideal place from which to reason about consistency and system-level optimizations. The consistency of a read depends on the write that produced the value. Annotating ADTs with consistency policies ensures the necessary guarantees for all operations are enforced, which we will expand on in the next section.

Custom ADTs can express application-level correctness constraints. IPA's Counter ADT allows reading the current value as well as increment and decrement operations. In our ticket sales example, we must ensure that the ticket count does not go below zero. Rather than forcing all operations on the datatype to be linearizable, this application-level invariant can be expressed with a more specialized ADT, such as a BoundedCounter, giving the implementation more latitude for enforcing it. IPA's library is extensible, allowing custom ADTs to build on common features; see §5.

3.3. Consistency Policies

Previous systems [4, 11, 34, 58, 61] require annotating each read and write operation with a desired consistency level. This per-operation approach complicates reasoning about the safety of code using weak consistency, and hinders global optimizations that can be applied if the system knows the consistency level required for future operations. The IPA programming model provides a set of consistency policies that can be placed on ADT instances to specify consistency properties for the lifetime of the object. Consistency policies come in two flavors: static and dynamic.

Static policies are fixed, such as Consistency(Strong), which states that operations must have strongly consistent behavior. Static annotations provide the same direct control as previous approaches but simplify reasoning about correctness by applying them globally on the ADT.

Dynamic policies specify a consistency level in terms of application requirements, allowing the system to decide at runtime how to meet the requirement for each executed operation. IPA offers two dynamic consistency policies:

• A latency policy LatencyBound(x) specifies a target latency for operations on the ADT (e.g., 20 ms). The runtime can choose the consistency level for each issued operation, optimizing for the strongest level that is likely to satisfy the latency bound.

• An accuracy policy ErrorTolerance(x%) specifies the desired accuracy for read operations on the ADT. For example, the size of a Set ADT may only need to be accurate within 5% tolerance. The runtime can optimize the consistency of write operations so that reads are guaranteed to meet this bound.

Dynamic policies allow the runtime to extract more performance from an application by relaxing the consistency of individual operations, safe in the knowledge that the IPA type system will enforce safety by requiring the developer to consider the effects of weak operations.
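To make the shape of these annotations concrete, here is a minimal sketch of declaring ADT instances with static and dynamic policies. This is illustrative only, not IPA's actual API: the policy case classes and ADT stubs below are hypothetical stand-ins (the real annotation syntax appears in Figures 4 and 8).

    import scala.concurrent.duration._

    // Hypothetical policy representations (stand-ins for IPA's annotations)
    sealed trait ConsistencyPolicy
    case class Consistency(level: String) extends ConsistencyPolicy // "Strong" or "Weak"
    case class LatencyBound(bound: FiniteDuration) extends ConsistencyPolicy
    case class ErrorTolerance(percent: Double) extends ConsistencyPolicy

    // ADT stubs; a real implementation would issue datastore operations
    class Counter(key: String, policy: ConsistencyPolicy)
    class IPASet[T](key: String, policy: ConsistencyPolicy)

    object PolicyExamples {
      // Static policy: every operation on this counter is strongly consistent
      val balance = new Counter("account:42", Consistency("Strong"))

      // Dynamic policies: the runtime picks a level for each operation
      val timeline = new IPASet[String]("timeline:42", LatencyBound(50.millis))
      val tickets  = new Counter("event:tickets", ErrorTolerance(0.05)) // 5%
    }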


[Figure 2 diagram: the IPA type lattice, parameterized by a type T. Consistent[T] is the top element (⊤) and Inconsistent[T] the bottom (⊥); between them sit Rushed[T], Interval[T], and datastore-specific consistency levels such as LocalQuorum[T].]

Figure 2. IPA type lattice parameterized by a type T.

Static and dynamic policies can apply to an entire ADT instance or to individual methods. For example, one could declare List[Int] with LatencyBound(50 ms), in which case all read operations on the list are subject to the bound. Alternatively, one may wish to declare a Set with relaxed consistency for its size but strong consistency for its contains predicate. The runtime is responsible for managing the interaction between these policies. In the case of a conflict between two bounds, the system can be conservative and choose stronger policies than specified without affecting correctness.

In the ticket sales application, the Counter for each event's tickets could have a relaxed accuracy policy, ErrorTolerance(5%), allowing the system to quickly read the count of tickets remaining. An accuracy policy is appropriate here because it expresses a domain requirement: users want to see accurate ticket counts. As long as the system meets this requirement, it is free to relax consistency and maximize performance without violating correctness. The List ADT used for events has a latency policy that also expresses a domain requirement: that pages on the website load in reasonable time.

3.4. Consistency Types

The key to consistency safety in IPA is the consistency types; enforcing type safety directly enforces consistency safety. Read operations on ADTs annotated with consistency policies return instances of a consistency type. These consistency types track the consistency of the results and enforce a fundamental non-interference property: results from weakly consistent operations cannot flow into computations with stronger consistency without explicit endorsement. This could be enforced dynamically, as in dynamic information flow control systems, but the static guarantees of a type system allow errors to be caught at compile time.

The consistency types encapsulate information about the consistency achieved when reading a value. Formally, the consistency types form a lattice parameterized by a primitive type T, shown in Figure 2. Strong read operations return values of type Consistent[T] (the top element), and so (by implicit cast) behave as any other instance of type T. Intuitively, this equivalence is because the results of strong reads are known to be consistent, which corresponds to the control flow in conventional (non-distributed) applications. Weaker read operations return values of some type lower in the lattice (weak consistency types), reflecting their possible inconsistency. The bottom element Inconsistent[T] specifies an object with the weakest possible (or unknown) consistency. The other consistency types follow a subtyping relation ≺, as illustrated in Figure 2.
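To make the lattice concrete, the following sketch (assumed code, not IPA's actual class definitions; §5 describes the real implementation) encodes it as a subclass hierarchy, with weaker types as superclasses so that a pattern for a weak type also matches any stronger value:

    // Bottom of the lattice: a value of unknown consistency
    class Inconsistent[T](val value: T)
    // A datastore-specific intermediate level
    class LocalQuorum[T](v: T) extends Inconsistent[T](v)
    // Top of the lattice: results known to be strongly consistent
    class Consistent[T](v: T) extends LocalQuorum[T](v)

    object Inconsistent { def unapply[T](x: Inconsistent[T]): Option[T] = Some(x.value) }
    object LocalQuorum  { def unapply[T](x: LocalQuorum[T]): Option[T]  = Some(x.value) }
    object Consistent {
      def unapply[T](x: Consistent[T]): Option[T] = Some(x.value)
      // Endorsement: explicitly promote a weak result to the top element
      def apply[T](x: Inconsistent[T]): Consistent[T] = new Consistent(x.value)
    }

    object Casts {
      import scala.language.implicitConversions
      // Strong results behave as plain values of type T
      implicit def consistentToValue[T](c: Consistent[T]): T = c.value
    }

With these definitions, a value endorsed via Consistent(x) flows wherever a plain T is expected, while anything weaker must first be matched on or endorsed explicitly.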

The only possible operation on Inconsistent[T] is to endorse it. Endorsement is an upcast, invoked by Consistent(x), to the top element Consistent[T] from other types in the lattice:

    Γ ⊢ e₁ : τ[T]    τ[T] ≺ T
    ─────────────────────────
    Γ ⊢ Consistent(e₁) : T

The core type system statically enforces safety by preventing weaker values from flowing into stronger computations. Forcing developers to explicitly endorse inconsistent values prevents them from accidentally using inconsistent data where they did not determine it was acceptable, essentially inverting the behavior of current systems, where inconsistent data is always treated as if it were safe to use anywhere. However, endorsing values blindly in this way is not the intended use case; the key productivity benefit of the IPA type system comes from the other consistency types, corresponding to the dynamic consistency policies in §3.3, which allow developers to handle dynamic variations in consistency. We describe these next.

3.4.1. Rushed types

The weak consistency type Rushed[T] is the result of read operations performed on an ADT with consistency policy LatencyBound(x). Rushed[T] is a sum (or union) type, with one variant per consistency level available to the implementation of LatencyBound. Each variant is itself a consistency type (though the variants obviously cannot be Rushed[T] itself). The effect is that values returned by a latency-bound object carry with them their actual consistency level. A result of type Rushed[T] therefore requires the developer to consider the possible consistency levels of the value.

For example, a system with geo-distributed replicas may only be able to satisfy a latency bound of 50 ms with a local quorum read (that is, a quorum of replicas within a single datacenter). In this case, Rushed[T] would be the sum of three types: Consistent[T], LocalQuorum[T], and Inconsistent[T]. A match statement destructures the result of a latency-bound read operation:

    set.contains() match {
      case Consistent(x) => print(x)
      case LocalQuorum(x) => print(x + ", locally")
      case Inconsistent(x) => print(x + "???")
    }

The application may want to react differently to a local quorum as opposed to a strongly or weakly consistent value. Note that because of the subtyping relation on consistency types, omitted cases can be matched by any type lower in the lattice, including the bottom element Inconsistent(x); other cases therefore need only be added if the application should respond differently to them.


This subtyping behavior allows applications to be portable between systems supporting different forms of consistency (of which there are many).

3.4.2. Interval types

Tagging values with a consistency level is useful because it helps programmers tell which operation reorderings are possible (e.g., strongly consistent operations will be observed to happen in program order). However, accuracy policies provide a different way of dealing with inconsistency by expressing it in terms of value uncertainty. They require knowing the abstract behavior of operations in order to determine the change in abstract state which results from each reordered operation (e.g., reordering increments on a Counter has a known effect on the value of reads).

The weak consistency type Interval[T] is the result of operations performed on an ADT with consistency policy ErrorTolerance(x%). Interval[T] represents an interval of values within which the true (strongly consistent) result lies. The interval reflects uncertainty in the true value created by relaxed consistency, in the same style as work on approximate computing [15].

The key invariant of the Interval type is that the interval must include the result of some linearizable execution. Consider a Set with 100 elements. With linearizability, if we add a new element and then read the size (or if this ordering is otherwise implied), we must get 101 (provided no other updates are occurring). However, if size is annotated with ErrorTolerance(5%), then it could return any interval that includes 101, such as [95,105] or [100,107], so the client cannot tell if the recent add was included in the size. This frees the system to optimize to improve performance, such as by delaying synchronization. While any partially-ordered domain could be represented as an interval (e.g., a Set with partial knowledge of its members), in this work we consider only numeric types.
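As a small worked check of this invariant (illustrative code, not part of IPA): after the add, 101 is the linearizable size, and any interval returned by a 5%-tolerant size must contain it.

    object IntervalInvariant {
      // An interval is valid iff it contains the result of some
      // linearizable execution
      def valid(linearizable: Int, lo: Int, hi: Int): Boolean =
        lo <= linearizable && linearizable <= hi

      val ok1 = valid(101, 95, 105)  // true: [95,105] contains 101
      val ok2 = valid(101, 100, 107) // true: [100,107] contains 101
      val bad = valid(101, 90, 100)  // false: such an interval would be invalid
    }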

In the ticket sales example, the counter ADT's accuracy policy means that reads of the number of tickets return an Interval[Int]. If the entire interval is above zero, then users can be assured that there are sufficient tickets remaining. In fact, because the interval could represent many possible linearizable executions, in the absence of other user actions, a subsequent purchase must succeed. On the other hand, if the interval overlaps with zero, then there is a chance that tickets could already be sold out, so users could be warned. Note that ensuring that tickets are not over-sold is a separate concern requiring a different form of enforcement, which we describe in §5. The relaxed consistency of the interval type allows the system to optimize performance in the common case where there are many tickets available, and dynamically adapt to contention when the ticket count diminishes.

4. Enforcing consistency policies

The consistency policies introduced in the previous section allow programmers to describe application-level correctness properties. Static consistency policies (e.g., Strong) are enforced by the underlying storage system; the annotated ADT methods simply set the desired consistency level when issuing requests to the store. The dynamic policies each require a new runtime mechanism to enforce them: parallel operations with latency monitoring for latency bounds, and reusable reservations for error tolerance. But first, we briefly review consistency in Dynamo-style replicated systems.

To be sure of seeing a particular write, strong reads must coordinate with a majority (quorum) of replicas and compare their responses. For a write and read pair to be strongly consistent (in the CAP sense [17]), the replicas acknowledging the write (W) plus the replicas contacted for the read (R) must be greater than the total number of replicas (W + R > N). This can be achieved, for example, by writing to a quorum ((N+1)/2) and reading from a quorum (QUORUM in Cassandra), or by writing to N (ALL) and reading from 1 (ONE) [22]. To support the Consistency(Strong) policy, the designer of each ADT must choose consistency levels for its operations which together enforce strong consistency.
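The quorum condition is easy to state as code; this sketch (an illustrative helper, not a real datastore API) checks the examples above:

    object QuorumMath {
      // A write/read pair is strongly consistent when W + R > N
      def stronglyConsistent(w: Int, r: Int, n: Int): Boolean = w + r > n

      val quorums = stronglyConsistent(w = 2, r = 2, n = 3) // true: QUORUM + QUORUM
      val allOne  = stronglyConsistent(w = 3, r = 1, n = 3) // true: ALL + ONE
      val weak    = stronglyConsistent(w = 1, r = 1, n = 3) // false: read may miss the write
    }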

4.1. Latency bounds

The time it takes to achieve a particular level of consistency depends on current conditions and can vary over large time scales (minutes or hours), but it can also vary significantly for individual operations. During normal operation, strong consistency may have acceptable performance, while at peak traffic times the application would fall over. Latency bounds specified by the application allow the system to dynamically adjust to maintain comparable performance under varying conditions.

Our implementation of latency-bound types takes a generic approach: it issues read requests at different consistency levels in parallel. It composes the parallel operations and returns a result either when the strongest operation returns, or with the strongest available result at the specified time limit. If no responses are available at the time limit, it waits for the first to return.

This approach makes no assumptions about the implementation of read operations, making it easily adaptable to different storage systems. Some designs may permit more efficient implementations: for example, in a Dynamo-style storage system we could send read requests to all replicas, then compute the most consistent result from all responses received within the latency limit. However, this requires deeper access to the storage system implementation than is traditionally available.
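A minimal sketch of this parallel-read strategy follows (assumed code: the blocking readAt function, the integer encoding of consistency levels, and the fall-back-to-weakest behavior are all simplifications of what IPA actually does):

    import java.util.concurrent.TimeoutException
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration._

    object Rush {
      type Level = Int // higher = stronger, e.g. ONE=1, QUORUM=2, ALL=3

      def rush[T](bound: FiniteDuration, levels: Seq[Level])
                 (readAt: Level => T)
                 (implicit ec: ExecutionContext): (Level, T) = {
        // Issue reads at every level in parallel, strongest first
        val attempts = levels.sortBy(-_).map(l => l -> Future(readAt(l)))
        val deadline = bound.fromNow
        // Return the strongest result that completes within the bound
        for ((level, f) <- attempts) {
          try return (level, Await.result(f, deadline.timeLeft max Duration.Zero))
          catch { case _: TimeoutException => () } // try the next-weakest
        }
        // Bound expired with no result ready: block on the weakest read
        val (level, f) = attempts.last
        (level, Await.result(f, Duration.Inf))
      }
    }

A caller might write Rush.rush(50.millis, Seq(1, 2, 3))(level => readFrom(store, level)) and wrap the returned level and value into the appropriate Rushed variant.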


4.1.1. Monitors

The main problem with our approach is that it wastes work by issuing parallel requests. Furthermore, if the system is responding more slowly due to a sudden surge in traffic, then it is essential that our efforts not cause additional burden on the system. In these cases, we should back off and only attempt weaker consistency. To do this, the system monitors current traffic and predicts the latency of different consistency levels.

Each client in the system has its own Monitor (though multi-threaded clients can share one). The monitor records the observed latencies of reads, grouped by operation and consistency level. The monitor uses an exponentially decaying reservoir to compute running percentiles weighted toward recent measurements, ensuring that its predictions continually adjust to current conditions.
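The sketch below captures the monitor's interface (hypothetical code; IPA's actual monitor uses an exponentially decaying reservoir, which this simple random-replacement reservoir only approximates):

    import scala.collection.mutable
    import scala.util.Random

    class LatencyMonitor(reservoirSize: Int = 100) {
      private val samples = mutable.Map.empty[Int, mutable.ArrayBuffer[Long]]
      private val rand = new Random

      // Record an observed read latency (ms) at a consistency level
      def record(level: Int, latencyMs: Long): Unit = {
        val buf = samples.getOrElseUpdate(level, mutable.ArrayBuffer.empty)
        if (buf.size < reservoirSize) buf += latencyMs
        else buf(rand.nextInt(reservoirSize)) = latencyMs // overwrite an old sample
      }

      // Estimate a running percentile for a level (0 if no samples yet,
      // which optimistically lets untried levels be attempted)
      def percentile(level: Int, p: Double): Long =
        samples.get(level).filter(_.nonEmpty).map { buf =>
          val sorted = buf.sorted
          sorted(((sorted.size - 1) * p).toInt)
        }.getOrElse(0L)

      // Strongest level whose predicted latency fits the bound, else weakest
      def strongestWithin(boundMs: Long, levels: Seq[Int]): Int =
        levels.sortBy(-_).find(percentile(_, 0.95) <= boundMs).getOrElse(levels.min)
    }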

Whenever a latency-bound operation is issued, it queries the monitor to determine the strongest consistency likely to be achieved within the time bound, then issues one request at that consistency level and a backup at the weakest level, or only a weak request if none can meet the bound. In §6.2.1 we show empirically that even simple monitors allow clients to adapt to changing conditions.

4.2. Error bounds

We implement error bounds by building on the concepts of escrow and reservations [27, 44, 48, 50]. These techniques have been used in storage systems to enforce hard limits, such as an account balance never going negative, while permitting concurrency. The idea is to set aside a pool of permissions to perform certain update operations (we'll call them reservations or tokens), essentially treating operations as a manageable resource. If we have a counter that should never go below zero, there could be a number of decrement tokens equal to the current value of the counter. When a client wishes to decrement, it must first acquire sufficient tokens before performing the update operation, whereas increments produce new tokens. The insight is that the coordination needed to ensure that there are never too many tokens can be done off the critical path: tokens can be produced lazily if there are enough around already, and, most importantly for this work, they can be distributed among replicas. This means that replicas can perform some update operations safely without coordinating with any other replicas.

4.2.1. Reservation Server

Reservations require mediating requests to the datastore to prevent updates from exceeding the available tokens. Furthermore, each server must locally know how many tokens it has without synchronizing. We are not aware of a commercial datastore that supports custom mediation of requests and replica-local state, so we need a custom middleware layer to handle reservation requests, similar to other systems which have built stronger guarantees on top of existing datastores [8, 10, 57].

[Figure 3 diagram: a Counter with ErrorTolerance(10%) and stored value 100, replicated across three datastore replicas. Reservation Servers A and B mediate incr(1) operations; of 10 allocated tokens, Server A holds 5 locally (2 used) and Server B holds 4 (0 used). A read() through Server B returns the interval 100..106.]

Figure 3. Reservations enforcing error bounds.

Any client requests requiring reservations are routed to one of a number of reservation servers. These servers then forward operations, when permitted, along to the underlying datastore. All persistent data is kept in the backing store; the reservation servers keep only transient state tracking available reservations. The number of reservation servers can theoretically be decoupled from the number of datastore replicas; our implementation simply colocates a reservation server with each datastore server and uses the datastore's node discovery mechanisms to route requests to reservation servers on the same host.

4.2.2. Enforcing error bounds

Reservations have been used previously to enforce hard global invariants in the form of upper or lower bounds on values [10], integrity constraints [9], or logical assertions [37]. However, enforcing error tolerance bounds presents a new design challenge because the bounds are constantly shifting. Consider a Counter with a 10% error bound, shown in Figure 3. If the current value is 100, then 10 increments can be done before anyone must be told about it. However, we have 3 reservation servers, so these 10 reservations are distributed among them, allowing each to do some increments without synchronizing. If only 10 outstanding increments are allowed, reads are guaranteed to maintain the 10% error bound.

In order to perform more increments after a server has exhausted its reservations, it must synchronize with the others, sharing its latest increments and receiving any changes of theirs. This is accomplished by doing a strong write (ALL) to the datastore followed by a read. Once that synchronization has completed, those 3 tokens become available again because the reservation servers all temporarily agree on the value (in this case, at least 102).

Read operations for these types go through reservation servers as well: the server does a weak read from any replica, then determines the interval based on how many reservations there are. For the read in Figure 3, there are 10 reservations total, but Server B knows that it has not used its local reservations, so it knows that there cannot be more than 6 outstanding increments and can return the interval [100,106].
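To make the arithmetic concrete, here is the Figure 3 read as a tiny worked sketch (an illustrative helper, not IPA code); since this example has only increments, the weakly-read value is the lower bound, and the tokens held elsewhere bound the upper end:

    object Figure3Read {
      // With `total` tokens allocated and `localUnused` held unused by this
      // server, at most total - localUnused increments can be outstanding
      def readInterval(stored: Int, total: Int, localUnused: Int): (Int, Int) =
        (stored, stored + (total - localUnused))

      // Server B in Figure 3: value 100, 10 tokens total, 4 local and unused
      val interval = readInterval(100, 10, 4) // == (100, 106), i.e. [100,106]
    }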


4.2.3. Narrowing bounds

Error-tolerance policies give an upper bound on the amount of error; ideally, the interval returned will be more precise than the maximum error when conditions are favorable. The error bound determines the maximum number of reservations that can be allocated per instance. To allow a variable number of tokens, each ADT instance keeps a count of tokens allocated by each server, and when servers receive write requests, they increment their count to allocate tokens to use. Allocating must be done with strong consistency to ensure all servers agree, which can be expensive, so we use long leases (on the order of seconds) to allow servers to cache their allocations. When a lease is about to expire, the server preemptively refreshes it in the background so that writes do not block.

For each type of update operation there may be a different pool of reservations. Similarly, there could be different error bounds on different read operations. It is up to the designer of the ADT to ensure that all error bounds are enforced with appropriate reservations. Consider a Set with an error tolerance on its size operation. This requires separate pools for add and remove to prevent the overall size from deviating by more than the bound in either direction, so the interval is [v − remove.delta, v + add.delta], where v is the size of the set and delta computes the number of outstanding operations from the pool. In some situations, operations may produce and consume tokens in the same pool (e.g., increment producing tokens for decrement), but this is only allowable if updates propagate in a consistent order among replicas, which may not be the case in some eventually consistent systems.

5. Implementation

IPA is implemented mostly as a client-side library to an off-the-shelf distributed storage system, though reservations are handled by a custom middleware layer which mediates accesses to any data with error tolerance policies. Our implementation is built on top of Cassandra, but IPA could work with any replicated storage system that supports fine-grained consistency control, which many commercial and research datastores do, including Riak [11].

IPA's client-side programming interface is written in Scala, using the asynchronous futures-based Phantom [45] library for type-safe access to Cassandra data. The reservation server middleware is also built in Scala using Twitter's Finagle framework [63]. Clients communicate with Cassandra via prepared statements, and with reservation servers via Thrift remote procedure calls [6]. Due to its type safety features, abstraction capability, and compatibility with Java, Scala has become popular for web service development, including in widely-used frameworks such as Akka [35] and Spark [5], and at established companies such as Twitter and LinkedIn [2, 18, 29].

    trait LatencyBound {
      // execute readOp with strongest consistency possible
      // within the latency bound
      def rush[T](bound: Duration,
                  readOp: ConsistencyLevel => T): Rushed[T]
    }

    /* Generic reservation pool, one per ADT instance.
       `max` recomputed as needed (e.g. for % error) */
    class ReservationPool(max: () => Int) {
      def take(n: Int): Boolean // try to take tokens
      def sync(): Unit          // sync to regain used tokens
      def delta(): Int          // # possible ops outstanding
    }

    /* Counter with ErrorBound (simplified) */
    class Counter(key: UUID) with ErrorTolerance {
      def error: Float // % tolerance (defined by instance)
      def maxDelta() = (cassandra.read(key) * error).toInt
      val pool = ReservationPool(maxDelta)

      def read(): Interval[Int] = {
        val v = cassandra.read(key)
        Interval(v - pool.delta, v + pool.delta)
      }
      def incr(n: Int): Unit =
        waitFor(pool.take(n)) { cassandra.incr(key, n) }
    }

Figure 4. Some of the reusable components provided by IPA and an example implementation of a Counter with error bounds.

The IPA type system, responsible for consistency safety, is also simply part of our client library, leveraging Scala's sophisticated type system. The IPA type lattice is implemented as a subclass hierarchy of parametric classes, using Scala's support for higher-kinded types to allow them to be destructured in match statements, and implicit conversions to allow Consistent[T] to be treated as type T. We use traits to implement ADT annotations; e.g., when the LatencyBound trait is mixed into an ADT, it wraps each of the methods, redefining them to have the new semantics and return the correct IPA type.
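For illustration only (hypothetical classes, not IPA's internals, which also rewrite the original methods), the mixin pattern looks roughly like this: a trait layered onto an ADT adds an operation whose result carries the error-bound information:

    // Base ADT with a plain read
    class PlainCounter(private var v: Int = 0) {
      def read(): Int = v
      def incr(n: Int): Unit = v += n
    }

    // Mixin that exposes a read returning an interval instead of a bare Int
    trait WithErrorTolerance extends PlainCounter {
      def tolerance: Double // e.g. 0.05 for 5%
      def readInterval(): (Int, Int) = {
        val x = read()
        val d = (x * tolerance).toInt
        (x - d, x + d)
      }
    }

    object MixinExample {
      val c = new PlainCounter(100) with WithErrorTolerance { val tolerance = 0.05 }
      val interval = c.readInterval() // == (95, 105)
    }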

IPA comes with a library of reference ADT implementations used in our experiments, but it is intended to be extended with custom ADTs to fit more specific use cases. Our implementation provides a number of primitives for building ADTs, some of which are shown in Figure 4. To support latency bounds, there is a generic LatencyBound trait that provides facilities for executing a specified read operation at multiple consistency levels within a time limit. For implementing error bounds, IPA provides a generic reservation pool which ADTs can use. Figure 4 shows how a Counter with error tolerance bounds is implemented using these pools. The library of reference ADTs includes:

• Counter, based on Cassandra's counter, supporting increment and decrement, with latency and error bounds.

• BoundedCounter, the CRDT from [10] that enforces a hard lower bound even with weak consistency. Our implementation adds the ability to bound error on the value of the counter and to set latency bounds.

• Set, with add, remove, contains, and size, supporting latency bounds and error bounds on size.

• UUIDPool, which generates unique identifiers, with a hard limit on the number of IDs that can be taken from it; built on top of BoundedCounter, it supports the same bounds.


• List: a thin abstraction around a Cassandra table with a time-based clustering order; supports latency bounds.

Figure 4 shows Scala code using reservation pools to implement a Counter with error bounds. The actual implementation splits this functionality between the client and the reservation server.

6. Evaluation

The goal of the IPA programming model and runtime system is to build applications that adapt to changing conditions, performing nearly as well as weak consistency but with stronger consistency and safety guarantees. To that end, we evaluate our prototype implementation under a variety of network conditions using both a real-world testbed (Google Compute Engine [28]) and simulated network conditions. We start with simple microbenchmarks to understand the performance of each of the runtime mechanisms independently. We then study two applications in more depth, exploring qualitatively how the programming model helps avoid potential programming mistakes in each, and then evaluating their performance against strong and weakly consistent implementations.

6.1. Simulating adverse conditions

To control for variability, we perform our experiments with a number of simulated conditions, and then validate our findings against experiments run on globally distributed machines in Google Compute Engine. We use a local test cluster with nodes linked by standard ethernet and Linux's Network Emulation facility [62] (tc netem) to introduce packet delay and loss at the operating system level. We use Docker containers [24] to enable fine-grained control of the network conditions between processes on the same physical node.

Table 2 shows the set of conditions we use in our experiments to explore the behavior of the system. The uniform 5ms link simulates a well-provisioned datacenter; slow replica models contention or hardware problems that cause one replica to be slower than others; and geo-distributed replicates the latencies between virtual machines in the U.S., Europe, and Asia on Amazon EC2 [3]. These simulated conditions are validated by experiments on Google Compute Engine with virtual machines in four datacenters: the client in us-east, and the storage replicas in us-central, europe-west, and asia-east. We elide the results for Local (same rack in our testbed) except in Figure 11 because the differences between policies are negligible, so strong consistency should be the default there.

6.2. Microbenchmark: Counter

We start by measuring the performance of a simple application that randomly increments and reads from a number of counters with different IPA policies. Random operations (incr(1) and read) are uniformly distributed over 100 counters from a single multithreaded client (allowing up to 4000 concurrent operations).

Network Condition              Latency to replica (ms)
                               Replica 1    Replica 2    Replica 3
Simulated:
  Uniform / High load          5            5            5
  Slow replica                 10           10           100
  Geo-distributed (EC2)        1 ± 0.3      80 ± 10      200 ± 50
Actual:
  Local (same rack)            <1           <1           <1
  Google Compute Engine        30 ± <1      100 ± <1     160 ± <1

Table 2. Network conditions for experiments: latency from the client to each replica, with standard deviation if high.

[Figure 5 plot: mean latency (ms) of strong, latency: 50ms, latency: 10ms, and weak reads under the Uniform (5ms), High load, Slow replica, and Geo-distributed conditions; beneath each bar is the percentage of strong reads achieved (50ms bound: 100%, 100%, 83%, 0%; 10ms bound: 0% everywhere; strong reads under geo-distribution averaged 809 ms).]

Figure 5. Counter: latency bounds, mean latency. Beneath each bar is the % of strong reads. Strong consistency is never possible for the 10ms bound, but the 50ms bound achieves mostly strong reads, only resorting to weak when network latency is high.

6.2.1. Latency bounds

Latency bounds provide predictable performance while maximizing consistency. We found that when latencies and load are low, it is often possible to achieve strong consistency. Figure 5 shows the average operation latency with strong and weak consistency, as well as with both 10ms and 50ms latency bounds.

As expected, there is a significant cost to strong consistency under all network conditions. IPA cannot achieve strong consistency under 10ms in any case, so the system must always default to weak consistency. With a 50ms bound, IPA can achieve strong consistency in conditions where network latency is low (i.e., the single datacenter case). Cassandra assigns each client to read at a different replica for load balancing, so, with one slow replica, IPA will attempt to achieve strong consistency for all clients but not always succeed. In our experiments, IPA was able to get strong consistency 83% of the time. Finally, in our geo-distributed environment, there are no two replicas within 50ms of our client, so strong consistency is never possible within our bounds; as a result, IPA adapts to attempt only weak reads in both cases.


[Figure 6 plots. (a) Overall mean latency (ms) of increment and read operations for strong, error: 1%, error: 5%, error: 10%, and weak, under the Uniform (5ms), High load, Slow replica, and Geo-distributed conditions. (b) Mean and maximum % error observed for weak and strong, compared with the actual interval widths returned under a 1% error tolerance, as the fraction of increment operations varies from 0 to 1, under the Local, High load, Slow replica, and Geo-distributed conditions.]

Figure 6. Counter benchmark: error tolerance. In (a), we see that wider error bounds reduce mean latency because fewer synchronizations are required, matching weak consistency around 5-10%. In (b), we see the actual error of weak consistency compared with the actual interval for a 1% error bound with a varying fraction of writes; average error is less than 1%, but maximum error can be extremely high: up to 60%.

[Figure 7 plot: 95th percentile latency (ms) of strong, latency: 50ms, latency: 10ms, and weak reads under the Uniform (5ms), High load, Slow replica, and Geo-distributed conditions; strong reads under geo-distribution reached 2162 ms.]

Figure 7. Counter: 95th percentile latency. Latency bounds keep tail latency down, backing off to weak when necessary.

Figure 7 shows the 95th percentile latencies for the same workload. The tail latency of the 10ms bound is comparable to weak consistency, whereas the 50ms bound overloads the slow server with double the requests, causing it to exceed the latency bound 5% of the time. There is a gap between latency-bound and weak consistency in the geo-distributed case because the weak condition uses weak reads and writes, while our rushed types, in order to have the option of getting strong reads without requiring a read of ALL, must do QUORUM writes.

Note that, without consistency types, it would be challenging for programmers to handle the varying consistency of returned values under changing network conditions. However, IPA's type system not only gives programmers the tools to reason about different consistency levels, it enforces consistency safety.

6.2.2. Error bounds

This experiment measures the cost of enforcing error bounds using the reservation system described in §4.2, and its precision. Reservations move synchronization off the critical path: by distributing write permissions among replicas, reads can get strong guarantees from a single replica. Note that reservations impact write performance, so we must consider both in our experiments.

Figure 6a shows latencies for error bounds of 1%, 5%, and 10%, plotting the average of read and increment operations. As expected, tighter error bounds increase latency because they force more frequent synchronization between replicas. The 1% error bound provides most of the benefit, except in the slow replica and geo-distributed environments, where it forces synchronization frequently enough that the added latency slows down the system. 5-10% error bounds provide latency comparable to weak consistency. In the geo-distributed case, the coordination required for reservations makes even the 10% error bound 4× slower than weak consistency, but this is still 28× faster than strong consistency.

While we have verified that error-bounded reads remain within our defined bounds, we also wish to know what error occurs in practice. We modified our benchmark to observe the actual error from weak consistency by incrementing counters a predetermined amount and reading the value; results are shown in Figure 6b. We plot the percent error of weak and strong against the actual observed interval width for a 1% error bound, going from a read-heavy workload (1% increments) to a write-heavy one (all increments, except to check the value).

First, we find that the mean interval is less than the 1% error bound because, for counters that are less popular, IPA is able to return a more precise interval. At low write rates, this interval becomes even smaller, down to 0.5% in the geo-distributed experiment. Next, we find that the mean error with weak consistency is also much less than 1%; however, the maximum error that we observed is up to 60% of the actual value. This result motivates the need for error-bounded consistency to ensure that applications do not see drastically incorrect values from weakly consistent operations. Further, using the Interval type, IPA is able to give the application an estimate of the variance in the weak read, which is often more precise than the upper bound set by the error tolerance policy.


    // creates a table of pools, so each event gets its own
    // 5% error tolerance on `remaining` method, weak otherwise
    val tickets = UUIDPool() with Consistency(Weak)
                             with Remaining(ErrorTolerance(0.05))

    // called from displayEvent (& purchaseTicket)
    def getTicketCount(event: UUID): Interval[Int] =
      tickets(event).remaining()

    def purchaseTicket(event: UUID) = {
      // UUIDPool is safe even with weak consistency (CRDT)
      endorse(tickets(event).take()) match {
        case Some(ticket) =>
          // imprecise count returned due to error tolerance
          val remaining = getTicketCount(event)
          // use maximum count possible to be fair
          val price = computePrice(remaining.max)
          display("Ticket reserved. Price: $" + price)
          prompt_for_payment_info(price)
        case None =>
          display("Sorry, all sold out.")
      }
    }

Figure 8. Ticket service code demonstrating consistency types.

6.3. Applications

Next, we explore the implementation of two applications in IPA and compare their performance against Cassandra using purely strong or weak consistency on our simulated network testbed and Google Compute Engine.

6.3.1. Ticket service

Our ticket sales web service, introduced in §2, is modeled after FusionTicket [1], which has been used as a benchmark in recent distributed systems research [65, 66]. We support the following actions:

• browse: list events by venue
• viewEvent: view the full description of an event, including the number of remaining tickets
• purchase: purchase a ticket (or multiple)
• addEvent: add an event at a venue

Figure 8 shows a snippet of code from the IPA implementation, which can be compared with the non-IPA version from Figure 1. Tickets are modeled using the UUIDPool type, which generates unique identifiers to reserve tickets for purchase. The ADT ensures that, even with weak consistency, it never gives out more than the maximum number of tickets, so it is safe to endorse the result of the take operation as long as one is okay with the possibility of a false negative. Rather than just using a weak read as in the original example, in IPA we can bound the inconsistency of the remaining ticket count using an error tolerance annotation on the tickets pool. Now, to compute the price of the reserved ticket, we call getTicketCount and get an Interval, forcing us to decide how to handle the range of possible ticket counts. We decide to use the max value from the interval to be fair to users; the 5% error bound ensures that we don't sacrifice too much profit this way.

[Figure 9 plot: mean latency (ms, log scale) of strong, error: 5%, latency: 20ms, and weak consistency under the Uniform (5ms), High load, Slow replica, Geo-distributed, and GCE (actual) conditions.]

Figure 9. Ticket service: mean latency, log scale. Strong consistency is far too expensive (>10× slower) except when load and latencies are low, but 5% error tolerance allows latency to be comparable to weak consistency. The 20ms latency-bound variant is either slower or defaults to weak, providing little benefit. Note: the ticket Pool is safe even when weakly consistent.

[Figure 10 plot: throughput (actions/s, 0-3k) versus number of concurrent clients (0-4000) for strong, error: 5%, latency: 20ms, and weak consistency.]

Figure 10. Ticket service: throughput on the Google Compute Engine globally-distributed testbed. Note that this counts actions such as tweet, which can consist of multiple storage operations. Because error tolerance does mostly weak reads and writes, its performance tracks weak. Latency bounds reduce throughput due to issuing the same operation in parallel.

To evaluate performance, we run a workload modeling a typical small-scale deployment: 50 venues and 200 events, with an average of 2000 tickets each (Gaussian distribution centered at 2000, stddev 500); this ticket-to-event ratio ensures that some events run out of tickets. Because real-world workloads exhibit power-law distributions [20], we use a moderately skewed Zipf distribution with coefficient 0.6 to select events.
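As a rough illustration (our own sketch; the paper's actual harness is not shown), the ticket counts and the skewed event selection could be generated like this:

import scala.util.Random

val rand = new Random(42)
val numEvents = 200

// Gaussian ticket counts: mean 2000, stddev 500, floored at 0.
val ticketsPerEvent: IndexedSeq[Int] =
  IndexedSeq.fill(numEvents)(math.max(0, (2000 + rand.nextGaussian() * 500).round.toInt))

// Zipf(s = 0.6) over event ranks: a few "hot" events absorb most requests.
val zipfWeights = (1 to numEvents).map(rank => 1.0 / math.pow(rank, 0.6))
val zipfTotal   = zipfWeights.sum

def nextEvent(): Int = {
  var u = rand.nextDouble() * zipfTotal
  val idx = zipfWeights.indexWhere { w => u -= w; u <= 0 }
  if (idx >= 0) idx else numEvents - 1 // guard against floating-point rounding
}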

Figure 9 shows the average latency of a workload con-sisting of 70% viewEvent, 19% browse, 10% purchase,and 1% addEvent. We plot with a log scale because strongconsistency has over 5× higher latency. The purchaseevent, though only 10% of the workload, drives most ofthe latency increase because of the work required to pre-vent over-selling tickets. We explore two different IPA im-plementations: one with a 20ms latency bound on all ADTsaiming to ensure that both viewEvent and browse com-plete quickly, and one where the ticket pool size (“ticketsremaining”) has a 5% error bound. We see that both per-form with nearly the same latency as weak consistency.With the low-latency condition (uniform and high load),20ms bound does 92% strong reads, 4% for slow replica,


class User(id: UserID, name: String,
  followers: Set[UserID] with LatencyBound(20 ms),
  timeline: List[TweetID] with LatencyBound(20 ms))

class Tweet(id: TweetID, user: UserID, text: String,
  retweets: Set[UserID] with Size(ErrorTolerance(5%)))

def viewTimeline(user: User) = {
  // `range` returns `Rushed[List[TweetID]]`
  user.timeline.range(0,10) match { // use match to unpack
    case Consistent(tweets) =>
      for (tweetID <- tweets)
        displayTweet(tweetID)
    case Inconsistent(tweets) =>
      // tweets may not have fully propagated yet
      for (tweetID <- tweets)
        // guard load and retry if there's an error
        Try { displayTweet(tweetID) } retryOnError
  }
}

def displayTweet(id: TweetID, user: User) = {
  val rct: Interval[Int] = tweets(id).retweets.size()
  if (rct > 1000) // abbreviate large counts (e.g. "2k")
    display("${rct.min/1000}k retweets")
  else if (rct.min == rct.max) // count is precise!
    display("Exactly ${rct.min} retweets")
  //...
  // here, `contains` returns `Consistent[Boolean]`
  // so it is automatically coerced to a Boolean
  if (tweets(id).retweets.contains(user))
    disable_retweet_button()
}

Figure 11. Twitter data model with policy annotations. Rushed[T] helps catch referential integrity violations and Interval[T] represents approximate retweet counts.

Figure 9 also shows results on Google Compute Engine (GCE). We see that real geo-replication validates the findings from our simulated geo-distribution conditions.

On this workload, we observe that the 5% error bound performs well even under adverse conditions, which differs from our findings in the microbenchmark. This is because ticket UUIDPools begin full, with many tokens available, requiring less synchronization until they are close to running out. Contrast this with the microbenchmark, where counters started at small values (average of 500), so a 5% error tolerance means fewer tokens.

6.3.2. Twitter clone

Our second application is a Twitter-like service based on the Redis data modeling example, Retwis [55]. The data model is simple: each user has a Set of followers and a List of tweets in their timeline. When a user tweets, the tweet ID is eagerly inserted into all of their followers' timelines (sketched below). Retweets are tracked with a Set of users who have retweeted each tweet.
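The eager fan-out could look like the following sketch; the in-memory maps and the postTweet helper are illustrative stand-ins for the store's ADTs, not IPA's actual API.

import scala.collection.mutable

type UserID  = String
type TweetID = String
case class Tweet(id: TweetID, user: UserID, text: String)

// Illustrative in-memory stand-ins for the datastore collections.
val tweets      = mutable.Map[TweetID, Tweet]()
val followersOf = mutable.Map[UserID, Set[UserID]]().withDefaultValue(Set())
val timelines   = mutable.Map[UserID, List[TweetID]]().withDefaultValue(Nil)

def postTweet(t: Tweet): Unit = {
  // Write the tweet content first, then fan the ID out to followers'
  // timelines; this ordering is what makes strongly consistent timeline
  // reads referentially intact (see viewTimeline in Figure 11).
  tweets(t.id) = t
  for (f <- followersOf(t.user))
    timelines(f) = t.id :: timelines(f)
}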

Figure 11 shows the data model with policy annotations: latency bounds on followers and timelines, and an error bound on the retweets. This ensures that when tweets are displayed, the retweet count is not grossly inaccurate. As shown in displayTweet, highly popular tweets with many retweets can tolerate approximate counts (indeed, they abbreviate the displayed count, e.g. “2.4M”), but average tweets, with fewer than 20 retweets, get an exact count.

[Figure 12 plot: mean latency (ms). Series: strong, IPA, weak. Conditions: Local, Uniform (5ms), High load, Slow replica, Geo-distributed, GCE (actual).]

Figure 12. Twitter clone: mean latency (all actions). The IPA version performs comparably with weak consistency in all but one case, while strong consistency is 2-10× slower.

This is important because regular people will notice if a friend’s retweet is not reflected in the count, whereas Ellen DeGeneres’s record-breaking celebrity selfie, which nearly brought down Twitter in 2014 [7], can scale: a 5% error tolerance on 3.4 million retweets provides significant slack (about 170,000 retweets of allowed divergence).

The code for viewTimeline in Figure 11 demonstrates how latency-bound Rushed[T] types can be destructured with a match statement. In this case, the timeline (a list of tweet IDs) is retrieved with a latency bound. Tweet content is added to the store before tweet IDs are pushed onto timelines, so with strong consistency we know that the list of IDs will all be able to load valid tweets. However, if the latency-bound type returns with weak consistency (the Inconsistent case), then this referential integrity property may not hold. In that case, we must guard the call to displayTweet and retry if any of the operations fails (e.g., if the retweet set wasn’t created yet).
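For reference, one plausible encoding of Rushed[T] that is consistent with this pattern match (a sketch; IPA's actual definition may differ):

// Sketch: a latency-bound read yields either a strongly consistent
// value or a weakly consistent one, and the type forces callers to
// handle both cases.
sealed trait Rushed[+T] { def value: T }
case class Consistent[T](value: T) extends Rushed[T]
case class Inconsistent[T](value: T) extends Rushed[T]

// Example: only trust referential integrity in the Consistent case.
def tweetIDs(r: Rushed[List[String]]): (List[String], Boolean) = r match {
  case Consistent(ids)   => (ids, true)  // safe to load without guards
  case Inconsistent(ids) => (ids, false) // guard loads; references may dangle
}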

We simulate a realistic workload by generating a synthetic power-law graph, using a Zipf distribution to determine the number of followers per user. Our workload is a random mix of 50% timeline reads, 14% tweet, 30% retweet, 5% follow, and 1% newUser.

We see in Figure 12 that for all but the local (same rack) case, strong consistency is over 3× slower. Our implementation, combining latency and error bounds, performs comparably with weak consistency but with stronger guarantees for the programmer. Our simulated geo-distributed condition turns out to be the worst scenario for IPA’s Twitter clone, with latency over 2× slower than weak consistency. This is because weak consistency performed noticeably better on our simulated network, which had one very close (1ms latency) replica that it used almost exclusively.

7. Related Work

Consistency models. IPA’s consistency types could be extended to more of the thriving ecosystem of consistency models, from sequential consistency [33] and linearizability [30] to eventual consistency [64].


A variety of intermediate models fit elsewhere in the spectrum, each making different trade-offs between performance and ease of programming. Session guarantees, including read-your-writes, strengthen ordering for individual clients but reduce availability [60]. Many datastores allow fine-grained consistency control: Cassandra [4] per operation, Riak [11] at object granularity, and others [34, 58]. The Conit consistency model [67] breaks the consistency spectrum down into numerical error, order error, and staleness, but requires annotating each operation and explicit dependency tracking, rather than annotating ADTs.

Higher-level consistency requirements. Rather than enforcing consistency safety, some programming models allow users to express application correctness criteria directly. In Quelea [57], programmers write contracts describing visibility and ordering constraints, independent of any particular consistency hierarchy, and the system selects the consistency level necessary for each operation. In Indigo [9], programmers write invariants over abstract state and annotate post-conditions on actions in terms of the abstract state; the system analyzes annotated programs and adds coordination logic to prevent invariant violations, using a reservation system to enforce numeric constraints. Neither Indigo nor Quelea, however, allows programmers to specify approximations or error tolerances, nor do they enforce any kind of performance bounds.

IPA’s latency-bound policies were inspired by the consistency-based SLAs of Pileus [61]. Consistency SLAs specify a target latency and consistency level (e.g., 100 ms with read-my-writes), associated with a utility. Each operation specifies a set of SLAs; the system predicts which is most likely to be met, attempting to maximize utility, and returns both the value and the achieved consistency level. Other systems, including PRACTI [12], PADS [13], and WheelFS [59], have given developers ways of expressing their desired performance and correctness requirements through semantic cues and policies.

A long history of systems has been built around the principle that applications may be willing to tolerate slightly stale data in exchange for improved performance, including databases [14, 46, 49, 51] and distributed caches [43, 47]. These systems generally require developers to explicitly specify staleness bounds on each transaction in terms of absolute time (although Bernstein et al.’s model can generate these from error bounds when a value’s maximum rate of change is known).

The above techniques are relevant but largely orthogonal to our work: they provide mechanisms that could be used in an IPA datastore to trade off correctness in new ways. This work builds on those insights, introducing a new error tolerance mechanism and proposing ADT annotations rather than per-operation ones, but most importantly, providing consistency safety via consistency types, which ensure that all possible cases are handled whenever the system adjusts consistency to meet performance targets. Previous systems gave some feedback to programs about achieved consistency, but did not provide facilities to ensure that developers use the information correctly.

Types for approximation. IPA’s type system is inspired by work on approximate computing, in which computations can be selectively made inaccurate to improve energy efficiency and performance. EnerJ [16, 53] and Rely [19, 39] track the flow of approximate values to prevent them from interfering with precise computation. IPA’s interval types are similar to Uncertain<T>’s probability distributions [15] and to interval analysis [40]. One key difference for IPA is that inconsistent values can be strengthened, if desired, with additional synchronization.

Types for distributed and secure systems. Convergent (or conflict-free) replicated data types (CRDTs) [56] are data types designed for eventual consistency. Just as IPA types express weakened semantics that admit implementations on weak consistency, CRDTs guarantee convergence under eventual consistency by forcing all update operations to commute. CRDTs are useful because they allow concurrent updates with meaningful semantics, but they are still only eventually (or causally) consistent, so users must still deal with temporary divergence and out-of-date reads, and they do not incorporate performance bounds or variable accuracy. Particularly relevant to IPA, the Bounded Counter CRDT [10] enforces hard limits on the global value of a counter in a way similar to reservations but less general; this design informed our own reservation system for error bounds. Information flow tracking systems [23, 41, 52] also use static type checking and dynamic analysis to enforce non-interference between sensitive data and untrusted channels, but, to the best of our knowledge, those techniques have not been applied to enforce consistency safety by separating weakly and strongly consistent data.

8. Conclusion

The IPA programming model provides programmers with disciplined ways to trade consistency for performance in distributed applications. By specifying application-specific performance and accuracy targets in the form of latency and error tolerance bounds, programmers tell the system how to adapt when conditions change and provide it with opportunities for optimization. Meanwhile, consistency types provide consistency safety, ensuring that all potential weak outcomes are handled and allowing applications to make choices based on the accuracy of the values the system returns. The policies, types, and enforcement systems implemented in this work are only a sampling of the full range of possibilities within the framework of Inconsistent, Performance-bound, and Approximate types.


References

[1] Fusion Ticket. http://fusionticket.org.
[2] Scala in the enterprise. http://www.scala-lang.org/old/node/1658, March 2009.
[3] Amazon Web Services, Inc. Elastic Compute Cloud (EC2) cloud server & hosting – AWS. https://aws.amazon.com/ec2/, 2016.
[4] Apache Software Foundation. Cassandra. http://cassandra.apache.org/, 2015.
[5] Apache Software Foundation. Apache Spark – lightning-fast cluster computing. http://spark.apache.org/, 2016.
[6] Apache Software Foundation. Apache Thrift. https://thrift.apache.org/, 2016.
[7] Lisa Baertlein. Ellen’s Oscar ‘selfie’ crashes Twitter, breaks record. http://www.reuters.com/article/2014/03/03/us-oscars-selfie-idUSBREA220C320140303, March 2014.
[8] Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Bolt-on causal consistency. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pages 761–772, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2037-5. doi:10.1145/2463676.2465279.
[9] Valter Balegas, Sérgio Duarte, Carla Ferreira, Rodrigo Rodrigues, Nuno Preguiça, Mahsa Najafzadeh, and Marc Shapiro. Putting consistency back into eventual consistency. In Proceedings of the Tenth European Conference on Computer Systems, EuroSys, pages 6:1–6:16, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3238-5. doi:10.1145/2741948.2741972.
[10] Valter Balegas, Diogo Serra, Sergio Duarte, Carla Ferreira, Marc Shapiro, Rodrigo Rodrigues, and Nuno Preguiça. Extending eventually consistent cloud databases for enforcing numeric invariants. In 34th International Symposium on Reliable Distributed Systems (SRDS 2015), September 2015.
[11] Basho Technologies, Inc. Riak. http://docs.basho.com/riak/latest/, 2015.
[12] Nalini Belaramani, Mike Dahlin, Lei Gao, Amol Nayate, Arun Venkataramani, Praveen Yalagandula, and Jiandan Zheng. PRACTI replication. In Proceedings of the 3rd Conference on Networked Systems Design & Implementation, NSDI ’06, Berkeley, CA, USA, 2006. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1267680.1267685.
[13] Nalini Belaramani, Jiandan Zheng, Amol Nayate, Robert Soulé, Mike Dahlin, and Robert Grimm. PADS: A policy architecture for distributed storage systems. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, NSDI ’09, pages 59–73, Berkeley, CA, USA, 2009. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1558977.1558982.
[14] Philip A. Bernstein, Alan Fekete, Hongfei Guo, Raghu Ramakrishnan, and Pradeep Tamma. Relaxed currency serializability for middle-tier caching and replication. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA, June 2006. ACM.
[15] James Bornholt, Todd Mytkowicz, and Kathryn S. McKinley. Uncertain<T>: A first-order type for uncertain data. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14. ACM, 2014. doi:10.1145/2541940.2541958.
[16] Brett Boston, Adrian Sampson, Dan Grossman, and Luis Ceze. Probability type inference for flexible approximate programming. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pages 470–487, 2015. doi:10.1145/2814270.2814301.
[17] Eric A. Brewer. Towards robust distributed systems. Keynote at PODC (ACM Symposium on Principles of Distributed Computing), 2000. doi:10.1145/343477.343502.
[18] Travis Brown. Scala at scale at Twitter (talk). http://conferences.oreilly.com/oscon/open-source-2015/public/schedule/detail/42332, July 2015.
[19] Michael Carbin, Sasa Misailovic, and Martin C. Rinard. Verifying quantitative reliability for programs that execute on unreliable hardware. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2013, pages 33–52, 2013. doi:10.1145/2509136.2509546.
[20] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC ’10. ACM, 2010. doi:10.1145/1807128.1807152.
[21] Hayley C. Cuccinello. ‘Star Wars’ presales crash ticketing sites, set record for Fandango. http://www.forbes.com/sites/hayleycuccinello/2015/10/20/star-wars-presales-crash-ticketing-sites-sets-record-for-fandango/, October 2015.
[22] Datastax, Inc. How are consistent read and write operations handled? http://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlAboutDataConsistency.html, 2016.
[23] Dorothy E. Denning and Peter J. Denning. Certification of programs for secure information flow. Communications of the ACM, 20(7):504–513, July 1977.
[24] Docker, Inc. Docker. https://www.docker.com/, 2016.
[25] Robert Escriva, Bernard Wong, and Emin Gün Sirer. HyperDex. In Proceedings of the ACM SIGCOMM Conference, August 2012. ACM. doi:10.1145/2342356.2342360.
[26] Brady Forrest. Bing and Google agree: Slow pages lose users. Radar, June 2009. http://radar.oreilly.com/2009/06/bing-and-google-agree-slow-pag.html.
[27] Dieter Gawlick and David Kinkade. Varieties of concurrency control in IMS/VS Fast Path. IEEE Database Engineering Bulletin, 8(2):3–10, 1985.
[28] Google, Inc. Compute Engine — Google Cloud Platform. https://cloud.google.com/compute/, 2016.
[29] Susan Hall. Employers can’t find enough Scala talent. http://insights.dice.com/2014/04/04/employers-cant-find-enough-scala-talent/, March 2014.
[30] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463–492, July 1990. doi:10.1145/78969.78972.
[31] HyperDex. http://hyperdex.org/, 2015.
[32] Avinash Lakshman and Prashant Malik. Cassandra: A decentralized structured storage system. SIGOPS Operating Systems Review, 44(2):35–40, April 2010. ISSN 0163-5980. doi:10.1145/1773912.1773922.
[33] Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, C-28(9):690–691, September 1979. doi:10.1109/tc.1979.1675439.
[34] Cheng Li, Daniel Porto, Allen Clement, Johannes Gehrke, Nuno Preguiça, and Rodrigo Rodrigues. Making geo-replicated systems fast as possible, consistent when necessary. In 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), pages 265–278, Hollywood, CA, 2012. USENIX Association. ISBN 978-1-931971-96-6. URL https://www.usenix.org/conference/osdi12/technical-sessions/presentation/li.
[35] Lightbend Inc. Akka. http://akka.io/, 2016.
[36] Greg Linden. Make data useful. Talk, November 2006. http://glinden.blogspot.com/2006/12/slides-from-my-talk-at-stanford.html.
[37] Jed Liu, Tom Magrino, Owen Arden, Michael D. George, and Andrew C. Myers. Warranties for faster strong consistency. In USENIX Symposium on Networked Systems Design and Implementation (NSDI ’14), pages 503–517, Seattle, WA, April 2014. USENIX Association. ISBN 978-1-931971-09-6. URL https://www.usenix.org/conference/nsdi14/technical-sessions/presentation/liu_jed.
[38] Cade Metz. How Instagram solved its Justin Bieber problem. http://www.wired.com/2015/11/how-instagram-solved-its-justin-bieber-problem/, November 2015.
[39] Sasa Misailovic, Michael Carbin, Sara Achour, Zichao Qi, and Martin C. Rinard. Chisel: Reliability- and accuracy-aware optimization of approximate computational kernels. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2014, pages 309–328, 2014. doi:10.1145/2660193.2660231.
[40] Ramon E. Moore. Interval Analysis. Prentice-Hall, 1966.
[41] Andrew C. Myers. JFlow: Practical mostly-static information flow control. In Proceedings of the 26th ACM Symposium on Principles of Programming Languages (POPL ’99), San Antonio, TX, USA, January 1999. ACM.
[42] Dao Nguyen. What it’s like to work on BuzzFeed’s tech team during record traffic. http://www.buzzfeed.com/daozers/what-its-like-to-work-on-buzzfeeds-tech-team-during-record-t, February 2015.
[43] Chris Olston, Boon Thau Loo, and Jennifer Widom. Adaptive precision setting for cached approximate values. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA, USA, May 1999. ACM.
[44] Patrick E. O’Neil. The escrow transactional method. ACM Transactions on Database Systems, 11(4):405–430, December 1986. doi:10.1145/7239.7265.
[45] outworkers ltd. Phantom by outworkers. http://outworkers.github.io/phantom/, March 2016.
[46] Christian Plattner and Gustavo Alonso. Ganymed: Scalable replication for transactional web applications. In Proceedings of the International Middleware Conference, Toronto, Ontario, Canada, October 2004.
[47] Dan R. K. Ports, Austin T. Clements, Irene Zhang, Samuel Madden, and Barbara Liskov. Transactional consistency and automatic management in an application data cache. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’10), Vancouver, BC, Canada, October 2010. USENIX Association.
[48] Nuno Preguiça, J. Legatheaux Martins, Miguel Cunha, and Henrique Domingos. Reservations for conflict avoidance in a mobile database system. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, MobiSys ’03. ACM, 2003. doi:10.1145/1066116.1189038.
[49] Calton Pu and Avraham Leff. Replica control in distributed systems: An asynchronous approach. In Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data, Denver, CO, USA, May 1991. ACM.
[50] Andreas Reuter. Concurrency on high-traffic data elements. ACM, New York, NY, USA, March 1982.
[51] Uwe Röhm, Klemens Böhm, Hans-Jörg Schek, and Heiko Schuldt. FAS — a freshness-sensitive coordination middleware for a cluster of OLAP components. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB ’02), Hong Kong, China, August 2002.
[52] Andrei Sabelfeld and Andrew C. Myers. Language-based information-flow security. IEEE Journal on Selected Areas in Communications, 21(1):1–15, January 2003.
[53] Adrian Sampson, Werner Dietl, Emily Fortuna, Danushen Gnanapragasam, Luis Ceze, and Dan Grossman. EnerJ: Approximate data types for safe and general low-power computation. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, pages 164–174, 2011. doi:10.1145/1993498.1993518.
[54] Salvatore Sanfilippo. Redis. http://redis.io/, 2015.
[55] Salvatore Sanfilippo. Design and implementation of a simple Twitter clone using PHP and the Redis key-value store. http://redis.io/topics/twitter-clone, 2015.
[56] Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. Conflict-free replicated data types. In Proceedings of the 13th International Conference on Stabilization, Safety, and Security of Distributed Systems, SSS, pages 386–400, 2011. ISBN 978-3-642-24549-7.
[57] KC Sivaramakrishnan, Gowtham Kaki, and Suresh Jagannathan. Declarative programming over eventually consistent data stores. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015. ACM, 2015. doi:10.1145/2737924.2737981.
[58] Yair Sovran, Russell Power, Marcos K. Aguilera, and Jinyang Li. Transactional storage for geo-replicated systems. In ACM Symposium on Operating Systems Principles, SOSP ’11. ACM, 2011. doi:10.1145/2043556.2043592.
[59] Jeremy Stribling, Yair Sovran, Irene Zhang, Xavid Pretzer, Jinyang Li, M. Frans Kaashoek, and Robert Morris. Flexible, wide-area storage for distributed systems with WheelFS. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’09), pages 43–58, Berkeley, CA, USA, 2009. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1558977.1558981.
[60] D. B. Terry, A. J. Demers, K. Petersen, M. J. Spreitzer, M. M. Theimer, and B. B. Welch. Session guarantees for weakly consistent replicated data. In Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems, PDIS. IEEE, 1994. doi:10.1109/pdis.1994.331722.
[61] Douglas B. Terry, Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, Marcos K. Aguilera, and Hussam Abu-Libdeh. Consistency-based service level agreements for cloud storage. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13. ACM, 2013. doi:10.1145/2517349.2522731.
[62] The Linux Foundation. netem. http://www.linuxfoundation.org/collaborate/workgroups/networking/netem, November 2009.
[63] Twitter, Inc. Finagle. https://twitter.github.io/finagle/, March 2016.
[64] Werner Vogels. Eventually consistent. Communications of the ACM, 52(1):40, January 2009. doi:10.1145/1435417.1435432.
[65] Chao Xie, Chunzhi Su, Manos Kapritsos, Yang Wang, Navid Yaghmazadeh, Lorenzo Alvisi, and Prince Mahajan. Salt: Combining ACID and BASE in a distributed database. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 495–509, Broomfield, CO, October 2014. USENIX Association. ISBN 978-1-931971-16-4. URL https://www.usenix.org/conference/osdi14/technical-sessions/presentation/xie.
[66] Chao Xie, Chunzhi Su, Cody Littley, Lorenzo Alvisi, Manos Kapritsos, and Yang Wang. High-performance ACID via modular concurrency control. In ACM Symposium on Operating Systems Principles (SOSP), pages 276–291, 2015. ISBN 978-1-4503-2388-8. doi:10.1145/2517349.2522729.
[67] Haifeng Yu and Amin Vahdat. Design and evaluation of a conit-based continuous consistency model for replicated services. ACM Transactions on Computer Systems (TOCS), 20(3):239–282, 2002.