![Page 1: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/1.jpg)
Wyatt Lloyd*
Michael J. Freedman*
Michael Kaminsky†
David G. Andersen‡
*Princeton, †Intel Labs, ‡CMU
Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS
![Page 2: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/2.jpg)
Wide-Area Storage
Stores: Status Updates, Likes, Comments, Photos, Friends List
Stores: Tweets, Favorites, Following List
Stores: Posts, +1s, Comments, Photos, Circles
![Page 3: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/3.jpg)
Wide-Area Storage Serves Requests Quickly
![Page 4: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/4.jpg)
Inside the Datacenter
[Diagram: each datacenter has a web tier in front of a storage tier partitioned by key range (A-F, G-L, M-R, S-Z); replication connects the storage tier to the same layout in the remote DC.]
![Page 5: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/5.jpg)
Desired Properties: ALPS
• Availability
• Low Latency
• Partition Tolerance
• Scalability
“Always On”
![Page 6: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/6.jpg)
Scalability
Increase capacity and throughput in each datacenter
[Diagram: the keyspace in each datacenter is split across more and more nodes: A-Z on one node; A-L and M-Z on two; A-F, G-L, M-R, S-Z on four; A-C, D-F, G-J, K-L, M-O, P-S, T-V, W-Z on eight.]
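The key-range splits on this slide can be illustrated with a small lookup. This is only a sketch under stated assumptions: the names `FOUR_NODES`, `EIGHT_NODES`, and `node_for_key` are hypothetical, keys are assumed to start with a letter, and the real system may partition its keyspace differently from the letter ranges drawn on the slide.

```python
from bisect import bisect_right

# First letter of each key range, taken from the slide's diagram.
FOUR_NODES = ["A", "G", "M", "S"]                       # A-F, G-L, M-R, S-Z
EIGHT_NODES = ["A", "D", "G", "K", "M", "P", "T", "W"]  # A-C, D-F, ..., W-Z


def node_for_key(key, range_starts):
    """Return the index of the node whose range holds the key's first letter."""
    first = key[0].upper()
    # Count how many range starts are <= first, then convert to a 0-based index.
    return bisect_right(range_starts, first) - 1


print(node_for_key("photo:42", FOUR_NODES))   # 2 (the M-R node)
print(node_for_key("photo:42", EIGHT_NODES))  # 5 (the P-S node)
```

Doubling the number of ranges halves the share of keys each node serves, which is the capacity and throughput gain the slide describes.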
![Page 7: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/7.jpg)
Desired Property: Consistency
• Restricts order/timing of operations
• Stronger consistency:
  – Makes programming easier
  – Makes user experience better
![Page 8: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/8.jpg)
Consistency with ALPS
Strong: Impossible [Brewer00, GilbertLynch02]
Sequential: Impossible [LiptonSandberg88, AttiyaWelch94]
Causal: COPS
Eventual: Amazon Dynamo, LinkedIn Voldemort, Facebook/Apache Cassandra
![Page 9: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/9.jpg)
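| System | A | L | P | S | Consistency |
|---|---|---|---|---|---|
| Scatter | ✖ | ✖ | ✖ | ✔ | ✔ Strong |
| Walter | ✖ | ✖ | ✖ | ? | PSI + Txn |
| COPS | ✔ | ✔ | ✔ | ✔ | Causal+ |
| Bayou | ✔ | ✔ | ✔ | ✖ | Causal+ |
| PNUTS | ✔ | ✔ | ? | ✔ | Per-Key Seq. |
| Dynamo | ✔ | ✔ | ✔ | ✔ | ✖ Eventual |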
![Page 10: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/10.jpg)
Causality By Example
Remove boss from friends group
Post to friends: “Time for a new job!”
Friend reads post
Causality rules:
• Thread-of-Execution
• Gets-From
• Transitivity
[Diagram labels: “New Job!” post; Friends list; Boss]
![Page 11: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/11.jpg)
Causality Is Useful
• For Users: Employment Integrity (Friends, Boss, “New Job!”)
• For Programmers: Referential Integrity (Photo Upload, then Add to album)
![Page 12: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/12.jpg)
Conflicts in Causal
[Diagram: replicas receive the conflicting writes K=1 and K=2.]
![Page 13: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/13.jpg)
Conflicts in Causal
[Diagram: replicas receive the conflicting writes K=2 and K=3.]
Causal + Conflict Handling = Causal+
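The slide does not say which conflict handler is used, so the following is only an illustration of what "conflict handling" can mean. It assumes a last-writer-wins rule, a common choice, where every write carries a logical timestamp and a node id as tie-breaker. `Versioned`, `last_writer_wins`, and `converge` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: object
    timestamp: int  # logical clock value (assumed, not from the slides)
    node_id: int    # breaks ties between equal timestamps


def last_writer_wins(a, b):
    """Pick the same winner regardless of the order the writes arrive in."""
    return a if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id) else b


def converge(writes):
    """Fold a sequence of conflicting writes into one surviving value."""
    result = writes[0]
    for write in writes[1:]:
        result = last_writer_wins(result, write)
    return result


k1 = Versioned(value=1, timestamp=7, node_id=1)
k2 = Versioned(value=2, timestamp=7, node_id=2)

# Two replicas see the writes in opposite orders but agree on the result.
print(converge([k1, k2]).value)  # 2
print(converge([k2, k1]).value)  # 2
```

Because the comparison is a total order over (timestamp, node id), every replica settles on the same value no matter how the conflicting writes interleave.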
![Page 14: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/14.jpg)
Previous Causal+ Systems
• Bayou ’94, TACT ’00, PRACTI ’06
  – Log-exchange based
• Log is single serialization point
  – Implicitly captures and enforces causal order
  – Limits scalability OR
  – No cross-server causality
![Page 15: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/15.jpg)
Scalability Key Idea
• Dependency metadata explicitly captures causality
• Distributed verifications replace single serialization
  – Delay exposing replicated puts until all dependencies are satisfied in the datacenter
![Page 16: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/16.jpg)
COPS
[Diagram: in the local datacenter, a client library sits in front of the key-value store; every datacenter holds all data, and causal+ replication connects the datacenters.]
![Page 17: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/17.jpg)
Get
[Diagram: the client library passes a get to the key-value store in the local datacenter.]
![Page 18: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/18.jpg)
Put
[Diagram: in the local datacenter, the client library turns a put into a put_after on the key-value store, which stores K:V and places the put_after on the replication queue.]
put_after = put + ordering metadata
![Page 19: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/19.jpg)
Dependencies
• Dependencies are explicit metadata on values
• Library tracks and attaches them to put_afters
![Page 20: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/20.jpg)
Dependencies
[Diagram, Client 1: the library turns put(Key, Val) into put_after(Key,Val,deps); the returned version is added to deps as K:version (Thread-Of-Execution Rule).]
![Page 21: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/21.jpg)
Dependencies
[Diagram, Client 2: get(K) returns value, version, deps'; the library passes the value to the client, adds K:version to deps (Gets-From Rule), and adds deps' (L:337, M:195) to deps (Transitivity Rule).]
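The three rules on these slides can be sketched as a toy client library. This is a minimal illustration, not the actual COPS library: `KeyValueStore` and `ClientLibrary` are hypothetical, the store is a single in-memory node, and versions are a simple global counter chosen only to keep the example short.

```python
class KeyValueStore:
    """Toy single-node store mapping key -> (value, version, deps)."""

    def __init__(self):
        self._data = {}
        self._next_version = 0

    def put_after(self, key, value, deps):
        self._next_version += 1
        # Snapshot the deps so later client activity cannot change them.
        self._data[key] = (value, self._next_version, frozenset(deps))
        return self._next_version

    def get(self, key):
        return self._data[key]


class ClientLibrary:
    def __init__(self, store):
        self.store = store
        self.deps = set()  # (key, version) pairs this client depends on

    def get(self, key):
        value, version, value_deps = self.store.get(key)
        self.deps.add((key, version))  # gets-from rule
        self.deps |= value_deps        # transitivity rule
        return value

    def put(self, key, value):
        version = self.store.put_after(key, value, self.deps)
        self.deps.add((key, version))  # thread-of-execution rule
        return version


store = KeyValueStore()
client1 = ClientLibrary(store)
client1.put("L", "x")
client1.put("M", "y")
client1.put("K", "z")

client2 = ClientLibrary(store)
client2.get("K")
print(sorted(client2.deps))  # [('K', 3), ('L', 1), ('M', 2)]
```

Client 2 never touched L or M, yet it now depends on them because the value of K it read was written after them.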
![Page 22: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/22.jpg)
Causal+ Replication
[Diagram: the local key-value store stores K:V,deps and places put_after(K,V,deps) on the replication queue.]
![Page 23: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/23.jpg)
Causal+ Replication
[Diagram: a datacenter receiving put_after(K,V,deps) with deps L337 and M195 issues dep_check(L337) and dep_check(M195) before it exposes K:V,deps.]
Exposing values after dep_checks return ensures causal+
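The receiving side of replication can be sketched as follows. The sketch makes several assumptions that are not on the slides: `Datacenter` and `receive_put_after` are hypothetical names, the datacenter is one in-memory object rather than many nodes, waiting is modeled by re-scanning a pending list instead of blocking calls between servers, and a newer version of a key is treated as satisfying a dependency on an older one.

```python
class Datacenter:
    def __init__(self):
        self.visible = {}  # key -> (value, version) that local gets may return
        self.pending = []  # replicated put_afters still waiting on dependencies

    def dep_check(self, key, version):
        """True once this datacenter exposes `key` at `version` or newer."""
        current = self.visible.get(key)
        return current is not None and current[1] >= version

    def receive_put_after(self, key, value, version, deps):
        self.pending.append((key, value, version, deps))
        self._expose_ready()

    def _expose_ready(self):
        # Keep scanning: exposing one write may unblock others.
        progress = True
        while progress:
            progress = False
            for item in list(self.pending):
                key, value, version, deps = item
                if all(self.dep_check(k, v) for k, v in deps):
                    current = self.visible.get(key)
                    if current is None or current[1] < version:
                        self.visible[key] = (value, version)
                    self.pending.remove(item)
                    progress = True


remote = Datacenter()
remote.receive_put_after("K", "v", 1, [("L", 337), ("M", 195)])
print("K" in remote.visible)  # False: dependencies have not arrived yet

remote.receive_put_after("L", "l", 337, [])
remote.receive_put_after("M", "m", 195, [])
print(remote.visible["K"])    # ('v', 1)
```

The write to K arrives first but stays hidden until both L337 and M195 are visible, so no reader in this datacenter can observe K without its causes.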
![Page 24: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/24.jpg)
Basic COPS Summary
• Serve operations locally, replicate in background
  – “Always On”
• Partition keyspace onto many nodes
  – Scalability
• Control replication with dependencies
  – Causal+ Consistency
![Page 25: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/25.jpg)
Gets Aren’t Enough
[Diagram: “My Operations” on the friends list (Boss) and the status (“Portugal!”, then “New Job!”) shown next to their “Remote Progress” in the remote datacenter; separate gets there can pair Boss with “New Job!”, ending in “You’re Fired!!”]
![Page 26: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/26.jpg)
Gets Aren’t Enough
[Diagram, continued: the same timeline of “My Operations” and “Remote Progress” with further updates to the friends list (Boss) and the status (“New Job!”, “Portugal!”); the outcome is again “You’re Fired!!”]
![Page 27: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/27.jpg)
Get Transactions
• Provide consistent view of multiple keys
  – Snapshot of visible values
• Keys can be spread across many servers
• Takes at most 2 parallel rounds of gets
• No locks, no blocking
Low Latency
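One way a get transaction could finish in at most two rounds without locks is sketched below. It is an illustration under assumptions, not a definitive implementation: `VersionedStore`, `get_by_version`, `get_trans`, and `racing_read` are hypothetical names, versions are per-key integers, the store keeps old versions so a specific one can be fetched, and the first round runs sequentially here although the slide describes it as parallel.

```python
class VersionedStore:
    """Toy multi-version store: key -> {version: (value, deps)}."""

    def __init__(self):
        self._versions = {}

    def write(self, key, version, value, deps=()):
        self._versions.setdefault(key, {})[version] = (value, tuple(deps))

    def get_by_version(self, key, version=None):
        versions = self._versions[key]
        if version is None:  # None means "latest"
            version = max(versions)
        value, deps = versions[version]
        return value, version, deps


def get_trans(keys, read):
    """read(key, version) -> (value, version, deps)."""
    # Round 1: fetch the latest version of every key.
    results = {k: read(k, None) for k in keys}
    # Work out the minimum version of each key that the results require.
    needed = {k: results[k][1] for k in keys}
    for _, _, deps in results.values():
        for dep_key, dep_version in deps:
            if dep_key in needed:
                needed[dep_key] = max(needed[dep_key], dep_version)
    # Round 2: re-fetch only the keys that came back too old.
    for k in keys:
        if needed[k] > results[k][1]:
            results[k] = read(k, needed[k])
    return {k: results[k][0] for k in keys}


store = VersionedStore()
store.write("friends", 1, ["Alice", "Boss"])
store.write("status", 1, "Portugal!")

calls = []


def racing_read(key, version):
    result = store.get_by_version(key, version)
    calls.append(key)
    if len(calls) == 1:
        # Simulate writes landing between the two first-round reads.
        store.write("friends", 2, ["Alice"])
        store.write("status", 2, "New Job!", deps=[("friends", 2)])
    return result


snapshot = get_trans(["friends", "status"], racing_read)
print(snapshot)  # {'friends': ['Alice'], 'status': 'New Job!'}
```

Round 1 alone would have paired the old friends list (still containing Boss) with “New Job!”. The dependency attached to the status exposes the stale read, and round 2 repairs it with one extra fetch.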
![Page 28: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/28.jpg)
Get Transactions
[Diagram: “My Operations” on the friends list (Boss) and the status (“Portugal!”, “New Job!”) shown next to their “Remote Progress” in the remote datacenter. A get transaction could return several combinations of friends list and status, but never Boss together with “New Job!”.]
![Page 29: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/29.jpg)
System So Far
• ALPS and Causal+, but …
• Proliferation of dependencies reduces efficiency
  – Results in lots of metadata
  – Requires lots of verification
• We need to reduce metadata and dep_checks
  – Nearest dependencies
  – Dependency garbage collection
![Page 30: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/30.jpg)
Many Dependencies
• Dependencies grow with client lifetime
[Diagram: a client's sequence of puts and gets, each one adding to its dependencies.]
![Page 31: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/31.jpg)
Nearest Dependencies
• Transitively capture all ordering constraints
![Page 32: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/32.jpg)
The Nearest Are Few
• Transitively capture all ordering constraints
![Page 33: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/33.jpg)
The Nearest Are Few
• Only check nearest when replicating
• COPS only tracks nearest
• COPS-GT tracks non-nearest for transactions
• Dependency garbage collection tames metadata in COPS-GT
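The idea of nearest dependencies can be illustrated on a small dependency graph: a dependency is dropped when another dependency already implies it transitively. The function names `all_deps` and `nearest`, and the graph encoding (operation -> list of operations it directly depends on), are assumptions made for this sketch.

```python
def all_deps(op, graph, seen=None):
    """Collect every operation that `op` transitively depends on."""
    seen = set() if seen is None else seen
    for dep in graph.get(op, ()):
        if dep not in seen:
            seen.add(dep)
            all_deps(dep, graph, seen)
    return seen


def nearest(deps, graph):
    """Keep only the dependencies not implied by another dependency."""
    implied = set()
    for dep in deps:
        implied |= all_deps(dep, graph)
    return set(deps) - implied


# One client issued puts p1..p4 in order and read x1, which was written after x0.
graph = {"p2": ["p1"], "p3": ["p2"], "p4": ["p3"], "x1": ["x0"]}
full_deps = {"p1", "p2", "p3", "p4", "x0", "x1"}

print(sorted(nearest(full_deps, graph)))  # ['p4', 'x1']
```

Checking only p4 and x1 is enough: once they are satisfied, everything they depend on must have been satisfied before them, which is why the nearest set stays small even as the full set grows.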
![Page 34: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/34.jpg)
Extended COPS Summary
• Get transactions
  – Provide consistent view of multiple keys
• Nearest Dependencies
  – Reduce number of dep_checks
  – Reduce metadata in COPS
![Page 35: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/35.jpg)
Evaluation Questions
• Overhead of get transactions?
• Compare to previous causal+ systems?
• Scale?
![Page 36: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/36.jpg)
Experimental Setup
[Diagram: N clients drive N COPS servers in the local datacenter, which replicate to N COPS servers in the remote DC.]
![Page 37: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/37.jpg)
COPS & COPS-GT Competitive for Expected Workloads
[Plot: All Put Workload – 4 Servers / Datacenter]
• High per-client write rates result in 1000s of dependencies (people tweeting 1000 times/sec)
• Low per-client write rates expected (people tweeting 1 time/sec)
![Page 38: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/38.jpg)
COPS & COPS-GT Competitive for Expected Workloads
[Plot: Varied Workloads – 4 Servers / Datacenter, with regions marked Pathological and Expected Workload]
![Page 39: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/39.jpg)
COPS Low Overhead vs. LOG
• COPS – dependencies ≈ LOG
• 1 server per datacenter only
• COPS and LOG achieve very similar throughput
  – Nearest dependencies mean very little metadata
  – In this case dep_checks are function calls
![Page 40: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/40.jpg)
COPS Scales Out
![Page 41: Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡](https://reader036.vdocuments.site/reader036/viewer/2022062517/56813651550346895d9dd438/html5/thumbnails/41.jpg)
Conclusion
• Novel properties
  – First ALPS and causal+ consistent system in COPS
  – Lock-free, low-latency get transactions in COPS-GT
• Novel techniques
  – Explicit dependency tracking and verification with decentralized replication
  – Optimizations to reduce metadata and checks
• COPS achieves high throughput and scales out