TRANSCRIPT
Volley: Automated Data Placement for Geo-Distributed Cloud Services
Authors: Sharad Agarwal, John Dunagan, Navendu Jain, Stefan Saroiu, Alec Wolman, Harbinder Bhogan
7th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2010)
Martin Beyer, December 14, 2010
1 / 45 Volley: Automated Data Placement for Geo-Distributed Cloud Services
Introduction
Cloud Services
Users on all continents want to collaborate through cloud services
Users do not tolerate high latencies
Cloud services deal with highly dynamic data (e.g. the Facebook wall)
Placement of user & application data
Introduction
Worldwide Distribution
Serve each user from the best datacenter (DC) with respect to user-perceived latency
Cloud service providers use many geographically dispersed DCs
What data to store at which datacenter?
Interdependencies between data items
Minimize operational cost of the datacenters
Inter-DC traffic due to data sharing or interdependencies
Provisioned capacity at each DC
Introduction
Replication
Data replication for fault tolerance
Hardware failures
Natural disasters
Replication for availability
Large-scale outages
No single point of failure
Replicas need to communicate frequently
Synchronization
Ensure consistency
Introduction
Impacts of Data Placement
Latency increases between distant locations → move data near the users that most frequently access it
Amount of inter-DC traffic influences bandwidth costs → colocate data items
Capacity skew among DCs increases hardware costs → distribute data uniformly among DCs
Introduction
Approaches to Data Placement
How to find a good data placement that reduces latency and operational cost?
Full replication at each datacenter
Lowest latency for the users
Excessive costs for DC operators
Single DC holds all data
No inter-DC traffic
Many unhappy users due to high latency
Partition data across multiple DCs
Challenging problem to find a good placement
Need to analyze patterns of data access
Process ∼10^8 objects
1 Introduction
2 Analysis of Cloud Services
3 Data Placement
4 Evaluation
5 Conclusion
Analysis of Cloud Services
Challenges of Data Placement
Cloud services deal with highly dynamic data
High update rates lead to stale replicas
Updates need to be visible worldwide
Collaboration around the world
Users work together on a shared data item
Data interdependencies
Publish-subscribe mechanisms; "friend of a friend"
Can be modeled as a dependency graph
Generate huge data sets
Need solutions for efficient analysis of the dependency graph
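Such a dependency graph can be sketched directly from request-log records; the (source, destination, size) record shape and all names below are illustrative, not from the paper:

```python
from collections import defaultdict

def build_dependency_graph(records):
    """Aggregate request-log records into a weighted dependency graph.
    Each record is a hypothetical (source, destination, size) triple:
    source is a client IP or a data-item GUID, destination a data-item
    GUID; edge weights accumulate the traffic between the two."""
    graph = defaultdict(lambda: defaultdict(int))
    for source, destination, size in records:
        graph[source][destination] += size
    return graph

# A client updates its wall twice; the wall feeds an RSS object once.
records = [
    ("client-1", "wall-1", 512),
    ("client-1", "wall-1", 256),
    ("wall-1", "rss-feed-1", 128),
]
graph = build_dependency_graph(records)
print(graph["client-1"]["wall-1"])  # 768: accumulated bytes on that edge
```

At the scale the talk mentions (∼10^8 objects), this aggregation would run on a distributed computing framework rather than in one process.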
Analysis of Cloud Services
Challenges of Data Placement (cont'd)
Applications change frequently
Need to continuously adapt to changing usage patterns
Increasing user mobility
When should data be migrated to a new location?
Infrastructure can change
Capacity limits or latencies between DCs may change
Analysis of Cloud Services
Network Traces
Datacenter applications collect workload traces
Month-long logs from Live Mesh and Live Messenger
The analysis focuses on:
Shared data
Data interdependencies
User mobility
Analysis of Cloud Services
Live Mesh
File & application synchronization
Cloud storage
Data feeds
Analysis of Cloud Services
Live Messenger
Instant messaging
Video conferencing
Continuous group conversation
Contact status updates
[Figure: two clients, each with a Facebook wall and an RSS feed; walls and feeds are interdependent]
Analysis of Cloud Services
Facebook
Facebook wall
Connects users to all of their friends
Users can receive updates via RSS feeds
Interdependencies between walls and RSS feeds
[Figure: two clients connect through frontends; queue and publish-subscribe objects link to a device-connectivity object]
Analysis of Cloud Services
Data Sharing in Live Mesh
Clients access Live Mesh through a Web frontend
Update to device connectivity status
Multiple queue items can subscribe to a publish-subscribe object
Analysis of Cloud Services
Data Interdependencies in Live Mesh
A change to a document creates an update message at publish-subscribe objects
Queue objects receive a copy of that message
Long tail of very popular data items
Analysis of Cloud Services
Geographically Distant Data Sharing
Compute a sharing centroid for each data item
Weighted mean of the locations of the users that access it
A large amount of sharing occurs between distant clients
Analysis of Cloud Services
Client Mobility
Geo-location database (quova.com)
Maps IP addresses to geographic regions
Centroid computed from all locations where the client contacted the service
Large movements observed in the Live Messenger trace
1 Introduction
2 Analysis of Cloud Services
3 Data Placement
4 Evaluation
5 Conclusion
Data Placement
Known Heuristics
Determine the user's location
Move data to the closest datacenter for that user, with the goal of reducing user latency
Ignores major sources of operational cost:
WAN bandwidth between DCs
Overprovisioned datacenter capacity due to skewed load
Data Placement
Volley's Approach
Volley optimizes data placement for latency while allowing operational costs to be limited
Correlates application logs into a graph that captures a global view of data accesses
Analyzes data interdependencies and user behavior within cloud services
Computes a data placement and outputs recommendations for when data should be migrated
[Figure: Volley reads logs from distributed storage across DC 1-3, combines them with the capacity, cost, and latency models plus the constraints on placement, and outputs a migration proposal]
Data Placement
Volley in a Nutshell
Input: logs & models
Datacenter logs in a distributed storage system
Models for cost, capacity, and latency
Constraints on placement
Iterative optimization algorithm on a distributed computing framework
Output: migration recommendations
Data Placement
Requirements for Logs
Capture the logical flow of control across components → construct the dependency graph
Provide unique identifiers for
data items: GUID
users: IP
Request log record:
Timestamp
Source entity: IP or GUID
Request size
Destination entity: GUID
Transaction ID: traces a request through the logs
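A minimal sketch of such a request-log record, assuming Python; the field names are illustrative, since the paper fixes only the information each record must carry:

```python
from dataclasses import dataclass

@dataclass
class RequestLogRecord:
    """One request-log entry with the information Volley requires.
    Field names are illustrative; the paper fixes only the content."""
    timestamp: float         # when the request occurred
    source_entity: str       # client IP or data-item GUID
    request_size: int        # bytes transferred
    destination_entity: str  # data-item GUID
    transaction_id: str      # correlates records of one logical request

# Hypothetical record: a client request against one data item.
rec = RequestLogRecord(1276300800.0, "10.0.0.7", 2048, "guid-42", "txn-7")
```

The transaction ID is what lets Volley stitch records from different components into the logical flow of a single request.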
Data Placement
Logged Events
Live Mesh trace
Changes to files
Device connectivity
Live Messenger trace
Login/logoff events
Participants in each conversation
Number of messages between users
Data Placement
Datacenter Cost Model
Cost per transaction, such as RAM, disk, and CPU
Capacity model for all DCs, e.g. amount of data stored at each DC
Cost model for all DCs
Models change on slower time scales
→ Specify the hardware provisioning needed in DCs to run the service
→ Required network bandwidth
→ Charging model for the service's use of network bandwidth
Data Placement
Additional Inputs
Location of each data item
Model of latency
Network coordinate system: an n-dimensional space specified by the model
Locations of nodes → predicted latency
Constraints on placement
Replication at distant datacenters
Legal constraints
→ Allows Volley to make placement decisions
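As a rough sketch of how a network coordinate system yields predicted latencies: the paper leaves the concrete coordinate model open, so plain Euclidean distance in a 2-D space is assumed here, and all coordinates are hypothetical.

```python
import math

def predicted_latency(coord_a, coord_b):
    """Predicted latency between two nodes as the Euclidean distance
    between their network coordinates. Real coordinate systems use
    richer models (extra dimensions, height terms); this is the basic idea."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(coord_a, coord_b)))

# Hypothetical 2-D coordinates for one client and two candidate DCs:
client = (10.0, 20.0)
dc_west, dc_east = (13.0, 24.0), (40.0, 60.0)
best = min([dc_west, dc_east], key=lambda dc: predicted_latency(client, dc))
print(best)  # (13.0, 24.0): predicted latency 5.0 vs 50.0
```

Because latency is derived from coordinates rather than measured pairwise, Volley can evaluate candidate placements analytically.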
Data Placement
Algorithm
Phase 1: Compute initial placement
Phase 2: Iteratively move data to reduce latency
Phase 3: Collapse data to datacenters
Data Placement
Phase 1: Initial Placement
Map each data item to the weighted average of the geographic coordinates of the clients that access it
Weight = amount of communication between client and data item
∀ data items: compute the weighted spherical mean
Interpolate between 2 initial points (clients)
Average in additional points
Some data items may never be accessed directly by a client
Move them near the already-placed data items
Ignores data interdependencies!
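The weighted spherical mean can be approximated by averaging 3-D unit vectors and projecting back to the sphere; note the paper computes it by iterative great-circle interpolation, so this is a simplified stand-in, and the client locations are made up.

```python
import math

def weighted_spherical_mean(points):
    """Approximate weighted spherical mean of (lat, lon, weight) triples
    by averaging 3-D unit vectors and projecting back to the sphere.
    The paper computes the mean by iterative great-circle interpolation;
    the vector average is a common, simpler approximation of it."""
    x = y = z = 0.0
    for lat, lon, w in points:
        la, lo = math.radians(lat), math.radians(lon)
        x += w * math.cos(la) * math.cos(lo)
        y += w * math.cos(la) * math.sin(lo)
        z += w * math.sin(la)
    mean_lon = math.degrees(math.atan2(y, x))
    mean_lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return mean_lat, mean_lon

# A data item accessed twice as heavily from Seattle as from London
# lands on the great-circle route between them, closer to Seattle.
clients = [(47.6, -122.3, 2.0), (51.5, -0.1, 1.0)]
mean_lat, mean_lon = weighted_spherical_mean(clients)
print(round(mean_lat, 1), round(mean_lon, 1))
```

The spherical (rather than planar) mean matters here: for distant clients the planar average of latitudes and longitudes can land far from the actual great-circle midpoint.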
Data Placement
Phase 1: Initial Placement
Data Placement
Phase 2: Iteratively Improve Placement
Move data items closer to users and other data items with which they frequently interact
∀ data items: determine a movement toward another node
Current latency and amount of communication increase the contracting force
Updates to the placement pull nodes together
Data items are moveable
Client locations are fixed
Replicas are treated as separate data items that interact frequently
→ Reduces latency
→ Reduces inter-DC traffic (if data items are colocated)
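A flat-plane toy version of this contraction step might look as follows; the real algorithm works on the sphere and scales the pull by latency as well as traffic, whereas here the per-edge weight alone plays that role, and all names are illustrative:

```python
def improve_placement(positions, edges, fixed, rounds=10):
    """Toy, flat-plane sketch of Volley's phase 2: in each round every
    moveable node is pulled to the weighted centroid of its neighbours.
    Clients (in `fixed`) never move; data items do.
    positions : dict node -> (x, y)
    edges     : dict node -> list of (neighbour, weight)
    """
    for _ in range(rounds):
        updated = dict(positions)
        for node, nbrs in edges.items():
            if node in fixed:
                continue
            total = sum(w for _, w in nbrs)
            updated[node] = tuple(
                sum(w * positions[n][i] for n, w in nbrs) / total
                for i in range(2)
            )
        positions = updated
    return positions

# One data item pulled between two fixed clients it talks to equally.
positions = {"c1": (0.0, 0.0), "c2": (10.0, 0.0), "d1": (100.0, 50.0)}
edges = {"d1": [("c1", 1.0), ("c2", 1.0)]}
result = improve_placement(positions, edges, fixed={"c1", "c2"})
print(result["d1"])  # (5.0, 0.0): midway between its two clients
```

With interdependent data items pulling on each other, several rounds are needed before positions settle, which is why the evaluation looks at the impact of the iteration count.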
Data Placement
Phase 3: Collapse Data to Datacenters
Move data to the nearest datacenter
If a DC is over its specified capacity:
Identify the data objects with the fewest accesses
Move them to the next closest DC
Iterations ≤ #DCs
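A greedy sketch of this collapse step, with hypothetical names and a simple per-DC item count standing in for the paper's capacity model:

```python
import math

def collapse_to_datacenters(items, dcs, capacity, distance):
    """Greedy sketch of phase 3 (names and structure are illustrative).
    items    : dict item -> (position, access_count)
    dcs      : dict dc -> position
    capacity : max number of items per DC (stand-in for the capacity model)
    Each item goes to its nearest DC; over-capacity DCs then spill their
    least-accessed items to the next closest DC, for at most #DCs passes."""
    # Rank the DCs by distance, per item.
    pref = {
        it: sorted(dcs, key=lambda dc: distance(pos, dcs[dc]))
        for it, (pos, _) in items.items()
    }
    assign = {it: pref[it][0] for it in items}
    for _ in range(len(dcs)):  # iterations <= number of DCs
        for dc in dcs:
            hosted = [it for it, d in assign.items() if d == dc]
            hosted.sort(key=lambda it: items[it][1])  # fewest accesses first
            for it in hosted[: max(0, len(hosted) - capacity)]:
                nxt = pref[it].index(dc) + 1
                if nxt < len(pref[it]):
                    assign[it] = pref[it][nxt]
    return assign

# Two items both nearest to "west", which can hold only one:
dcs = {"west": (0.0, 0.0), "east": (10.0, 0.0)}
items = {"a": ((1.0, 0.0), 100), "b": ((2.0, 0.0), 5)}
assign = collapse_to_datacenters(items, dcs, capacity=1, distance=math.dist)
print(assign)  # {'a': 'west', 'b': 'east'}: the rarely accessed item spilled
```

Evicting the least-accessed objects first keeps the latency penalty of the capacity constraint small, which matches the evaluation's finding that phase 3 costs almost nothing.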
Data Placement
Output: Migration Proposals
Application-specific migration mechanisms
Supports diverse datacenter applications
Proposal record:
Entity: GUID
New datacenter
Average latency change per request
Ongoing bandwidth change per day
One-time migration bandwidth
1 Introduction
2 Analysis of Cloud Services
3 Data Placement
4 Evaluation
5 Conclusion
Evaluation
Test Environment
Month-long Live Mesh trace
Compute placement on week 1
Evaluate placement on weeks 2-4
12 datacenters as potential locations
Capacity limit: ≤ 10% of all data at each DC
Analytic evaluation using network coordinate system
Evaluation
Heuristics
commonIP
Place data near the IP with the most frequent access
Optimizes for latency
oneDC
Place all data in one datacenter
Optimizes for zero inter-DC traffic
hash
Place data according to a hash function
Optimizes for zero capacity skew
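The hash baseline is easy to make concrete; the sketch below (illustrative, not the paper's code) shows why it yields near-zero capacity skew while ignoring latency and traffic entirely:

```python
import hashlib
from collections import Counter

def hash_placement(guid, dcs):
    """The 'hash' baseline: place an object by a stable hash of its GUID.
    This equalises load across DCs but ignores latency and traffic.
    (Sketch only; the paper does not specify the hash function used.)"""
    digest = hashlib.sha1(guid.encode()).digest()
    return dcs[int.from_bytes(digest[:4], "big") % len(dcs)]

dcs = [f"dc-{i}" for i in range(12)]  # 12 DCs, as in the evaluation
load = Counter(hash_placement(f"guid-{i}", dcs) for i in range(12_000))
print(max(load.values()) - min(load.values()))  # gap between heaviest and lightest DC
```

A stable hash (rather than Python's built-in `hash`) keeps the placement deterministic across runs, so objects do not migrate spuriously.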
Evaluation
Capacity Skew & Inter-DC Traffic
Evaluation
Latency
Volley performs better than commonIP and provides
lower capacity skew
fewer inter-DC messages
[Figure: live test setup with 20 VMs across 12 DCs and 109 client nodes]
Evaluation
Live System Latency
Live Mesh prototype
Frontend: allows clients to connect to any DC
Document Service: stores IP addresses of the clients
Publish-Subscribe Service: notifies about changes in the Document Service
Message Queue Service: buffers messages from the Publish-Subscribe Service
Live Mesh trace replayed from 109 nodes scattered around the world
Evaluation
Live System Latency
External sources of noise
Fewer client locations than in a real-world scenario
Connectivity of the simulated clients does not conform to Volley's latency model
Evaluation
Impact of Iteration Count on Capacity Skew
Most objects do not move after phase 1
Capacity skew is smoothed in phase 3
Evaluation
Impact of Iteration Count on Client Latency
Latency remains stable after a few iterations of phase 2
Almost no penalty from phase 3
Evaluation
Re-Computation
Volley should be re-run frequently
Stale placements increase request latency due to client mobility
Inter-datacenter traffic increases due to new objects that cannot be placed intelligently
Changing access patterns may require data movement
New clients need to be served
Evaluation
Migrated Objects
Percentage of objects moved in the placement computed after week X, compared to the first week
Most old objects do not move
Conclusion
Summary
Automatic recommendations for data placement under constraints
Placement can be controlled to
take resource usage into account (cost & capacity models)
ensure replication (constraints on placement)
Application independence, allowing for specialized migration mechanisms
Analysis of cloud services highlighted the trends that motivated Volley: shared data, data interdependencies, and user mobility
Evaluation shows that Volley simultaneously reduces latency and operational costs
Improvement over the state-of-the-art heuristic
Conclusion
Open Questions
Volley handles placement decisions within a cloud service
Extension: output recommendations to DC operators on upgrading their DCs or building new ones
Can new objects be registered such that they get a good initial placement?
Volley handles replicas as separate data items
Is there a better alternative for modeling replicas?
Thanks for your attention