TRANSCRIPT
Morphus: Supporting Online Reconfigurations
in Sharded NoSQL Systems
Mainak Ghosh, Wenting Wang, Gopalakrishna Holla, Indranil Gupta
2
NoSQL Databases
Predicted to become a $3.4B industry by 2018
3
Database Reconfiguration
Problem: Changing database- or table-level configuration parameters
  o Primary/Shard Key – MongoDB, Cassandra
  o Ring size – Cassandra
Challenge: Affects a lot of data at once
Motivation:
  o Initially the DB is configured based on guesses
  o Sys admin needs to play around with different parameters
    • Seamlessly and efficiently
  o Later, as workload changes and the business/use case evolves:
    • Change parameters, but do it in a live database
4
Today's Prevalent Solution
Existing Solution: Create a new schema, export and re-import data
Why bad?
  o Massive unavailability of data: every second of outage costs $1.1K at Amazon and $1.5K at Google
  o A reconfiguration change caused an outage at Foursquare
  o A manual primary-key change at Google took 2 years and involved 2 dozen teams
Need a solution that is automated, seamless and efficient
5
Goals
Fast completion time: minimize the amount of data transfer required
Highly available: CRUD operations should be answered as much as possible, with little impact on latency
Network-aware: reconfiguration should adapt to underlying network latencies
6
Assumptions
Master-Slave Replication
Range-based Partitioning
Flexibility in data assignment
MongoDB, RethinkDB, CouchDB, etc.
Optimal Algorithms
8
Notations
Servers: S1, S2, S3. Ko: a record's old shard key; Kn: the same record's new shard key. Old chunks C1-C3 partition records by Ko; new chunks C'1-C'3 partition them by Kn.

Old Arrangement:
  S1 hosts C1: Ko = {1, 2, 3}, Kn = {2, 4, 8}
  S2 hosts C2: Ko = {4, 5, 6}, Kn = {1, 6, 3}
  S3 hosts C3: Ko = {7, 8, 9}, Kn = {9, 5, 7}

New Arrangement (server placement still to be decided):
  C'1: Kn = {1, 2, 3}, i.e. records with Ko = {4, 1, 6}
  C'2: Kn = {4, 5, 6}, i.e. records with Ko = {2, 8, 5}
  C'3: Kn = {7, 8, 9}, i.e. records with Ko = {9, 3, 7}
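To make the notation concrete, here is a small sketch (illustrative Python, not from the talk) that derives the per-server overlap counts used by the placement algorithms that follow:

# Sketch: derive the overlap matrix from the example arrangements.
# old_placement: server -> list of (Ko, Kn) records it currently holds.
old_placement = {
    "S1": [(1, 2), (2, 4), (3, 8)],
    "S2": [(4, 1), (5, 6), (6, 3)],
    "S3": [(7, 9), (8, 5), (9, 7)],
}
# new_chunks: new chunk -> set of Kn values it covers.
new_chunks = {"C'1": {1, 2, 3}, "C'2": {4, 5, 6}, "C'3": {7, 8, 9}}

# W[s][c] = how many of new chunk c's records server s already stores.
W = {
    s: {c: sum(kn in kns for _, kn in recs) for c, kns in new_chunks.items()}
    for s, recs in old_placement.items()
}
print(W)  # S1: all 1s; S2: 2 records of C'1; S3: 2 records of C'3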
9
Greedy Approach
[Figure: bipartite graph between servers S1-S3 and new chunks C'1-C'3. Edge weights count how many of each new chunk's records already reside on each server: S1 -> (1, 1, 1), S2 -> (2, 1, 0), S3 -> (0, 1, 2). Greedy assigns each new chunk to the server that already holds the most of its records.]

Lemma 1: The greedy algorithm is optimal in total network transfer volume.
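A minimal sketch of the greedy placement on the example above (illustrative Python, not Morphus's actual code):

# W[c][s] = records of new chunk c already on server s (from the figure).
W = {
    "C'1": {"S1": 1, "S2": 2, "S3": 0},
    "C'2": {"S1": 1, "S2": 1, "S3": 1},
    "C'3": {"S1": 1, "S2": 0, "S3": 2},
}

def greedy_assign(W):
    """Send each new chunk to the server that already holds most of it."""
    return {chunk: max(hosts, key=hosts.get) for chunk, hosts in W.items()}

print(greedy_assign(W))  # {"C'1": 'S2', "C'2": 'S1', "C'3": 'S3'}

Note that greedy optimizes only transfer volume; as the next slide shows, it can leave the load across servers unbalanced.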
10
Unbalanced Greedy
[Figure: the same old and new arrangements as on the Notations slide. Greedy's assignment can be unbalanced: one server may receive several new chunks while another receives none.]
11
Bipartite Matching Approach
[Figure: bipartite graph between servers S1-S3 and new chunks C'1-C'3, with per-edge record overlaps.]

Lemma 2: Among all load-balanced strategies that assign at most V = ⌈(number of new chunks) / (number of servers)⌉ new chunks to any server, the Hungarian algorithm is optimal in total network transfer volume.

12
[Figure: Greedy vs. Hungarian assignment comparison.]
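A sketch of the load-balanced matching using SciPy's Hungarian solver (linear_sum_assignment); replicating each server's row V times to enforce the capacity is an illustrative trick, not necessarily how Morphus implements it:

import math
import numpy as np
from scipy.optimize import linear_sum_assignment

servers = ["S1", "S2", "S3"]
chunks = ["C'1", "C'2", "C'3"]
CHUNK_SIZE = 3  # every chunk in the example holds 3 records

# W[i][j] = records of new chunk j already on server i.
W = np.array([[1, 1, 1],
              [2, 1, 0],
              [0, 1, 2]])

# Capacity: at most V = ceil(#chunks / #servers) new chunks per server,
# enforced by giving each server V replicated rows in the cost matrix.
V = math.ceil(len(chunks) / len(servers))
cost = np.repeat(CHUNK_SIZE - W, V, axis=0)  # cost = records to transfer

rows, cols = linear_sum_assignment(cost)
plan = {chunks[c]: servers[r // V] for r, c in zip(rows, cols)}
print(plan)  # e.g. C'1 -> S2, C'2 -> S1, C'3 -> S3: 4 records moved in total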
System Design
13
Typical Sharded Deployment
[Figure: typical sharded deployment. Replica sets RS0-RS2 each have a primary; config servers hold the routing metadata; front ends route queries. A query on the shard key, e.g. select * from table where user_id = 20, is routed to the single owning replica set (RS1); a query on another attribute, e.g. select * from table where product_id = 20, must be sent to all replica sets.]
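As a rough illustration of shard-key routing under range-based partitioning (the chunk map below is hypothetical, not from the talk):

import bisect

# Hypothetical chunk map: sorted lower bounds of user_id ranges -> owner.
range_bounds = [0, 15, 30]           # chunk ranges: [0,15), [15,30), [30,inf)
range_owners = ["RS0", "RS1", "RS2"]

def route(user_id):
    """Return the single replica set owning this shard-key value."""
    return range_owners[bisect.bisect_right(range_bounds, user_id) - 1]

print(route(20))  # -> RS1: a query on the shard key touches one replica set
# A query on product_id (not the shard key) has no entry in the chunk map
# and must be broadcast to every replica set.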
14
Isolation Phase
[Figure: replica sets RS0-RS2, each with a primary and a secondary, plus config servers and front ends; one secondary per replica set is set aside for the reconfiguration while the primaries keep serving traffic.]
db.collection.changeShardKey(product_id:1)
15
Execution Phase
db.collection.changeShardKey(product_id:1)
1. Runs the placement algorithm
2. Generates a placement plan
[Figure: the isolated secondaries migrate data between replica sets according to the plan, while writes on the old shard key, e.g. update table set price = 20 where user_id = 20, continue to be served by the primaries.]
16
Recovery Phase
db.collection.changeShardKey(product_id:1)
[Figure: writes that arrived during Execution are replayed to the reconfigured secondaries in successive, shrinking rounds (Iteration 1, Iteration 2, ...).]
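A toy simulation of why the recovery iterations converge (all rates, thresholds, and names below are invented for illustration, not Morphus internals):

# Each replay iteration shrinks the backlog as long as replay is faster
# than the incoming write rate; then a brief write lock finishes the tail.
backlog = 10_000      # writes that accumulated during the Execution phase
REPLAY_RATE = 5_000   # ops replayed per iteration
WRITE_RATE = 1_000    # new ops arriving per iteration (writes stay enabled)
THRESHOLD = 500       # small enough to finish under a brief write lock

iteration = 0
while backlog > THRESHOLD:
    iteration += 1
    backlog = max(0, backlog - REPLAY_RATE + WRITE_RATE)
    print(f"Iteration {iteration}: backlog = {backlog}")
# Iteration 1: 6000, Iteration 2: 2000, Iteration 3: 0 -> ready to commit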
17
Commit Phase
db.collection.changeShardKey(product_id:1)
[Figure: after a final catch-up, the reconfigured secondaries take over as primaries and the old primaries become secondaries; queries on the new shard key, e.g. update table set price = 20 where product_id = 20, are now routed directly to the owning replica set.]
Network Awareness
19
Chunk-based scheme: assigns one socket per chunk per source-destination pair of communicating servers.
[Figure: hierarchical topology experiment, with varying inter-rack latency and unequal data sizes.]
20
Weighted Fair Sharing
Between a pair of source server i and destination server j, the number of sockets assigned is

  n_ij ∝ D_ij × RTT_ij

where D_ij is the total amount of data that i needs to transfer to j, and RTT_ij is the observed round-trip latency between i and j.

This scheme is in contrast to Orchestra [Chowdhury et al.], which fair-shares based only on data size.
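A minimal sketch of such a weighted allocation under an assumed per-source socket budget (the budget, the rounding, and the function name wfs_sockets are illustrative assumptions, not Morphus's exact scheme):

def wfs_sockets(data_bytes, rtt_ms, total_sockets):
    """data_bytes[j], rtt_ms[j]: data and observed RTT from this source to
    each destination j. Returns sockets per destination (>= 1 if data > 0)."""
    weights = {j: data_bytes[j] * rtt_ms[j] for j in data_bytes}
    total = sum(weights.values())
    return {
        j: max(1, round(total_sockets * w / total)) if data_bytes[j] else 0
        for j, w in weights.items()
    }

# Example: with equal data, the slower inter-rack link gets more sockets.
print(wfs_sockets({"S2": 100e6, "S3": 100e6}, {"S2": 0.2, "S3": 2.0}, 12))
# -> {'S2': 1, 'S3': 11}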
21
Improvement
The WFS strategy is 30% better than the naïve chunk-based scheme and 9% better than Orchestra [Chowdhury et al.].
22
Geo-Distributed Optimization
Morphus chooses slaves for reconfiguration during the first Isolation phase.
In a geo-distributed setting, a naïve choice can lead to bulk transfers over the wide-area network.
Solution: localize bulk transfers by choosing replicas in the same datacenter.
  o Morphus extracts the datacenter information from each replica's metadata.
A 2x-3x improvement was observed in experiments.
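A sketch of that replica choice under an assumed metadata layout (the dict fields below are a guess at MongoDB-style replica tags; this is not Morphus's actual code):

replicas = [
    {"host": "rs0-a", "role": "secondary", "dc": "us-east"},
    {"host": "rs0-b", "role": "secondary", "dc": "eu-west"},
]

def pick_local_secondary(replicas, dest_dc):
    """Prefer a secondary in the destination datacenter to keep bulk
    transfers off the WAN; fall back to any secondary otherwise."""
    local = [r for r in replicas
             if r["role"] == "secondary" and r["dc"] == dest_dc]
    pool = local or [r for r in replicas if r["role"] == "secondary"]
    return pool[0]["host"]

print(pick_local_secondary(replicas, "eu-west"))  # -> rs0-b (no WAN bulk copy)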
Evaluation
24
Setup
Dataset: Amazon Reviews [SNAP].
Cluster: Emulab d710 nodes with a 100 Mbps LAN switch, and Google Cloud (n1-standard-4 VMs) with a 1 Gbps network.
Workload: custom generator similar to YCSB; implements Uniform, Zipfian, and Latest distributions for key access.
Morphus: implemented on top of MongoDB.
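For concreteness, here is one way the three key-access distributions could be generated (a numpy-based illustration; the paper's generator is a custom YCSB-like tool whose details are not in the slides):

import numpy as np

rng = np.random.default_rng(0)
N = 1_000  # key space size

def uniform_key():
    return rng.integers(0, N)

def zipfian_key(s=1.1):
    k = rng.zipf(s)              # heavy-tailed rank, 1 = most popular
    return min(k - 1, N - 1)     # clamp into the key space

def latest_key(num_inserted):
    # "Latest": skew accesses toward the most recently inserted keys.
    return num_inserted - 1 - min(rng.zipf(1.1) - 1, num_inserted - 1)

print(uniform_key(), zipfian_key(), latest_key(500))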
25
Data Availability
Access Distribution | Read Success Rate (%) | Write Success Rate (%)
Read Only           | 99.9                  | -
Uniform             | 99.9                  | 98.5
Latest              | 97.2                  | 96.8
Zipf                | 99.9                  | 98.3
26
Impact on Read Latency
27
Data Availability - Summary
(table repeated from the Data Availability slide)
Morphus has a small impact on data availability.
28
Algorithm Comparison
Hungarian performs well in both scenarios and should be preferred over Greedy and Random schemes
29
Scalability
Sub-linear increase in reconfiguration time as data and cluster size increase.
30
Related Work
Online schema change [Rae et al.]: resultant availabilities are smaller.
Live data migration: attempted in databases [Albatross, ShuttleDB] and in VMs [Bradford et al.]; similar in approach to ours.
  o Albatross and ShuttleDB ship the whole state to a set of empty servers.
Reactive reconfiguration proposed by Squall [Elmore et al.]: fully available, but takes longer.
31
Takeaways
Morphus is a system that allows live reconfiguration of a sharded NoSQL database.
Morphus is implemented on top of MongoDB.
Morphus uses network-efficient algorithms for placing chunks while changing the shard key.
Morphus minimally affects data availability during reconfiguration.
Morphus mitigates stragglers during data migration by optimizing for the underlying network latencies.