
Morphus: Supporting Online Reconfigurations in Sharded NoSQL Systems

Mainak Ghosh, Wenting Wang, Gopalakrishna Holla, Indranil Gupta


NoSQL Databases

Predicted to become a $3.4B industry by 2018


Database Reconfiguration

Problem: Changing database- or table-level configuration parameters
- Primary/shard key (MongoDB, Cassandra)
- Ring size (Cassandra)

Challenge: Affects a lot of data at once

Motivation:
- Initially, the DB is configured based on guesses
- The sysadmin needs to experiment with different parameters, seamlessly and efficiently
- Later, as the workload changes and the business/use case evolves, parameters must be changed in a live database


Today's Prevalent Solution

Existing solution: Create a new schema, then export and re-import the data

Why is this bad?
- Massive unavailability of data: every second of outage costs $1.1K at Amazon and $1.5K at Google
- A reconfiguration change caused an outage at Foursquare
- A manual change of a primary key at Google took 2 years and involved 2 dozen teams

We need a solution that is automated, seamless, and efficient


Goals

- Fast completion time: minimize the amount of data transfer required
- High availability: CRUD operations should be answered as much as possible, with little impact on latency
- Network awareness: reconfiguration should adapt to underlying network latencies


Assumptions

- Master-slave replication
- Range-based partitioning
- Flexibility in data assignment
- Satisfied by MongoDB, RethinkDB, CouchDB, etc.

Optimal Algorithms


Notations

Each record carries an old shard key Ko and a new shard key Kn. Old chunks C1-C3 partition records by Ko ranges; new chunks C'1-C'3 partition the same records by Kn ranges.

Old Arrangement (partitioned by Ko):
  S1 hosts C1: records (Ko, Kn) = (1,2), (2,4), (3,8)
  S2 hosts C2: records (Ko, Kn) = (4,1), (5,6), (6,3)
  S3 hosts C3: records (Ko, Kn) = (7,9), (8,5), (9,7)

New Arrangement (partitioned by Kn):
  C'1: records (Ko, Kn) = (4,1), (1,2), (6,3)
  C'2: records (Ko, Kn) = (2,4), (8,5), (5,6)
  C'3: records (Ko, Kn) = (9,7), (3,8), (7,9)

The reconfiguration must decide which server receives each new chunk.
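To make the placement schemes on the next slides concrete, here is a minimal Python sketch (names are illustrative) that computes the record-overlap matrix W for this example, where W[i][j] counts how many records of new chunk C'(i+1) already reside on server S(j+1):

```python
# Record-overlap matrix for the example above. Servers are identified by the
# old-key values they host; new chunks by the old-key values of their records.
old_on_server = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]   # S1, S2, S3 (chunks C1-C3)
new_chunk     = [{4, 1, 6}, {2, 8, 5}, {9, 3, 7}]   # C'1, C'2, C'3

# W[i][j] = number of C'(i+1)'s records already stored on S(j+1)
W = [[len(c & s) for s in old_on_server] for c in new_chunk]
print(W)   # [[1, 2, 0], [1, 1, 1], [1, 0, 2]]
```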


Greedy Approach

[Figure: bipartite graph between new chunks C'1-C'3 and servers S1-S3, with each edge weighted by how many of the chunk's records that server already holds (the matrix W above: C'1 -> (1, 2, 0), C'2 -> (1, 1, 1), C'3 -> (1, 0, 2)). Greedy assigns each new chunk along its heaviest edge, i.e., to the server that already stores most of its data.]

Lemma 1: The greedy algorithm is optimal in total network transfer volume.
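A minimal sketch of the greedy rule applied to the overlap matrix W from the example (names are illustrative; note that greedy optimizes transfer volume only and ignores load balance, which motivates the next slide):

```python
# Greedy placement: send each new chunk to the server that already holds the
# most of its records, so the fewest records cross the network (Lemma 1).
def greedy_placement(W):
    # For each new chunk, pick the server with the largest overlap (first on ties).
    return {chunk: row.index(max(row)) for chunk, row in enumerate(W)}

W = [[1, 2, 0], [1, 1, 1], [1, 0, 2]]
print(greedy_placement(W))   # {0: 1, 1: 0, 2: 2}: C'1 -> S2, C'2 -> S1, C'3 -> S3
```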


Unbalanced Greedy

[Figure: the same old/new arrangements as on the Notations slide. Because greedy optimizes only transfer volume, it can assign several new chunks to the same server, leaving the cluster load-unbalanced.]


Bipartite Matching Approach

[Figure: bipartite matching between new chunks C'1-C'3 and servers S1-S3, with edges weighted by record overlap as before, but with each server capped at a bounded number of new chunks. The slide contrasts the resulting Greedy and Hungarian assignments.]

Lemma 2: Among all load-balanced strategies that assign at most $V = \lceil m/n \rceil$ new chunks to any server (m new chunks, n servers), the Hungarian algorithm is optimal in total network transfer volume.
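A minimal sketch of this load-balanced matching, assuming SciPy's linear_sum_assignment is available; the slot expansion that enforces the cap V is a standard reduction, and all names are illustrative:

```python
# Load-balanced placement via the Hungarian algorithm: replicate each server
# into ceil(m/n) "slots" so no server can receive more than V new chunks,
# then find the maximum-overlap (minimum-transfer) assignment.
import math
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_placement(W):
    W = np.asarray(W)
    m, n = W.shape
    cap = math.ceil(m / n)                    # V = ceil(m/n) chunks per server
    slots = np.repeat(W, cap, axis=1)         # m x (n * cap) slot matrix
    rows, cols = linear_sum_assignment(slots, maximize=True)
    return {int(c): int(s // cap) for c, s in zip(rows, cols)}

print(hungarian_placement([[1, 2, 0], [1, 1, 1], [1, 0, 2]]))
# {0: 1, 1: 0, 2: 2} -- matches greedy here, but stays balanced in general
```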

System Design


Typical Sharded Deployment

[Figure: three replica sets RS0-RS2 (primaries shown), config servers, and two front ends. A query on the shard key, "select * from table where user_id = 20", is routed by the front end to the one replica set owning that key range (RS1). A query on another field, "select * from table where product_id = 20", cannot be routed this way and must reach every replica set.]
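A toy sketch of that range-based routing (the split points are made up for illustration):

```python
# Range-partitioned routing: the front end keeps the shard-key split points
# and forwards a shard-key query to exactly one replica set.
import bisect

SPLITS = [10, 30]     # illustrative ranges: RS0 < 10, RS1 in [10, 30), RS2 >= 30

def route(user_id):
    return f"RS{bisect.bisect_right(SPLITS, user_id)}"

print(route(20))      # -> RS1, as in the slide's example
# A query on product_id (not the shard key) has no such mapping and must be
# broadcast to RS0, RS1, and RS2.
```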


Isolation Phase

[Figure: each replica set RS0-RS2 shown with its primary and a secondary. The reconfiguration is triggered with db.collection.changeShardKey(product_id:1); one secondary from each replica set is isolated for the reconfiguration while the primaries continue serving traffic.]


Execution Phase

[Figure: after db.collection.changeShardKey(product_id:1), the config server (1) runs the placement algorithm and (2) generates the placement plan; the isolated secondaries migrate chunks accordingly. Writes such as "update table set price = 20 where user_id = 20" continue at the primaries under the old shard key.]


Recovery Phase

[Figure: the isolated secondaries catch up on writes they missed during migration, replaying them in successive rounds (Iteration 1, Iteration 2, ...) until the remaining lag is small.]


Commit Phase

[Figure: writes are briefly blocked while the reconfigured secondaries are promoted to primaries and the old primaries become secondaries; the config metadata switches to the new shard key. From then on, queries use the new key, e.g. "update table set price = 20 where product_id = 20".]
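Putting the four phases together, here is a high-level, illustrative sketch of the flow; every method body below is a print-only placeholder, not a MongoDB or Morphus API:

```python
# The four Morphus phases, as on the preceding slides. Placeholder methods
# only print what the real system would do at each step.
class ShardKeyChange:
    def __init__(self, collection, new_key):
        self.collection, self.new_key = collection, new_key

    def isolate(self):   # one secondary per replica set leaves; primaries keep serving
        print("isolation: detach one secondary per replica set")

    def execute(self):   # config server plans placement; secondaries migrate chunks
        print("execution: run placement algorithm, migrate chunks")

    def recover(self):   # replay writes missed during migration, in iterations
        print("recovery: catch up on missed writes (iteration 1, 2, ...)")

    def commit(self):    # brief write block; promote secondaries; update metadata
        print(f"commit: promote secondaries, shard key -> {self.new_key}")

    def run(self):
        for phase in (self.isolate, self.execute, self.recover, self.commit):
            phase()

ShardKeyChange("collection", "product_id").run()
```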

Network Awareness


Chunk-based baseline: assigns one socket per chunk for each source-destination pair of communicating servers.

[Figures: hierarchical topology experiment — results under varying inter-rack latency and with unequal data sizes.]


Weighted Fair Sharing

Between a source server i and a destination server j, the number of sockets assigned is

$X_{ij} \propto D_{ij} \cdot \mathrm{RTT}_{ij}$

where $D_{ij}$ is the total amount of data that i needs to transfer to j, and $\mathrm{RTT}_{ij}$ is the observed round-trip latency between i and j.

This scheme is in contrast to Orchestra [Chowdhury et al.], which fair-shares based on data size alone.
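A minimal sketch of this allocation, assuming a fixed total socket budget (the budget and per-pair statistics are illustrative):

```python
# Weighted fair sharing: each (source, destination) pair gets sockets in
# proportion to D_ij * RTT_ij, so slow, data-heavy paths get more parallelism.
def wfs_sockets(pairs, budget):
    """pairs: {(src, dst): (data_bytes, rtt_ms)}; returns socket counts."""
    weights = {p: d * rtt for p, (d, rtt) in pairs.items()}
    total = sum(weights.values())
    return {p: max(1, round(budget * w / total)) for p, w in weights.items()}

pairs = {("S1", "S2"): (4e9, 0.4), ("S1", "S3"): (1e9, 2.0)}
print(wfs_sockets(pairs, budget=12))   # {('S1', 'S2'): 5, ('S1', 'S3'): 7}
```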


Improvement

The WFS strategy is 30% better than the naïve chunk-based scheme and 9% better than Orchestra [Chowdhury et al.].


Geo-Distributed Optimization

Morphus chooses which slaves to reconfigure during the Isolation phase.

In a geo-distributed setting, a naïve choice can lead to bulk transfers over the wide-area network.

Solution: localize the bulk transfer by choosing replicas in the same datacenter; Morphus extracts the datacenter information from each replica's metadata (see the sketch below).

A 2x-3x improvement was observed in experiments.
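A minimal sketch of that replica choice, assuming each replica carries a datacenter tag in its metadata (all names here are illustrative):

```python
# Prefer a secondary co-located with the target datacenter so the bulk
# transfer stays on the local network; fall back to any secondary otherwise.
def pick_secondaries(replica_sets, target_dc):
    chosen = {}
    for rs, secondaries in replica_sets.items():
        local = [s for s in secondaries if s["dc"] == target_dc]
        chosen[rs] = (local or secondaries)[0]["host"]
    return chosen

rs = {"RS0": [{"host": "h1", "dc": "us-east"}, {"host": "h2", "dc": "eu-west"}],
      "RS1": [{"host": "h3", "dc": "us-east"}]}
print(pick_secondaries(rs, "eu-west"))   # {'RS0': 'h2', 'RS1': 'h3'}
```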

Evaluation


Setup

Dataset: Amazon Reviews [SNAP]

Clusters: Emulab d710 nodes on a 100 Mbps LAN switch, and Google Cloud n1-standard-4 VMs on a 1 Gbps network

Workload: custom generator similar to YCSB, implementing Uniform, Zipfian, and Latest key-access distributions

Morphus: implemented on top of MongoDB


Data Availability

Access Distribution   Read Success Rate (%)   Write Success Rate (%)
Read Only             99.9                    -
Uniform               99.9                    98.5
Latest                97.2                    96.8
Zipf                  99.9                    98.3


Impact on Read Latency


Data Availability - Summary

As the table above shows, Morphus has a small impact on data availability.


Algorithm Comparison

Hungarian performs well in both scenarios and should be preferred over Greedy and Random schemes


Scalability

Reconfiguration time increases sub-linearly as data and cluster size grow.


Related Work

Online schema change [Rae et al.]: resulting availabilities are smaller.

Live data migration: attempted in databases [Albatross, ShuttleDB] and VMs [Bradford et al.], with an approach similar to ours; however, Albatross and ShuttleDB ship the whole state to a set of empty servers.

Reactive reconfiguration, proposed by Squall [Elmore et al.]: fully available, but takes longer.


Takeaways

Morphus is a system that allows live reconfiguration of a sharded NoSQL database.

- Morphus is implemented on top of MongoDB
- Morphus uses network-efficient algorithms for placing chunks while changing the shard key
- Morphus minimally affects data availability during reconfiguration
- Morphus mitigates stragglers during data migration by optimizing for the underlying network latencies
