A Zoom on Membase Dedicated to VNG
Viet-Trung TRAN ENS Cachan, INRIA/IRISA France
19/06/11 1 www.trungtv.com
What’s Membase?
A key/value store
Simple, fast, elastic
Membase’s API is simple, but no simpler than necessary
SET(key, value)
Value = GET(key)
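The two primitives can be sketched as follows. This is a minimal in-memory stand-in, not Membase client code: the class name `TinyKV` and its method names are hypothetical, chosen only to mirror SET(key, value) and Value = GET(key).

```python
from typing import Dict, Optional


class TinyKV:
    """Hypothetical in-memory stand-in for a key/value store (not Membase itself)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        # SET(key, value): store or overwrite the value under the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Value = GET(key): return the stored value, or None if the key is absent.
        return self._data.get(key)


store = TinyKV()
store.set("user:42", b"trung")
value = store.get("user:42")    # the value stored above
missing = store.get("user:43")  # never set, so None
```

There is no query language and no schema: the key is the only way to reach a value.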
Where’s Membase?
SQL database? No
No complex queries, no schema, no ACID
NoSQL: non-relational, distributed and HORIZONTALLY scalable
Key/value stores: Dynamo, Membase, Voldemort, Riak, Redis, etc.
Column-oriented stores: BigTable, HBase, Cassandra, etc.
Document stores: MongoDB, CouchDB, Terrastore, etc.
Array-oriented stores: Pyramid, SciDB
Why NoSQL?
For over 40 years, RDBMSs have been the default choice
So good but so COMPLEX
Hard to SCALE
2005: “One size fits all”: an idea whose time has come and gone
Called for a “scale OUT” design: cheap, easy
Why Membase?
Membase = so-called Memcached + persistent storage
Membase = a distributed caching system + persistent storage
Membase speaks the Memcached protocol
MEMBASE = SIMPLE, FAST, ELASTIC
Simple: 2 primitives, GET(key) and SET(key, value)
Fast: I/O routing costs O(1). Give me a key, I know exactly where to go
Elastic: scales UP and DOWN freely, from 1 to thousands of machines, with fault tolerance
Membase deployment
Data flow
Map(key, vbucket)
Map(vbucket, node)
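The two maps can be sketched as below. This is an illustration under assumptions, not Membase's actual code: the CRC32 hash, the 1024-vbucket count, the round-robin assignment and all function and node names are chosen here for the example.

```python
import zlib
from typing import List

NUM_VBUCKETS = 1024  # assumed value, for illustration only


def key_to_vbucket(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    # Map(key, vbucket): hash the key, then reduce the hash to a vbucket id.
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(nodes: List[str], num_vbuckets: int = NUM_VBUCKETS) -> List[str]:
    # Map(vbucket, node): a plain lookup table, filled round-robin here.
    return [nodes[vb % len(nodes)] for vb in range(num_vbuckets)]


def route(key: str, vbucket_map: List[str]) -> str:
    # One hash plus one table lookup, independent of how many keys are stored.
    return vbucket_map[key_to_vbucket(key, len(vbucket_map))]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
vbucket_map = build_vbucket_map(nodes)
owner = route("user:42", vbucket_map)

# Moving a vbucket to another node changes one table entry;
# the key-to-vbucket hash stays the same, so keys are never rehashed.
vb = key_to_vbucket("user:42")
vbucket_map[vb] = "node-d"
new_owner = route("user:42", vbucket_map)
```

The indirection through vbuckets is what keeps routing O(1) while still allowing data to move between nodes.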
Data flow (cont’d)
Internal data flow + replication schema
Membase architecture
Symmetric design: identical software on every node
Data management
Membership management
Thoughts on Membase
Personal view
Membase’s design choices
CAP theorem: pick 2 out of 3
Consistency
Availability
Partition tolerance
Membase is CA
Do we really need strong consistency?
Strong consistency
Pessimistic replication may be costly
A write blocks until the data is completely replicated
A single master node coordinates reads and writes
Lower I/O performance under concurrency
Synchronous replication scheme
If one replica fails, the I/O fails
Proposal: use different consistency models depending on the application
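The trade-off behind this proposal can be sketched with a write path whose required number of acknowledgements is configurable. This is a toy model, not how Membase replicates: the `Replica` class, `replicated_write` and the `required_acks` parameter are all hypothetical, and no rollback of partial writes is modelled.

```python
from typing import Dict, List


class Replica:
    """Hypothetical replica; `alive` simulates the node being up or down."""

    def __init__(self, name: str, alive: bool = True) -> None:
        self.name = name
        self.alive = alive
        self.data: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> bool:
        # Return True as an acknowledgement, False if the replica is down.
        if not self.alive:
            return False
        self.data[key] = value
        return True


def replicated_write(replicas: List[Replica], key: str, value: str,
                     required_acks: int) -> bool:
    # The write is reported successful only if enough replicas acknowledge it.
    acks = sum(1 for r in replicas if r.apply(key, value))
    return acks >= required_acks


replicas = [Replica("r1"), Replica("r2"), Replica("r3", alive=False)]

# Fully synchronous: every replica must acknowledge, so one failure fails the write.
strict_ok = replicated_write(replicas, "k", "v1", required_acks=len(replicas))

# Relaxed: two acknowledgements are enough, so the same failure is tolerated.
relaxed_ok = replicated_write(replicas, "k", "v2", required_acks=2)
```

An application that can live with weaker guarantees would pick a lower `required_acks` and gain availability; one that cannot would keep the strict setting and accept failed writes.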
Data migration & replication
LRU algorithm
Is the replication factor configurable per (key, value)?
Vbucket
Re-replication in case of failure?
“Anti-entropy” replica synchronisation?
Proposal: application-aware migration is the best option
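For reference, a generic LRU (least recently used) eviction policy looks like the sketch below. The slide names LRU without saying how Membase applies it, so this shows only the textbook policy; the `LRUCache` class and its capacity of 2 are assumptions made for the example.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Textbook LRU cache: the least recently used entry is evicted first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key: str, value: str) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.set("a", "1")
cache.set("b", "2")
cache.get("a")       # "a" becomes the most recently used entry
cache.set("c", "3")  # capacity exceeded: "b" is evicted
```

An application-aware policy would replace the recency rule with knowledge of which keys the application will need next.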
Cluster management
One single node is elected as cluster leader
Runs efficiently only in a single-cluster environment
High load on the leader at large scale
Rebalancing?
Permanent failure vs temporary failure?
“Node capacity-aware” load balancing?
Heartbeat frequency should be well configured, depending on cluster size and network type
Efficiency of the leader election algorithm?
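The questions about heartbeat tuning and temporary versus permanent failure can be made concrete with a small failure detector. This is a sketch under assumptions, not Membase's mechanism: the `HeartbeatMonitor` class, its two thresholds and the three status names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Dict


class HeartbeatMonitor:
    """Hypothetical detector: classifies nodes by time since their last heartbeat."""

    def __init__(self, suspect_after: float, fail_after: float) -> None:
        if suspect_after >= fail_after:
            raise ValueError("suspect_after must be smaller than fail_after")
        self.suspect_after = suspect_after
        self.fail_after = fail_after
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        # Record the time at which the node was last heard from.
        self._last_seen[node] = now

    def status(self, node: str, now: float) -> str:
        silence = now - self._last_seen[node]
        if silence >= self.fail_after:
            return "failed"     # treated as permanent: rebalancing would start
        if silence >= self.suspect_after:
            return "suspected"  # possibly temporary: wait before moving data
        return "alive"


monitor = HeartbeatMonitor(suspect_after=3.0, fail_after=30.0)
monitor.heartbeat("node-a", now=0.0)
early = monitor.status("node-a", now=1.0)   # within both thresholds
later = monitor.status("node-a", now=5.0)   # past suspect_after only
final = monitor.status("node-a", now=40.0)  # past fail_after
```

Both thresholds would need tuning to cluster size and network type: values that are too small cause needless rebalancing, values that are too large delay recovery.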
Conclusion
Pros: in production at many companies; well-known API
Cons: not so well documented (may be better in source code?); some key techniques should be clarified
“One size fits all” has come and gone
Design patterns: application-aware, infrastructure-aware, human resource-aware
Thank you!