Amazon's highly available key-value store
Briefly about Dynamo
Developed by Amazon
Used for primary key access to data
«Reliability at massive scale»
Tight control over tradeoffs
Data is stored using consistent hashing
Handled peak loads «without any downtime during the busy holiday shopping season»
Each service that uses Dynamo runs its own Dynamo instance
Assumptions and Requirements
1. Data is uniquely identified by a primary key
2. Weaker consistency (the C in ACID) is accepted in exchange for high availability; there are no isolation guarantees and updates touch a single key only
3. Dynamo must function on commodity hardware
4. Dynamo should only be used internally – non-hostile environment
Key principles for design
The call stack for a client request usually has more than one level
Designed as an eventually consistent system
Writes are never rejected
Incremental scalability
Symmetry
Decentralization
Heterogeneity
Dynamo compared to other systems
This section compares Dynamo to several other systems (in terms of system requirements)
Most important:
◦ Always writeable
◦ Key/value access
The 99.9th percentile of read and write operations should be «a few hundred milliseconds»
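As a rough illustration of what a 99.9th-percentile target means, the sketch below computes a nearest-rank percentile over a list of latency samples. The function name `percentile` and the sample numbers are made up for this example and do not come from Dynamo.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Fraction avoids floating-point rounding when p is something like 99.9.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 998 fast requests and 2 slow ones.
latencies_ms = [10] * 998 + [900] * 2
mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p999_ms = percentile(latencies_ms, 99.9)
```

With these made-up samples the mean and the median stay low while the 99.9th percentile lands on a slow request, which is why a tail percentile is a stricter target than an average.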
The Dynamo interface (1)
Two operations
◦ get(key)
◦ put(key, context, object)
All keys are hashed to a 128-bit number, creating a «ring»
Dynamo nodes are spread out on this ring, and responsible for a part of the ring
The context contains metadata about the object
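The ring lookup can be sketched roughly as follows. This is a minimal illustration, not Dynamo's actual code: the `Ring` class, the node names, and the choice to place each node at the hash of its name are assumptions made for the example. MD5 is used because it produces a 128-bit value.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Hash a key to a 128-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical ring: each node sits at the hash of its own name."""

    def __init__(self, node_names):
        self._points = sorted((ring_position(name), name) for name in node_names)
        self._positions = [position for position, _ in self._points]

    def node_for(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        index = bisect.bisect_left(self._positions, ring_position(key))
        return self._points[index % len(self._points)][1]


ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.node_for("cart:12345")  # the node responsible for this key
```

Because only the neighbouring range of the ring changes when a node joins or leaves, most keys keep their owner, which is what makes incremental scaling practical.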
The Dynamo interface (2)
Some important notation
◦ 𝑁 is the number of replicas stored for each key
◦ 𝑆 is the total number of nodes in the system
A vector clock, a list of (node, counter) pairs that normally holds at most 𝑁 entries, keeps track of versioning
Object 1 is the ancestor of object 2 iff every counter in the vector clock of 1 is less-than-or-equal to the corresponding counter in the clock of 2
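The ancestor rule can be sketched as a comparison of two clocks. In this illustration a vector clock is modelled as a dict from node name to counter, and a missing entry counts as 0; the node names `Sx`, `Sy`, `Sz` and the helper names are assumptions for the example.

```python
def is_ancestor(clock_a: dict, clock_b: dict) -> bool:
    """True if every counter in clock_a is <= the matching counter in clock_b."""
    return all(count <= clock_b.get(node, 0) for node, count in clock_a.items())


def in_conflict(clock_a: dict, clock_b: dict) -> bool:
    """Two versions conflict when neither clock is an ancestor of the other."""
    return not is_ancestor(clock_a, clock_b) and not is_ancestor(clock_b, clock_a)


v1 = {"Sx": 1}           # first write, handled by node Sx
v2 = {"Sx": 2}           # second write, also handled by Sx
v3 = {"Sx": 2, "Sy": 1}  # update of v2 handled by Sy
v4 = {"Sx": 2, "Sz": 1}  # concurrent update of v2 handled by Sz
```

Here `v1` is an ancestor of `v2` and `v2` of `v3`, while `v3` and `v4` are in conflict and would both have to be handed back to the caller for reconciliation.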
Calling get() and put() (1)
All nodes are able to accept get and put for all keys
Two strategies for selecting a node
◦ Generic load balancer – picks a node based on load information; that node forwards the request to the right position in the ring
◦ Partition-aware client library – the client sends the request directly to the right node on the ring
A read/write is successful when a certain number of nodes has responded
◦ 𝑊 is the total number of nodes that must accept a write
◦ 𝑅 is the total number of nodes that must respond before responding to a read. If several object versions are in the response, they are all returned to the caller
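A minimal sketch of the success rule above, assuming the coordinator simply counts replies. The function names and the use of plain strings for object versions are invented for the illustration; real versions would carry vector clocks.

```python
def write_succeeded(acks: int, w: int) -> bool:
    """A write succeeds once at least w replicas have accepted it."""
    return acks >= w


def read_result(replies, r: int):
    """Return the distinct versions among the first r replies, or None if fewer than r replied.

    `replies` lists the versions returned by the replicas that answered, in arrival order.
    """
    if len(replies) < r:
        return None
    distinct = []
    for version in replies[:r]:
        if version not in distinct:
            distinct.append(version)
    return distinct


agreeing = read_result(["v1", "v1"], 2)         # replicas agree: one version
divergent = read_result(["v1", "v2", "v1"], 2)  # replicas diverge: both versions returned
too_few = read_result(["v1"], 2)                # not enough replies yet
```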
Implemented in Java
A coordinator handles read and write requests
◦ Usually the first node on the key's preference list (the list of nodes responsible for storing that key)
“Typical” (N,R,W) values are (3,2,2)
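One property of (3,2,2) that the slide does not spell out: when R + W > N, every set of R replicas shares at least one member with every set of W replicas, so a read quorum always includes a replica that saw the latest successful write. The brute-force check below is only an illustration of that arithmetic; the function name is made up.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every possible read set of size r intersects every write set of size w."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


typical = quorums_overlap(3, 2, 2)  # R + W = 4 > N = 3
fast = quorums_overlap(3, 1, 1)     # R + W = 2 <= N = 3
```

Lowering R or W makes the corresponding operation return sooner but gives up this overlap, which is the tradeoff each application tunes.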
Some key takeaways
Use an internal buffer
Spread the nodes evenly over the Dynamo ring -> removes the need for a load balancer / coordinator
Divergent versions are not a problem in practice
Give read / write requests priority over background tasks
Each application can (and should!) fine-tune its (N,R,W) settings
Dynamo has been very successful