Dynamo: Amazon’s Highly Available Key-value Store


DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE

Presenters: Pourya Aliabadi, Boshra Ardallani, Paria Rakhshani

Professor: Dr. Sheykh Esmaili


INTRODUCTION

• Amazon runs a worldwide e-commerce platform that serves tens of millions of customers at peak times, using tens of thousands of servers located in many data centers around the world

• Reliability at massive scale is one of the biggest challenges faced at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust


INTRODUCTION

• One of the lessons Amazon has learned from operating its platform is that the reliability and scalability of a system depend on how its application state is managed

• To meet its reliability and scaling needs, Amazon has developed a number of storage technologies, of which the Amazon Simple Storage Service (S3) is probably the best known

• Many services on Amazon’s platform need only primary-key access to a data store


SYSTEM ASSUMPTIONS AND REQUIREMENTS

Query Model

• Simple read and write operations on a data item that is uniquely identified by a key

• State is stored as binary objects

• No operations span multiple data items

• Dynamo targets applications that need to store objects that are relatively small (less than 1 MB)


SYSTEM ASSUMPTIONS AND REQUIREMENTS

ACID Properties

• ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably

• Dynamo targets applications that operate with weaker consistency (the "C" in ACID) if this results in high availability

• Dynamo does not provide any isolation guarantees and permits only single-key updates


SYSTEM ASSUMPTIONS AND REQUIREMENTS

Efficiency

• The system needs to function on a commodity hardware infrastructure

• Services must be able to configure Dynamo such that they consistently achieve their latency and throughput requirements

• The tradeoffs are in performance, cost efficiency, availability, and durability guarantees


SYSTEM ASSUMPTIONS AND REQUIREMENTS

• Dynamo is used only by Amazon’s internal services

• We will discuss the scalability limitations of Dynamo and possible scalability-related extensions


SERVICE LEVEL AGREEMENTS (SLA)

• To guarantee that an application can deliver its functionality in a bounded time, each and every dependency in the platform needs to deliver its functionality within even tighter bounds

• An example of a simple SLA is a service guaranteeing that it will provide a response within 300 ms for 99.9% of its requests at a peak client load of 500 requests per second

• For example, a page request to one of the e-commerce sites typically requires the rendering engine to construct its response by sending requests to over 150 services

• These services often have multiple dependencies
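
As a rough illustration of the SLA example on this slide, the sketch below checks whether a batch of measured latencies satisfies "within 300 ms for 99.9% of requests". The function names and the sample data are assumptions made for illustration; they do not come from the Dynamo paper. Exact fractions are used so that the 99.9% threshold is not affected by floating-point rounding.

```python
from fractions import Fraction

def fraction_within(latencies_ms, bound_ms):
    """Fraction of requests that completed within bound_ms."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    within = sum(1 for latency in latencies_ms if latency <= bound_ms)
    return Fraction(within, len(latencies_ms))

def meets_sla(latencies_ms, bound_ms=300, target=Fraction(999, 1000)):
    """True if at least `target` of the requests finished within bound_ms."""
    return fraction_within(latencies_ms, bound_ms) >= target

# Hypothetical measurements: 1000 requests, one slow outlier.
samples = [20] * 999 + [900]
print(meets_sla(samples))                          # True
print(meets_sla(samples, target=Fraction(1, 1)))   # False
```

Measuring at the 99.9th percentile rather than the average means a single slow request in a thousand is tolerated, but a second one would already break the agreement.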


Figure: an abstract view of the architecture of Amazon’s platform


DESIGN CONSIDERATIONS

• Incremental scalability: Dynamo should be able to scale out one storage host (henceforth referred to as a “node”) at a time, with minimal impact on both the operators of the system and the system itself

• Symmetry: Every node in Dynamo should have the same set of responsibilities as its peers; there should be no distinguished node or nodes that take on special roles or an extra set of responsibilities


DESIGN CONSIDERATIONS

• Decentralization: An extension of symmetry; the design should favor decentralized peer-to-peer techniques over centralized control. In the past, centralized control has resulted in outages, and the goal is to avoid it as much as possible. This leads to a simpler, more scalable, and more available system.

• Heterogeneity: The system needs to be able to exploit heterogeneity in the infrastructure it runs on; for example, the work distribution must be proportional to the capabilities of the individual servers. This is essential for adding new nodes with higher capacity without having to upgrade all hosts at once.


SYSTEM ARCHITECTURE

• The Dynamo data storage system contains items that are each associated with a single key

• Two operations are implemented: get() and put()

• get(key): locates the object associated with the key and returns a single object, or a list of objects with conflicting versions, along with a context

• put(key, context, object): places the object at its replicas along with the key and context

• Context: metadata about the object, such as its version
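
The shape of this interface can be sketched with a toy, single-process store. This is only an illustration under simplifying assumptions: the class names `ToyStore` and `Context` are hypothetical, there is no replication or partitioning, and the context is reduced to a plain version counter, whereas in Dynamo it is opaque system metadata.

```python
from dataclasses import dataclass

@dataclass
class Context:
    """Metadata handed out by get() and passed back to put() (here: a version counter)."""
    version: int = 0

class ToyStore:
    """Single-process stand-in for the get/put interface; no replication or hashing."""

    def __init__(self):
        self._data = {}  # key -> (Context, object bytes)

    def get(self, key):
        """Return (list of objects, context); the list is empty for an unknown key."""
        if key not in self._data:
            return [], Context()
        context, obj = self._data[key]
        return [obj], context

    def put(self, key, context, obj):
        """Store obj under key, deriving the new version from the caller's context."""
        self._data[key] = (Context(context.version + 1), obj)

store = ToyStore()
_, ctx = store.get("cart:42")          # read first to obtain a context
store.put("cart:42", ctx, b"book")     # write back with that context
objects, ctx = store.get("cart:42")
print(objects, ctx.version)            # [b'book'] 1
```

The point of the read-then-write pattern is that every write states which version it is based on, which is what later allows conflicting versions to be detected.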


PARTITIONING

• Provides a mechanism to dynamically partition the data over the set of nodes

• Uses consistent hashing, similar to Chord

• Each node gets an ID from the space of keys

• Nodes are arranged in a ring

• Each data item is stored on the first node found clockwise from the position of the item’s key on the ring
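
A minimal sketch of the ring lookup described on this slide follows. The `Ring` class and `ring_position` helper are hypothetical names; positions are MD5 hashes interpreted as 128-bit integers, and deriving a node's position from its name is an assumption made to keep the example deterministic.

```python
import bisect
import hashlib

def ring_position(value):
    """Map a string to a position on the ring (MD5 digest as a 128-bit integer)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")

class Ring:
    def __init__(self, nodes):
        # Sorted (position, node) pairs; one position per node in this sketch.
        self._points = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._points]

    def owner(self, key):
        """First node clockwise of the key's position, wrapping around the ring."""
        i = bisect.bisect_right(self._positions, ring_position(key))
        return self._points[i % len(self._points)][1]

keys = [f"key-{i}" for i in range(1000)]
before = Ring(["A", "B", "C", "D"])
after = Ring(["A", "B", "C"])            # node D leaves the ring
moved = [k for k in keys if before.owner(k) != after.owner(k)]
print(all(before.owner(k) == "D" for k in moved))  # True
```

The final check shows the property that makes consistent hashing attractive: when a node leaves, only the keys it owned are reassigned, and all other keys stay where they were.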


VIRTUAL NODE

• A single physical node is assigned to multiple points in the ring, i.e., virtual nodes

Advantages of virtual nodes:

• Graceful handling of the failure of a node

• Easy accommodation of a new node

• Heterogeneity in the physical infrastructure can be exploited
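
To illustrate how virtual nodes let the ring exploit heterogeneity, the sketch below gives a larger host three times as many ring positions as a smaller one and counts how keys are distributed. The host names, token counts, and the `host#i` token naming scheme are all assumptions for illustration.

```python
import bisect
import hashlib
from collections import Counter

def ring_position(value):
    """Map a string to a position on the ring (MD5 digest as a 128-bit integer)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")

def build_ring(tokens_per_node):
    """tokens_per_node: {host: number of virtual nodes}. Returns (positions, hosts), both sorted by position."""
    points = sorted(
        (ring_position(f"{host}#{i}"), host)
        for host, count in tokens_per_node.items()
        for i in range(count)
    )
    return [p for p, _ in points], [h for _, h in points]

def owner(ring, key):
    """Physical host owning the first virtual node clockwise of the key."""
    positions, hosts = ring
    i = bisect.bisect_right(positions, ring_position(key))
    return hosts[i % len(hosts)]

# Hypothetical cluster: "large" has three times the capacity of "small".
ring = build_ring({"small": 50, "large": 150})
load = Counter(owner(ring, f"key-{i}") for i in range(5000))
print(load)  # "large" should receive roughly three times as many keys as "small"
```

Because a host's share of the key space is spread over many small arcs, its load is also handed to many different peers if it fails, rather than to a single successor.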


REPLICATION

• Each data item is replicated at N hosts

• N is configured per instance

• Each node is responsible for the region of the ring between it and its Nth predecessor

• Preference list: the list of nodes responsible for storing a particular key
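
One way to picture a preference list is to walk the ring clockwise from the key and collect the first N distinct physical hosts. The sketch below does that; the function name, the host names, and the `host#i` token naming are hypothetical.

```python
import bisect
import hashlib

def ring_position(value):
    """Map a string to a position on the ring (MD5 digest as a 128-bit integer)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")

def preference_list(points, key, n):
    """points: sorted (position, host) pairs. Returns up to n distinct hosts clockwise of key."""
    positions = [p for p, _ in points]
    start = bisect.bisect_right(positions, ring_position(key))
    chosen = []
    for step in range(len(points)):
        host = points[(start + step) % len(points)][1]
        if host not in chosen:      # skip further virtual nodes of an already chosen host
            chosen.append(host)
        if len(chosen) == n:
            break
    return chosen

# Four hosts with three virtual nodes each.
points = sorted((ring_position(f"{host}#{i}"), host) for host in "ABCD" for i in range(3))
print(preference_list(points, "cart:42", 3))  # three distinct hosts
```

Skipping repeated hosts matters once virtual nodes are in use: without it, two of the N replicas could land on the same physical machine.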


VERSIONING

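• Multiple versions of an object can be present in the system at the same time

• Vector clocks are used for version control

• Vector clock size issue: a clock can grow when many nodes coordinate writes to the same object, so Dynamo drops the oldest (node, counter) pair once the clock exceeds a size threshold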
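
The comparison rule behind vector clocks can be sketched as follows, with each clock represented as a plain `{node: counter}` dictionary. The function names are illustrative, and the truncation of oversized clocks is not modeled.

```python
def descends(a, b):
    """True if clock a has seen everything recorded in clock b."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())

def compare(a, b):
    """Classify two vector clocks given as {node: counter} dicts."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "conflict"   # concurrent versions: both are kept until reconciled

def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

d1 = increment({}, "Sx")     # {"Sx": 1}
d2 = increment(d1, "Sx")     # {"Sx": 2}
d3 = increment(d2, "Sy")     # {"Sx": 2, "Sy": 1}
d4 = increment(d2, "Sz")     # {"Sx": 2, "Sz": 1}
print(compare(d2, d1))       # a supersedes b
print(compare(d3, d4))       # conflict
```

Versions d3 and d4 were both derived from d2 by different nodes, so neither clock dominates the other; this is the case in which get() returns several objects for the client to reconcile.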


EXECUTION OF GET() AND PUT() OPERATIONS

• Operations can originate at any node in the system

• Coordinator: the node handling a read or write operation

• The coordinator needs R nodes to participate in a successful read and W nodes to participate in a successful write, where R + W > N
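
The reason for requiring R + W > N is that any set of R replicas must then share at least one member with any set of W replicas, so a read always reaches a replica that saw the latest successful write. The brute-force check below demonstrates this for small values; it assumes a strict quorum over a fixed set of N replicas, and the function name is hypothetical.

```python
from itertools import combinations

def quorums_always_overlap(n, r, w):
    """Check that every read set of size r shares a replica with every write set of size w."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )

print(quorums_always_overlap(3, 2, 2))  # True:  R + W = 4 > N = 3
print(quorums_always_overlap(3, 1, 2))  # False: R + W = 3 is not greater than N
```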


HANDLING FAILURES

Temporary failures: Hinted Handoff

• Mechanism to ensure that read and write operations do not fail due to temporary node or network failures

Permanent failures: Replica Synchronization

• Synchronize with another node

• Use Merkle trees to detect differences between replicas
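
The idea behind using Merkle trees for replica synchronization can be sketched by comparing root hashes: if the roots match, the replicas hold the same data and nothing needs to be transferred. The code below is a simplified illustration; the choice of SHA-256, the leaf encoding, and the function names are assumptions, and a real implementation would also walk down the tree to locate the differing keys.

```python
import hashlib

def _h(data):
    return hashlib.sha256(data).digest()

def merkle_root(items):
    """items: {key: value bytes}. Root hash over the key-sorted leaves."""
    level = [_h(k.encode("utf-8") + b"\x00" + v) for k, v in sorted(items.items())]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:            # odd count: pair the last hash with itself
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

replica_a = {"k1": b"v1", "k2": b"v2", "k3": b"v3"}
replica_b = {"k1": b"v1", "k2": b"stale", "k3": b"v3"}
print(merkle_root(replica_a) == merkle_root(dict(replica_a)))  # True
print(merkle_root(replica_a) == merkle_root(replica_b))        # False
```

Exchanging a single root hash is enough to confirm that two replicas agree, which keeps the common case (no divergence) cheap.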


MEMBERSHIP AND FAILURE DETECTION

• An explicit mechanism is available to initiate the addition and removal of nodes from a Dynamo ring

• To prevent logical partitions, some Dynamo nodes play the role of seed nodes

• Gossip-based distributed failure detection and membership protocol
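
A gossip-based membership protocol can be pictured as nodes repeatedly picking a random peer and merging what each knows. The simulation below is a rough sketch under stated assumptions: membership views are `{node: version}` dictionaries, the merge keeps the highest version per node, and seed nodes and failure detection are not modeled. All names are hypothetical.

```python
import random

def merge_views(mine, theirs):
    """Reconcile two membership views {node: version}, keeping the highest version per node."""
    merged = dict(mine)
    for node, version in theirs.items():
        if version > merged.get(node, -1):
            merged[node] = version
    return merged

def gossip_round(views, rng):
    """Every node contacts one random peer, and both adopt the merged view."""
    names = list(views)
    for node in names:
        peer = rng.choice([n for n in names if n != node])
        merged = merge_views(views[node], views[peer])
        views[node] = merged
        views[peer] = dict(merged)

# Each node initially knows only about itself.
views = {name: {name: 1} for name in ["A", "B", "C", "D", "E"]}
rng = random.Random(0)
for rounds in range(1, 101):
    gossip_round(views, rng)
    if all(len(view) == len(views) for view in views.values()):
        break
print(rounds, sorted(views["A"]))
```

No node plays a special role in this exchange, which fits the symmetry and decentralization goals stated earlier in the presentation.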


IMPLEMENTATION

Storage Node components: Request Coordination, Membership & Failure Detection, Local Persistence Engine

Request Coordination

• Built on top of event-driven messaging substrate

• Coordinator executes client read & write requests

• State machines created on nodes serving requests

• Each state machine instance handles exactly one client request

• State machine contains entire process and failure handling logic

Local Persistence Engine: Pluggable Storage Engines

• Berkeley Database (BDB) Transactional Data Store

• BDB Java Edition

• MySQL

• In-memory buffer with persistent backing store

• Chosen based on application’s object size distribution


EXPERIENCES, RESULTS & LESSONS LEARNT

Main Dynamo Usage Patterns

1. Business logic specific reconciliation, e.g., merging different versions of a customer’s shopping cart

2. Timestamp based reconciliation, e.g., maintaining a customer’s session information

3. High performance read engine, e.g., maintaining the product catalog and promotional items
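
The first two usage patterns differ in who resolves conflicting versions and how. The sketch below contrasts them: a union-style merge for shopping carts and a "last write wins" rule for session data. The function names and data shapes are assumptions for illustration, not Dynamo's actual client code.

```python
def merge_carts(versions):
    """Business-logic reconciliation: union of the items in all conflicting cart versions."""
    merged = set()
    for cart in versions:
        merged |= set(cart)
    return merged

def last_write_wins(versions):
    """Timestamp-based reconciliation: keep the value with the largest timestamp.

    versions is a list of (timestamp, value) pairs.
    """
    return max(versions, key=lambda pair: pair[0])[1]

print(sorted(merge_carts([{"book", "pen"}, {"book", "lamp"}])))      # ['book', 'lamp', 'pen']
print(last_write_wins([(100, "session-v1"), (250, "session-v2")]))   # session-v2
```

A union merge never loses an "add to cart" operation, but it has a known side effect: an item deleted in one version can resurface if it is still present in another.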

Client applications can tune the parameters N, R, and W to achieve their desired levels of performance, availability, and durability:

• N: number of hosts at which a data item is replicated

• R: minimum number of nodes that must participate in a successful read operation

• W: minimum number of nodes that must participate in a successful write operation

• Commonly used configuration: (N, R, W) = (3, 2, 2)


EXPERIENCES, RESULTS & LESSONS LEARNT

Balancing Performance and Durability

Figure: average and 99.9th percentile latencies of Dynamo’s read and write operations during a period of 30 days

Figure: comparison of 99.9th percentile latencies for buffered vs. non-buffered writes over 24 hours


CONCLUSION

Dynamo:

• Is a highly available and scalable data store

• Is used for storing the state of a number of core services of Amazon.com’s e-commerce platform

• Has provided the desired levels of availability and performance and has been successful in handling server failures, data center failures, and network partitions

• Is incrementally scalable

• Sacrifices consistency under certain failure scenarios

• Extensively uses object versioning

• Shows that decentralized techniques can be combined to provide a single highly available system


Thanks