Introduction | Design and Architecture | FAWN-DS | FAWN-KV | Evaluation
FAWN - a Fast Array of Wimpy Nodes
Tomasz Dubrownik
University of Warsaw
January 12, 2011
Tomasz Dubrownik FAWN - a Fast Array of Wimpy Nodes
Outline
1 Introduction
2 Design and Architecture
3 FAWN-DS
4 FAWN-KV
5 Evaluation
Key issues
Growing CPU vs. I/O gap
Contemporary systems must serve millions of users
Electricity consumed adds up to significant costs
Key issues
Is there a way to exploit the CPU vs. I/O gap to the users’ advantage?
Observations
Many industry problems exhibit massive data parallelism with relatively small computational demands
A fair number of real-life problems depend heavily on efficient, distributed key-value stores that span several gigabytes
Such stores often contain millions of small items (on the order of kilobytes)
A motivating example
A wonderfully popular service, Twitter has all the above-mentioned properties. Each tweet is limited to 140 characters. Fairly little processing is performed on the tweets, yet the search system alone handles an average of 12,000 queries per second, and over a thousand new tweets per second enter the system. A high-performance key-value store is crucial to the operation. At the same time, the cost of running a conventional cluster capable of meeting this demand is extremely high.
Disclaimer
To my knowledge, FAWN is not being used at Twitter. But it would probably make a lot of sense if it were. Thank you.
The problem, defined
To engineer a fast, scalable key-value store for small items (hundreds to thousands of bytes). This store is expected to:
serve thousands of random queries per second (QPS) or more
conserve power as much as possible
meet service level agreements regarding latency
scale well upwards as the system grows
scale well downwards as demand fluctuates during operatinghours
Possible solutions (1)
A cluster of traditional servers with HDDs as storage. Problems:
very poor performance for random accesses, unless RAID or a similar disk array is used
if RAID is used, both the initial price and the total cost of ownership skyrocket
most of the power consumption is fixed: little power is saved during low-load periods
Possible solutions (2)
A cluster of traditional servers with RAM as storage (think memcached). Problems:
very high cost in terms of $/GB
robustness is lost unless additional systems are employed
power consumption is just as bad as before
Possible solutions (3)
A cluster of traditional servers with SSDs as storage. Problems:
while random reads are great, random writes are terrible (BerkeleyDB running on SSD averages just 0.07 MBps)
power consumption is just as bad as before
Possible solutions (4)
A combination of the above. Problems:
a combination of the above :)
Introducing FAWN
A slightly different approach:
Let’s use energy-efficient, wimpy processors coupled with fast SSD storage.
Design a custom key-value store exploiting the characteristics of flash storage.
That way power consumption can be kept to a minimum while retaining high performance and robustness.
The resulting system has a lower total cost of ownership and good scalability.
2 Design and Architecture
Anatomy of a key-value data store
A request can be either a get, put or delete
Keys are 160-bit integers
Values are small blobs (typically between 256B and 1KB)
Each request pertains to a single key-value pair: there is no relational overlay at this level
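A minimal sketch of this request model, assuming a Python dictionary stands in for the storage layer. The class, the method names, and the `make_key` helper are hypothetical; deriving keys with SHA-1 (which yields 160-bit digests) is an illustrative assumption, not something stated on the slide.

```python
import hashlib
from typing import Dict, Optional

KEY_BITS = 160  # keys are 160-bit integers


def make_key(name: bytes) -> int:
    """Hypothetical helper: derive a 160-bit key by hashing a name with SHA-1."""
    return int.from_bytes(hashlib.sha1(name).digest(), "big")


class KeyValueStore:
    """Toy single-node store with the three request types: get, put, delete."""

    def __init__(self) -> None:
        self._data: Dict[int, bytes] = {}

    @staticmethod
    def _check(key: int) -> None:
        # Every request names exactly one key, which must fit in 160 bits.
        if not 0 <= key < 2 ** KEY_BITS:
            raise ValueError("key must be a 160-bit unsigned integer")

    def put(self, key: int, value: bytes) -> None:
        self._check(key)
        self._data[key] = value

    def get(self, key: int) -> Optional[bytes]:
        self._check(key)
        return self._data.get(key)

    def delete(self, key: int) -> None:
        self._check(key)
        self._data.pop(key, None)
```

Each call touches one key-value pair only, mirroring the absence of any relational layer.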
Overview
The cluster is composed of Front-ends and Back-ends
Front-ends forward requests to the appropriate back-ends and return responses to clients
The front-ends are responsible for maintaining order in the cluster (membership and the assignment of key ranges)
Back-ends run the FAWN-DS data stores (one per key range)
Together the machines form a single FAWN-KV key-valuestore
Front-end
Responsibilities:
passing requests and responses
keeping track of back-ends’ Virtual IDs and their mapping to key ranges
managing joins and leaves.
Example configuration used for evaluation:
Intel Atom CPU (27 W)
Back-end
A back-end runs one FAWN-DS data store per key range. Each data store supports the basic key-value requests as well as maintenance operations (Split, Merge, Compact).
Example configuration used for evaluation:
AMD Geode LX CPU (500MHz)
256MB DDR SDRAM (400MHz)
100Mbps Ethernet
SanDisk Extreme IV CompactFlash (4GB)
Back-ends, cont.
Back-ends are organized in a logical ring which coincides with the key space (mod 2^160)
Each back-end is assigned a fixed number of Virtual IDs in the hope of keeping the load balanced
Virtual IDs are the lowest keys a node handles
This allows for a well-defined successor relation on keys andvirtual nodes
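The ring and its successor relation can be sketched as follows, assuming, as the slide states, that each Virtual ID is the lowest key its virtual node handles, so a virtual node owns the keys from its ID up to (but excluding) the next ID, wrapping mod 2^160. The class name, method names, and back-end names are hypothetical.

```python
import bisect
from typing import Dict

RING_SIZE = 2 ** 160  # the key space wraps around mod 2^160


class Ring:
    """Logical ring of virtual nodes; a node owns [vid, next vid) mod 2^160."""

    def __init__(self, owners: Dict[int, str]) -> None:
        # owners maps virtual ID -> physical back-end name (names are hypothetical)
        self._owners = {vid % RING_SIZE: name for vid, name in owners.items()}
        self._vids = sorted(self._owners)

    def virtual_node_for(self, key: int) -> int:
        """Return the virtual ID whose key range contains `key`."""
        i = bisect.bisect_right(self._vids, key % RING_SIZE) - 1
        # i == -1 means the key precedes every VID: index -1 wraps to the largest
        return self._vids[i]

    def successor(self, vid: int) -> int:
        """Next virtual ID clockwise around the ring (wrapping at the end)."""
        i = bisect.bisect_right(self._vids, vid)
        return self._vids[i % len(self._vids)]

    def backend_for(self, key: int) -> str:
        return self._owners[self.virtual_node_for(key)]
```

Giving each physical back-end several Virtual IDs simply means passing several entries with the same name in `owners`.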
3 FAWN-DS
Peculiarities of flash storage
Flash media differ from traditional HDDs in a number of ways, some of which seriously impact the design of persistent data stores:
Random reads are nearly as fast as sequential reads
Random writes are very inefficient (because a whole page must be flashed even for a small update)
Sequential writes perform admirably
On modern devices, semi-random writes (random appends to a small number of files) are nearly as fast as sequential writes
These features can be exploited by using a log-structured data store.
FAWN-DS
To take advantage of the properties of flash storage, FAWN-DS is structured as follows:
The key-value mappings are stored in a Data Log on the flash medium. This log is append-only.
To provide fast random access, a hash index mapping keys into the data log is kept in RAM. To reduce its memory footprint, only a fragment of each key is stored in the index; the trade-off is a small (configurable) chance that a lookup needs more than one flash access.
To reclaim unused storage space, a Compact operation is introduced. It is designed to be as efficient as possible on flash, using only bulk sequential writes.
To speed up reconstruction of the in-memory index after a restart, the index is periodically checkpointed.
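The key-fragment trade-off can be quantified with a little arithmetic. This sketch assumes key bits are uniformly distributed, so an index entry holding a different key has the same k-bit fragment with probability 2^-k, and it assumes an illustrative entry layout of fragment bits, one valid bit, and a 32-bit log offset; the function names and the 32-bit offset are hypothetical.

```python
def false_match_probability(fragment_bits: int) -> float:
    """Chance that a bucket holding a different key has the same k-bit fragment,
    assuming the key bits are uniformly distributed."""
    return 2.0 ** -fragment_bits


def expected_extra_reads(fragment_bits: int, other_entries_probed: int) -> float:
    """Expected wasted flash reads per lookup: each non-matching entry examined
    passes the fragment check (and costs a flash read) with probability 2^-k."""
    return other_entries_probed * false_match_probability(fragment_bits)


def entry_bytes(fragment_bits: int, offset_bits: int = 32) -> float:
    """Index entry size in bytes: key fragment + 1 valid bit + data log offset."""
    return (fragment_bits + 1 + offset_bits) / 8
```

For instance, a 15-bit fragment gives 6-byte entries (`entry_bytes(15)`) and a 1-in-32768 chance of a wasted read per non-matching entry probed, whereas keeping the full 160-bit key would cost 24.125 bytes per entry.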
Lookup
Lookup cont.
Two smaller numbers are extracted from the key:
the index bits: the lowest i bits
the key fragment: the next lowest k bits
The index bits select a bucket in the in-memory hash index.
If that bucket is valid and its key fragment matches, the data log entry is retrieved and the full keys are compared.
If the keys match, the record is returned; otherwise the next bucket in the hash chain is examined in the same way.
If nothing is found, an appropriate response is generated.
Lookup, now in pseudocode!
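The lookup steps described above can be sketched in Python. This is an in-memory illustration, not the FAWN-DS implementation: a Python list stands in for the flash data log, each hash chain is modeled as a list of buckets, and `INDEX_BITS`, `FRAGMENT_BITS`, and all class and method names are hypothetical.

```python
from typing import List, Optional, Tuple

INDEX_BITS = 4      # i: illustrative; a real index would use many more bits
FRAGMENT_BITS = 15  # k: illustrative key-fragment width


class Bucket:
    def __init__(self, keyfrag: int, offset: int) -> None:
        self.keyfrag = keyfrag
        self.valid = True
        self.offset = offset  # position of the record in the data log


class FawnDSSketch:
    def __init__(self) -> None:
        self.log: List[Tuple[int, bytes]] = []  # append-only "flash" data log
        # In-memory hash index: one chain of buckets per value of the index bits.
        self.index: List[List[Bucket]] = [[] for _ in range(1 << INDEX_BITS)]
        self.flash_reads = 0  # counts full-key reads from the log

    @staticmethod
    def _split(key: int) -> Tuple[int, int]:
        index_bits = key & ((1 << INDEX_BITS) - 1)                     # lowest i bits
        keyfrag = (key >> INDEX_BITS) & ((1 << FRAGMENT_BITS) - 1)  # next k bits
        return index_bits, keyfrag

    def _read_key(self, offset: int) -> int:
        self.flash_reads += 1
        return self.log[offset][0]

    def _find(self, key: int) -> Optional[Bucket]:
        idx, frag = self._split(key)
        for bucket in self.index[idx]:  # walk the hash chain
            # Cheap in-memory fragment check first, then the full key on "flash".
            if bucket.valid and bucket.keyfrag == frag and self._read_key(bucket.offset) == key:
                return bucket
        return None  # not found

    def get(self, key: int) -> Optional[bytes]:
        bucket = self._find(key)
        return None if bucket is None else self.log[bucket.offset][1]

    def put(self, key: int, value: bytes) -> None:
        self.log.append((key, value))  # append to the data log
        offset = len(self.log) - 1
        bucket = self._find(key)
        if bucket is not None:
            bucket.offset = offset  # repoint the existing entry at the new record
        else:
            idx, frag = self._split(key)
            self.index[idx].append(Bucket(frag, offset))
```

Two keys that share both index bits and fragment force an extra log read before the right bucket is found, which is exactly the configurable cost of storing only a key fragment.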
Store and Delete
When a value is inserted into the store, it is simply appended to the data log and the corresponding bucket is changed to point to the new record, with its valid bit set to true. When a record is deleted, a delete entry is appended to the log (for fault tolerance) and the valid bit in the corresponding bucket is set to false. Actual storage space is not reclaimed until a Compact is performed.
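Why a delete entry is appended to the log can be seen by replaying the log to rebuild the index. This sketch assumes a hypothetical record format `(op, key, value)` and simplifies the valid bit to removing the key from a dictionary index; without the delete entry, a replay after a crash would resurrect the deleted value.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log record format: (op, key, value), op is "put" or "delete".
Record = Tuple[str, int, Optional[bytes]]


class LogStore:
    """Append-only store: puts and deletes only ever append to the log."""

    def __init__(self) -> None:
        self.log: List[Record] = []
        self.index: Dict[int, int] = {}  # key -> offset of latest live put

    def put(self, key: int, value: bytes) -> None:
        self.log.append(("put", key, value))
        self.index[key] = len(self.log) - 1  # point the index at the new record

    def delete(self, key: int) -> None:
        self.log.append(("delete", key, None))  # persisted so a replay sees it
        self.index.pop(key, None)               # invalidate the in-memory entry

    def get(self, key: int) -> Optional[bytes]:
        offset = self.index.get(key)
        return None if offset is None else self.log[offset][2]


def rebuild_index(log: List[Record]) -> Dict[int, int]:
    """Recover the in-memory index by replaying the whole log in order."""
    index: Dict[int, int] = {}
    for offset, (op, key, _) in enumerate(log):
        if op == "put":
            index[key] = offset
        else:
            index.pop(key, None)  # without this, the deleted value would reappear
    return index
```

A checkpoint of the index would let recovery start from the checkpoint instead of replaying the log from the beginning.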
Maintenance operations
Split is issued when a key range is divided because a new virtual node joins the ring. It scans the data log sequentially and writes the entries belonging to the new range out into a new data store.
Merge combines two data stores into one covering their combined key range. It does this by copying the entries from one log into the other.
Compact copies the valid entries of a data store into a new log, skipping those orphaned by later puts and those that were explicitly deleted.
Thanks to the append-only design, these operations can run concurrently with normal requests; locking is needed only to switch data stores when the maintenance operation is finalized.
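The three maintenance operations can be sketched over a log modeled as a Python list of hypothetical `(op, key, value)` records. Each function makes one sequential pass over its input and produces its output in order, matching the sequential-write pattern the slides describe; the function names and the `in_new_range` predicate are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Tuple[str, int, Optional[bytes]]  # (op, key, value), hypothetical format


def compact(log: List[Record]) -> List[Record]:
    """Copy only live entries into a new log, skipping superseded puts and deletes."""
    latest: Dict[int, Optional[int]] = {}
    for offset, (op, key, _) in enumerate(log):  # one sequential scan
        latest[key] = offset if op == "put" else None
    # Emit survivors in original order: a single bulk sequential write.
    return [rec for off, rec in enumerate(log) if latest[rec[1]] == off]


def split(log: List[Record],
          in_new_range: Callable[[int], bool]) -> Tuple[List[Record], List[Record]]:
    """Scan the log sequentially, routing each entry to the old or the new store."""
    kept: List[Record] = []
    moved: List[Record] = []
    for rec in log:
        (moved if in_new_range(rec[1]) else kept).append(rec)
    return kept, moved


def merge(dst: List[Record], src: List[Record]) -> List[Record]:
    """Merge two stores with disjoint key ranges by copying src's entries after dst's."""
    return dst + src
```

Because the input log is never modified in place, normal requests can keep appending to the old store while one of these functions builds its replacement.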
4 FAWN-KV
To provide a robust, scalable service, the back-ends running FAWN-DS instances are joined together and managed by front-end nodes, which in industrial deployments would in turn be connected to a master node.
Fault tolerance is introduced via replication
Each front-end is ideally responsible for about 80 back-ends and manages joins and leaves, exposing a simple put, get, delete interface
Additionally, front-ends can route requests among themselves and cache responses, which makes the master node an optimization and a convenience rather than a single point of failure
Life-cycle of a request
Life-cycle of a request, elaborated
Each front-end is assigned a contiguous portion of the key space
Upon receiving a request, it either processes it using its managed back-ends or forwards it if the key belongs to a different front-end
Front-ends maintain a list of virtual nodes and their corresponding addresses, and thus can instantly translate the request into the appropriate FAWN-DS calls
While the back-ends process the request, the front-end ensures that replication is maintained
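The front-end's first routing decision can be sketched as follows, assuming each front-end knows the sorted start keys of every front-end's contiguous slice of the key space. The class, its methods, and the `"local"` / `"forward:..."` return values are hypothetical stand-ins for the real processing and forwarding paths.

```python
import bisect
from typing import List, Tuple


class FrontEnd:
    """Hypothetical front-end that owns one contiguous slice of the key space."""

    def __init__(self, name: str, ranges: List[Tuple[int, str]]) -> None:
        # ranges: (first key of slice, owning front-end), sorted by first key;
        # each slice runs up to the next entry's first key.
        self.name = name
        self._starts = [start for start, _ in ranges]
        self._owners = [owner for _, owner in ranges]

    def owner_of(self, key: int) -> str:
        i = bisect.bisect_right(self._starts, key) - 1
        return self._owners[i]  # i == -1 wraps to the last slice

    def route(self, key: int) -> str:
        owner = self.owner_of(key)
        if owner == self.name:
            # Our slice: translate into FAWN-DS calls on our managed back-ends.
            return "local"
        return "forward:" + owner  # another front-end owns this key
```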
Replication in Chains
Replication in Chains, cont.
Each key defines a chain in the virtual node ring
A fixed number of nodes maintains copies of the mapping
The nodes are obtained by iterating the successor function, starting from the key
The first node that contains a replica is the head of the chain
The last node is the tail
Every put request is issued to the head of the chain and waits for an acknowledgement from the tail. Every get is passed to the tail. This ensures consistency and proper ordering of changes throughout the chain.
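Building a key's chain by iterating the successor function can be sketched as follows. It assumes the convention that each Virtual ID is the lowest key its node handles, and a hypothetical fixed chain length `replicas`; it also does not check whether consecutive virtual IDs belong to the same physical back-end.

```python
import bisect
from typing import List


def replica_chain(vids: List[int], key: int, replicas: int = 3) -> List[int]:
    """Return the chain for `key`: the owning virtual node, then its successors.

    chain[0] is the head (receives puts); chain[-1] is the tail (serves gets
    and acknowledges puts). `replicas` is a hypothetical chain length.
    """
    ring = sorted(vids)
    start = bisect.bisect_right(ring, key) - 1  # owner of key; -1 wraps to the end
    n = min(replicas, len(ring))
    return [ring[(start + j) % len(ring)] for j in range(n)]
```

For example, with virtual IDs 10, 100, and 1000, key 5 precedes every ID, so its chain wraps around: head 1000, then 10, with tail 100.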
Replication of a put
After receiving the put request, the head forwards the put along the chain and waits for an acknowledgement.
If all goes well, the tail acknowledges the put both to the front-end and to its predecessor, and each node in turn passes the acknowledgement back to its own predecessor.
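The put path can be simulated in a single process, with message passing modeled as synchronous method calls. This is a sketch under that assumption; the class, its `pending` buffer, and the helper functions are hypothetical, and real nodes would exchange network messages and handle failures.

```python
from typing import Dict, List, Optional


class ReplicaNode:
    """One replica in a chain (hypothetical sketch; messages are method calls)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[int, bytes] = {}
        self.pending: Dict[int, bytes] = {}        # forwarded but not yet acknowledged
        self.next: Optional["ReplicaNode"] = None  # successor in the chain

    def put(self, key: int, value: bytes) -> str:
        """Apply a put, forward it down the chain, and return who acknowledged it."""
        self.store[key] = value
        if self.next is None:
            # Tail: the ack goes to the front-end and back to the predecessor.
            return self.name
        self.pending[key] = value
        ack = self.next.put(key, value)  # returns once the downstream ack arrives
        del self.pending[key]            # acked: the update reached the tail
        return ack                       # pass the ack back to our predecessor


def build_chain(names: List[str]) -> List[ReplicaNode]:
    nodes = [ReplicaNode(n) for n in names]
    for node, succ in zip(nodes, nodes[1:]):
        node.next = succ
    return nodes


def chain_get(chain: List[ReplicaNode], key: int) -> Optional[bytes]:
    """Gets are served by the tail, which only holds fully replicated updates."""
    return chain[-1].store.get(key)
```

Serving gets from the tail means a client never reads a value that has not yet reached every replica.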
How a join is handled
When a (virtual) node joins the FAWN-KV ring, precisely one key range is split in two. To maintain replication, the following happens:
The current tail transmits its whole log to the new node (pre-copy)
The front-end informs the nodes in the chain of the join via a chain membership message
In response to this message, nodes flush the updates received during pre-copy down the chain
Please refer to the paper for details on how updates arriving during the flush are handled, as well as the special cases of joining as head or tail.
What happens when a node leaves
When a node leaves the ring, each node that is supposed to take over its replicas in essence joins the replica chain at a different position in the key space, so the protocol is essentially the same as for a join.
At this stage, failure detection is achieved with heartbeats: if a node misses a set number of heartbeat signals, the front-end initiates a leave and the appropriate action is taken.
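The heartbeat rule can be sketched as a small counter-based detector on the front-end. The class, the per-period `tick` interface, and the default threshold `max_missed=3` are hypothetical; the slide only says "a set number" of missed heartbeats.

```python
from typing import Dict, List, Set


class HeartbeatMonitor:
    """Front-end failure detector: counts consecutive missed heartbeat periods."""

    def __init__(self, nodes: List[str], max_missed: int = 3) -> None:
        self.max_missed = max_missed  # hypothetical threshold ("a set number")
        self.missed: Dict[str, int] = {n: 0 for n in nodes}

    def tick(self, heard: Set[str]) -> List[str]:
        """Close one heartbeat period; return the nodes whose leave should start."""
        failed: List[str] = []
        for node in list(self.missed):
            self.missed[node] = 0 if node in heard else self.missed[node] + 1
            if self.missed[node] >= self.max_missed:
                failed.append(node)
                del self.missed[node]  # stop tracking; the leave protocol takes over
        return failed
```

Resetting the counter on every heartbeat means only consecutive misses count, so a single delayed message does not trigger a leave.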
5 Evaluation
Procedure description
FAWN’s performance was evaluated under a number of criteria:
Single node efficiency (compared to baseline hardware capabilities)
Cluster performance (tested on a system with 21 back-ends and 1 front-end)
Energy efficiency
The results were then compared with a number of more traditionalconfigurations.
Single node performance
Baseline:
Seq. read    Rand. read    Seq. write    Rand. write
28.5 MBps    1424 QPS      24 MBps       110 QPS
FAWN:
Data size    Rand. read (1 KB)    Rand. read (256 B)
125 MB       51968 QPS            65412 QPS
1 GB         1595 QPS             1964 QPS
3.5 GB       1150 QPS             1298 QPS
Gets vs Puts
Cluster — performance and power consumption
Important points on power consumption
The plot does not include the front-end (a further 27 W)
The networking hardware used draws 20 W (included in the plotted figure)
Even including the front-end, the system achieved 330 queries per joule. A desktop computer using an SSD achieves about 50 queries per joule.
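Queries per joule is simply throughput divided by power, since one watt is one joule per second; the two figures quoted above therefore differ by a factor of about 6.6:

```latex
\text{queries per joule}
  = \frac{\text{queries/s}}{\text{J/s}}
  = \frac{\text{QPS}}{\text{power (W)}},
\qquad
\frac{330~\text{Q/J}}{50~\text{Q/J}} = 6.6
```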
CDF of Query Latency
Comparison with alternative approaches (projected)
Important point
The FAWN entries in this table are projections of the expected performance of systems built using state-of-the-art components, not measurements.
Solution space for system builders (projected)
Conclusions
FAWN is demonstrated to be a viable approach to providing cost-efficient data stores
Using wimpy processors in an array can reduce power consumption while retaining performance
Barring breakthrough discoveries, FAWN-like technologies are expected to deliver the lowest TCO for a large portion of the problem space
Larger-scale testing is necessary to establish the correctness of these claims and to demonstrate scalability
References
[FAWN] D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. FAWN: A Fast Array of Wimpy Nodes. Proceedings of ACM SOSP 2009, Big Sky, MT, USA, October 2009.
All images are taken from the FAWN paper.