fawn - a fast array of wimpy nodesiwanicki/courses/ds/2010/... · 2011. 1. 13. · tomasz dubrownik...


FAWN - a Fast Array of Wimpy Nodes

Tomasz Dubrownik

University of Warsaw

January 12, 2011


Outline

1 Introduction

2 Design and Architecture

3 FAWN-DS

4 FAWN-KV

5 Evaluation

Key issues

Growing CPU vs. I/O gap

Contemporary systems must serve millions of users

Electricity consumed adds up to significant costs

Key issues

Is there a way to exploit the CPU vs. I/O gap to the users’ advantage?

Observations

Many industry problems exhibit massive data parallelism with relatively small computational demands

Many real-life problems depend heavily on efficient, distributed key-value stores that span several gigabytes

Such stores often contain millions of small items (on the order of kilobytes)

A motivating example

Twitter

A wonderfully popular service, Twitter has all the above-mentioned properties. Each tweet is limited to 140 B. There is fairly little processing performed on the tweets, yet the search system alone is stressed by an average of 12,000 queries per second. A stream of over a thousand tweets per second enters the system. A high-performance key-value store is crucial to the operation. At the same time, the cost of running a conventional cluster capable of meeting this demand is extremely high.

Disclaimer

To my knowledge, FAWN is not being used at Twitter. But it would probably make a lot of sense if it were. Thank you.

The problem, defined

To engineer a fast, scalable key-value store for small (hundreds to thousands of bytes) items.

This store is expected to:

respond to thousands or more random queries per second (QPS)

conserve power as much as possible

meet service level agreements regarding latency

scale well upwards as the system grows

scale well downwards as demand fluctuates during operating hours

Possible solutions (1)

A cluster of traditional servers with HDD as storage.

Problems:

very poor performance for random accesses, unless RAID or a similar disk array is used

if RAID is to be used, both initial price and total cost of ownership skyrocket

most of the power consumption is fixed: not much power is conserved during low load periods

A cluster of traditional servers with RAM as storage (think memcached).

Problems:

very high cost in terms of $/GB

robustness is lost unless additional systems are employed

power consumption is just as bad as before

A cluster of traditional servers with SSD as storage.

Problems:

while random reads are great, random writes are terrible (BerkeleyDB running on SSD averages just 0.07 MB/s)

power consumption is just as bad as before

A combination of the above.

Problems:

a combination of the above :)

Introducing FAWN

A slightly different approach:

Let’s use energy-efficient, wimpy processors coupled with fast SSD storage.

Design a custom key-value store exploiting the characteristics of flash storage.

That way power consumption can be kept to a minimum while retaining high performance and robustness.

The resulting system has a lower total cost of ownership and good scalability.

Outline

1 Introduction

2 Design and Architecture

3 FAWN-DS

4 FAWN-KV

5 Evaluation

Anatomy of a key-value data store

A request can be either a get, put or delete

Keys are 160-bit integers

Values are small blobs (typically between 256B and 1KB)

Each request pertains to a single key-value pair; there is no relational overlay at this level
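
The request model on this slide can be sketched as a tiny in-memory store. This is only an illustration, not FAWN's implementation: the class name `ToyStore`, the helper `key_of`, and the use of SHA-1 to obtain a 160-bit integer key are assumptions made for this example.

```python
import hashlib
from typing import Dict, Optional


def key_of(name: bytes) -> int:
    """Map an arbitrary name to a 160-bit integer key (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(name).digest(), "big")


class ToyStore:
    """Hypothetical in-memory stand-in for a single key-value data store."""

    def __init__(self) -> None:
        self._items: Dict[int, bytes] = {}

    def put(self, key: int, value: bytes) -> None:
        # Each request touches exactly one key-value pair.
        self._items[key] = value

    def get(self, key: int) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._items.get(key)

    def delete(self, key: int) -> None:
        # Deleting a missing key is a no-op in this sketch.
        self._items.pop(key, None)


store = ToyStore()
k = key_of(b"tweet:42")
store.put(k, b"a small blob")
value = store.get(k)
store.delete(k)
```

The sketch keeps everything in a dictionary, so it says nothing about how values are laid out on flash.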

Overview

The cluster is composed of Front-ends and Back-ends

Front-ends forward requests to appropriate back-ends and return responses to clients

The front-ends are responsible for maintaining order in the cluster

Back-ends run the FAWN-DS datastores (one per key-range)

Together the machines form a single FAWN-KV key-value store

Front-end

Responsibilities:

passing requests and responses

keeping track of back-ends’ Virtual IDs and their mapping to key ranges

managing joins and leaves.

Example configuration used for evaluation:

Intel Atom CPU (27 W)

Back-end

A back-end runs one FAWN-DS data store per key range. Each data store supports the basic key-value requests, as well as maintenance operations (Split, Merge, Compact).

Example configuration used for evaluation:

AMD Geode LX CPU (500MHz)

256MB DDR SDRAM (400MHz)

100Mbps Ethernet

Sandisk Extreme IV CompactFlash (4GB)

Back-ends, cont.

Back-ends are organized in a logical ring which coincides with the key space (mod 2^160)

Each back-end is assigned a fixed number of Virtual IDs in hopes of maintaining balance

Virtual IDs are the lowest keys a node handles

This allows for a well-defined successor relation on keys and virtual nodes
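
The ring on this slide can be sketched with a sorted list of Virtual IDs and a binary search. This is a minimal illustration under stated assumptions, not the FAWN code: the names `Ring`, `add_node`, `owner`, and `hash160`, the derivation of Virtual IDs by hashing `"<node>-<i>"` with SHA-1, and the default of two Virtual IDs per node are all hypothetical. It follows the slide's convention that a Virtual ID is the lowest key of the range it covers, so a key belongs to the nearest Virtual ID at or below it, wrapping around the ring.

```python
import bisect
import hashlib
from typing import Dict, List

KEY_SPACE = 2 ** 160  # keys and Virtual IDs live on a ring mod 2^160


def hash160(data: bytes) -> int:
    """160-bit integer from SHA-1 (the hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class Ring:
    """Hypothetical front-end view: Virtual IDs mapped to back-end names."""

    def __init__(self) -> None:
        self._vids: List[int] = []          # kept sorted
        self._node_of: Dict[int, str] = {}  # Virtual ID -> back-end name

    def add_node(self, node: str, vids_per_node: int = 2) -> None:
        # Give the back-end a fixed number of Virtual IDs spread over the ring.
        for i in range(vids_per_node):
            vid = hash160(f"{node}-{i}".encode())
            bisect.insort(self._vids, vid)
            self._node_of[vid] = node

    def owner(self, key: int) -> str:
        # Index of the last Virtual ID <= key. When the key is below every
        # Virtual ID the index is -1, which wraps to the highest Virtual ID.
        # Raises IndexError on an empty ring.
        i = bisect.bisect_right(self._vids, key % KEY_SPACE) - 1
        return self._node_of[self._vids[i]]


ring = Ring()
for name in ("backend-a", "backend-b", "backend-c"):
    ring.add_node(name)
print(ring.owner(hash160(b"tweet:42")))
```

Giving each back-end several Virtual IDs splits its share of the key space into smaller pieces, which is how the slide's "in hopes of maintaining balance" would play out in this sketch.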
