simplifying wide-area application development with wheelfs

38
Simplifying Wide-Area Application Development with WheelFS Jeremy Stribling In collaboration with Jinyang Li, Frans Kaashoek, Robert Morris MIT CSAIL & New York University

Upload: melvyn

Post on 09-Jan-2016

23 views

Category:

Documents


1 download

DESCRIPTION

Simplifying Wide-Area Application Development with WheelFS. Jeremy Stribling In collaboration with Jinyang Li, Frans Kaashoek, Robert Morris MIT CSAIL & New York University. PlanetLab. China grid. Google datacenters. Resources Spread over Wide-Area Net. planetlab. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Simplifying Wide-Area Application Development with WheelFS

Simplifying Wide-Area Application Development with

WheelFS

Jeremy Stribling

In collaboration with Jinyang Li, Frans Kaashoek, Robert Morris

MIT CSAIL & New York University

Page 2: Simplifying Wide-Area Application Development with WheelFS

2

Resources Spread over Wide-Area Net

PlanetLabGoogle datacenters

China grid

Page 3: Simplifying Wide-Area Application Development with WheelFS

3

Grid Computations Share Data

Nodes in a distributed computation share:– Program binaries– Initial input data– Processed output from one node as

intermediary input to another node

Page 4: Simplifying Wide-Area Application Development with WheelFS

4

So Do Users and Distributed Apps

• Apps aggregate disk/computing at hundreds of nodes

• Example apps– Content distribution networks (CDNs)– Backends for web services – Distributed digital research library

All applications need distributed storage

Page 5: Simplifying Wide-Area Application Development with WheelFS

5

State of the Art in Wide-Area Storage

• Existing wide-area file systems are inadequate– Designed only to store files for users– E.g., No hundreds of nodes can write files to the same dir– E.g., Strong consistency at the cost of availability

• Each app builds its own storage!– Distributed Hash Tables (DHTs)– ftp, scp, wget, etc.

Page 6: Simplifying Wide-Area Application Development with WheelFS

6

Current Solutions

Usual drawbacks:– All data flows through one node – File systems are too transparent

• Mask failures• Incur long delays

Node NodeNode

NodeNode Node

Testbed/Grid

CentralFile Server

Copyfoo File

foo

Page 7: Simplifying Wide-Area Application Development with WheelFS

7

If We Had a Good Wide-Area FS?

Wide-area FS

• FS makes building apps simpler

CDN

write(“/a/foo”)

CDN

write(“/a/bar”)

Filebar

Filefoo

read(“/a/foo”)

Page 8: Simplifying Wide-Area Application Development with WheelFS

8

Why Is It Hard?

• Apps care a lot about performance

• WAN is often the bottleneck– High latency, low bandwidth– Transient failures

• How to give app control without sacrificing ease of programmability?

Page 9: Simplifying Wide-Area Application Development with WheelFS

9

Our Contribution: WheelFS

• Suitable for wide-area apps

• Gives app control through cues over:– Consistency vs. availability tradeoffs– Data placement– Timing, reliability, etc.

• Prototype implementation

Page 10: Simplifying Wide-Area Application Development with WheelFS

10

Talk Outline

• Challenges & our approach

• Basic design

• Application control

• Running a Grid computation over WheelFS

Page 11: Simplifying Wide-Area Application Development with WheelFS

11

What Does a File System Buy You?

• Re-use existing software

• Simplify the construction of new applications– A hierarchical namespace– A familiar interface– Language-independent usage

Page 12: Simplifying Wide-Area Application Development with WheelFS

12

Why Is It Hard To Build a FS on WAN?

• High latency, low bandwidth – 100s ms instead of 1s ms latency– 10s Mbps instead of 1000s Mbps bandwidth

• Transient failures are common – 32 outages over 64 hours across 132 paths [Andersen’01]

Page 13: Simplifying Wide-Area Application Development with WheelFS

13

What If Grid Uses AFS over WAN?

GRID node(AFS client)

a.dat

Potentially unnecessary data transfer

a.dat

Blocks forever under failure despite

available cached copy

GRID node(AFS client)

GRID node(AFS client) AFS server

Page 14: Simplifying Wide-Area Application Development with WheelFS

14

Design Challenges

• High latency – Store data close to where it is needed

• Low wide-area bandwidth – Avoid wide-area communication if possible

• Transient failures are common – Cannot block all access during partial failures

Only applications have the needed information!

Page 15: Simplifying Wide-Area Application Development with WheelFS

15

WheelFS Gives Apps Control

AFS,NFS,

GFSWheelFS

Goal Total network

transparency

Application control

Can apps control how to handle failures? X

Can apps control data placement? X

Page 16: Simplifying Wide-Area Application Development with WheelFS

16

WheelFS: Main Ideas

• Apps control– Apps embed semantic cues to inform FS

about failure handling, data placement ...

• Good default policy– Write locally, strict consistency

Page 17: Simplifying Wide-Area Application Development with WheelFS

17

Talk Outline

• Challenges & our approach

• Basic design

• Application control

• Running a Grid computation over WheelFS

Page 18: Simplifying Wide-Area Application Development with WheelFS

18

File Systems 101• Basic FS operations:

– Name resolution: hierarchical name flat id

– Data operations: read/write file data

– Namespace operations: add/remove files or dirs

open(“/wfs/a/foo”, …) id: 1235

read(1235, …)

write(1235, …)

mkdir(“/wfs/b”, …)

Page 19: Simplifying Wide-Area Application Development with WheelFS

19

File Systems 101

a: 246 b: 357

id: 0

id: 135

file2: 468file3: 579

file1: 135

id: 357id: 246

Page 20: Simplifying Wide-Area Application Development with WheelFS

20

Distribute a FS across nodes

a: 246 b: 357

file2: 468file3: 579

file1: 135

id: 0

id: 357

id: 135

id: 246

NodeNode

Node

Node

NodeMust locate files/dirs using ids

Must automatically configurenode membership

Must keep files versioned to distinguish new and old

Page 21: Simplifying Wide-Area Application Development with WheelFS

21

Basic Design of WheelFS

Node653

Node076

Node150 Node

554

Node402

Node257

id135

135

135135

id135v2

id135v3

135v2

135v2

135v3

135v3

Consistency Servers

076 150257 402554 653

Page 22: Simplifying Wide-Area Application Development with WheelFS

22

Default Behavior:Write Locally, Strict

Consistency• Write locally: Store newly created files at the

writing node–Writes are fast –Transfer data lazily for reads when necessary

• Strict consistency: data behaves as in a local FS– Once new data is written and the file is closed, the next open will see the new data

Page 23: Simplifying Wide-Area Application Development with WheelFS

23

Write Locally

Node653 Node

076

Node150

Node554

Node402

Node257

Createfoo/bar

1. Choose an ID

2. Create dir entry

3. Write local file

550

Dir209(foo)

File550(bar) bar = 550

Readfoo/bar

Page 24: Simplifying Wide-Area Application Development with WheelFS

24

Strict Consistency• All writes go to the same node

• All reads do tooNode653 Node

076

Node150

Node554

Node402

Node257

WriteFile 135

File135

WriteFile 135

File135v2

File 135?

Page 25: Simplifying Wide-Area Application Development with WheelFS

25

Talk Outline

• Challenges & our approach

• Basic design

• Application control

• Running a Grid computation over WheelFS

Page 26: Simplifying Wide-Area Application Development with WheelFS

26

WheelFS Gives Apps Control with Cues

• Apps want to control consistency, data placement ...• How? Embed cues in path names

– Flexible and minimal interface change

Coarse-grained:Cues apply recursively over an entire subtree of files

/wfs/cache/.cue/a/b/ /wfs/cache/a/b/.cue/foo

Fine-grained:Cues can apply to a single file

Page 27: Simplifying Wide-Area Application Development with WheelFS

27

Eventual Consistency: Reads

File 135?

Node653 Node

076

Node150

Node554

Node402

Node257

File135

Cached135

• Read any version of the file you can find• In a given time limit

Page 28: Simplifying Wide-Area Application Development with WheelFS

28

Eventual Consistency: Writes• Write to any replica of the file

Node653 Node

076

Node150

Node554

Node402

Node257

WriteFile 135

File135

135

WriteFile 135

File135v2

135v2

Page 29: Simplifying Wide-Area Application Development with WheelFS

29

Handle Read Hotspots

Node653 Node

076

Node150

Node554

Node402

Node257

Read file 135

File135

Cached135 Cached

135

076653

Chunk

Chunk

Cached135

1. Contact node

2. Receive list

3. Get chunks

076653

076554653

Chunk

Read file 135

File135

Page 30: Simplifying Wide-Area Application Development with WheelFS

30

WheelFS Cues

Name Purpose

Eventual-Consistency

Control whether reads

must see fresh data, and whether writes must be serialized

MaxTime= Specify time limit for operations

Site=, Node= Hint which node or group of nodes a file should be stored

Controlconsistency

Hint about data placement

Other types of cues: Durability, system information, etc.

HotSpot This file will be read simultaneously by many nodes, so use p2p caching

Large reads

Page 31: Simplifying Wide-Area Application Development with WheelFS

31

Example Use of Cues: Content Distribution Networks

• CDNs prefer availability over consistency

wfsnode

wfsnode wfs

node

wfsnode

ApacheCaching

Proxy

ApacheCaching

Proxy

ApacheCaching

Proxy

ApacheCaching

Proxy If $url exists in cache dir read $url from WheelFSelse get page from web server store page in WheelFS

One line change in Apache config file:/wfs/cache/$URL

blocks under failure with default strong consistency

Page 32: Simplifying Wide-Area Application Development with WheelFS

32

Example Use of Cues: CDN

• Apache proxy handles potentially stale files well – The freshness of cached web pages can be determined from saved HTTP headers

Cache dir: /wfs/cache/ .EventualConsistency

Tells WheelFS to read a cached file even when

the corresponding file server cannot be contacted

Tells WheelFS to write the file data anywhere even when the corresponding file server cannot be contacted

/.HotSpot

Tells WheelFS to read data from the nearest client cache it can find

Page 33: Simplifying Wide-Area Application Development with WheelFS

33

Example Use of Cues:BLAST Grid Computation

• DNA alignment tool run on Grids

• Copy separate DB portions and queries to many nodes

• Run separate computations

• Later fetch and combine results

• Read binary using .HotSpot

• Write output using .EventualConsistency

Page 34: Simplifying Wide-Area Application Development with WheelFS

34

Talk Outline

• Challenges & our approach

• Basic design

• Application control

• Running a Grid computation over WheelFS

Page 35: Simplifying Wide-Area Application Development with WheelFS

35

Experiment Setup

• Up to 16 nodes run WheelFS on Emulab– 100Mbps access links– 12ms delay– 3 GHz CPUs

• “nr” protein database (673 MB), 16 partitions

• 19 queries sent to all nodes

Page 36: Simplifying Wide-Area Application Development with WheelFS

36

BLAST Achieves Near-Ideal Speedup on WheelFS

0

2

4

6

8

10

12

14

16

0 2 4 6 8 10 12 14 16

BLAST on WheelFS

Ideal speedup

Number of nodes

Sp

eed

up

Page 37: Simplifying Wide-Area Application Development with WheelFS

37

Related Work

• Cluster FS: Farsite, GFS, xFS, Ceph

• Wide-area FS: JetFile, CFS, Shark

• Grid: LegionFS, GridFTP, IBP

• POSIX I/O High Performance Computing Extensions

Page 38: Simplifying Wide-Area Application Development with WheelFS

38

Conclusion

• A WAN FS simplifies app construction

• FS must let app control data placement & consistency

• WheelFS exposes such control via cues

Building appsis easy with WheelFS