building distributed, wide-area applications with wheelfs jeremy stribling, emil sit, frans...

30
Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

Upload: mckenna-carl

Post on 30-Mar-2015

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

Building Distributed, Wide-Area Applications with

WheelFS

Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris

MIT CSAIL and NYU

Page 2: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

2

Grid Computations Share Data

Nodes in a distributed computation share:– Program binaries– Initial input data– Processed output from one node as

intermediary input to another node

Page 3: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

3

So Do Users and Distributed Apps

• Shared home directory for testbeds (e.g., PlanetLab, RON)

• Distributed apps reinvent the wheel:– Distributed digital research library– Wide-area measurement experiments– Cooperative web cache

• Can we invent a shared data layer once?

Page 4: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

4

Our Goal• Distributed file system for testbeds/Grids

• App can share data between nodes

• Users can easily access data

• Simple-to-build distributed apps

NodeNodeNode

NodeNode Node

Filefoo

Testbed/Grid

Filefoo

Filefoo

Page 5: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

5

Current Solutions

Usual drawbacks:– All data flows through one node – File systems are too transparent

• Mask failures• Incur long delays

Node NodeNode

NodeNode Node

Testbed/Grid

CentralFile Server

Copyfoo File

foo

Page 6: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

6

Our Proposal: WheelFS

• A decentralized, wide-area FS

• Main contributions:

1) Provide good performance according to Read Globally, Write Locally

2) Give apps control with semantic cues

Page 7: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

7

Talk Outline

1. How to decentralize your file system

2. How to control your files

Page 8: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

8

What Does a File System Buy You?

• A familiar interface

• Language-independent usage model

• Hierarchical namespace useful for apps

• Quick-prototyping for apps

Page 9: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

9

File Systems 101

• File system (FS) API:– Open <filename> <file_id>– {Close/Read/Write} <file_id>

• Directories translate file names to IDs

App 1 App 2

Operating System File System

API call

Localhard disk

Node

Page 10: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

10

Distributed File Systems

App 1 App 2

Operating System File System

API call

Localhard disk

Node

Node Node Node Node Node

Testbed/Grid

File135

Dir 500:“foo” 135

Page 11: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

11

Basic Design of WheelFS

Node653

Node076

Node150 Node

554

Node402

Node257

File 135?File135

135

135135

File135v2

File135v3

135v2

135v2

135v3

135v3

Consistency Servers

076 150257 402554 653

Page 12: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

12

Read Globally, Write Locally

• Perform writes at local disk speeds

• Efficient bulk data transfer

• Avoid overloading nodes w/ popular files

Page 13: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

13

Write Locally

Node653 Node

076

Node150

Node554

Node402

Node257

Createfoo/bar

1. Choose an ID

2. Create dir entry

3. Write local file

550

Dir209(foo)

File550(bar) bar = 550

Readfoo/bar

Page 14: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

14

Read Globally

Node653 Node

076

Node150

Node554

Node402

Node257

Read file 135

File135

Cached135 Cached

135

076653

Chunk

Chunk

Cached135

1. Contact node

2. Receive list

3. Get chunks

076653

076554653

Chunk

Read file 135

File135

Page 15: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

15

Example: BLAST

• DNA alignment tool run on Grids

• Copy separate DB portions and queries to many nodes

• Run separate computations

• Later fetch and combine results

Page 16: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

16

Example: BLAST

• With WheelFS, however:– No explicit DB copying necessary– Efficient initial DB transfers– Automatic caching for reused DBs and queries

• Could be better since data is never updated

Page 17: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

17

Example: Cooperative Web Cache

Collection of nodes that:– Serve redirected web requests– Fetch web content from original web servers– Cache web content and serve it directly– Find cached content on other CWC nodes

Page 18: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

18

Example: Cooperative Web Cache

• Avoid hotspots

if [ -f /wfs/cwc/$URL ]; then if notexpired /wfs/cwc/$URL; then cat /wfs/cwc/$URL exit fifiwget $URL –O - | tee /wfs/cwc/$URL

Page 19: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

19

if [ -f /wfs/cwc/$URL ]; then if notexpired /wfs/cwc/$URL; then cat /wfs/cwc/$URL exit fifiwget $URL –O - | tee /wfs/cwc/$URL

Example: Cooperative Web Cache

Node653 Node

076

Node150

Node554

Node402

Node257

File135

Cached135

Client $URL

“$URL”?135

135?135 = v1402

Chunk

Chunk

Chunk

Cached135

No!

$URL

File550

“$URL” == 550

Dir070

(/wfs/cwc)

Page 20: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

20

Talk Outline

1. How to decentralize your file system

2. How to control your files

Page 21: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

21

Example: Cooperative Web Cache

• Would rather fail and refetch than wait

• Perfect consistency isn’t crucial

if [ -f /wfs/cwc/$URL ]; then if notexpired /wfs/cwc/$URL; then cat /wfs/cwc/$URL exit fifiwget $URL –O - | tee /wfs/cwc/$URL

Page 22: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

22

Explicit Semantic Cues

• Allow direct control over system behavior

• Meta-data that attach to files, dirs, or refs

• Apply recursively down dir tree

• Possible impl: intra-path component– /wfs/cwc/.cue/foo/bar

Page 23: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

23

Semantic Cues: Writability• Applies to files

• WriteMany (default)

• WriteOnce Node653 Node

076

Node150

Node554

Node402

Node257

File 135?

File135

File135v2

File135v3

Cached135v3

Cached135

Page 24: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

24

Semantic Cues: Freshness• Applies to file references

• LatestVersion (default)

• AnyVersion

• BestVersion

Node653 Node

076

Node150

Node554

Node402

Node257

File 135?

File135

Cached135

Page 25: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

25

Semantic Cues: Write Consistency• Applies to files or directories

• Strict (default)

• Lax Node653 Node

076

Node150

Node554

Node402

Node257

WriteFile 135

File135

135

WriteFile 135

File135v2

135v2

Page 26: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

26

Example: BLAST

• WriteOnce for all:– DB files– Query files– Result files

• Improves cachability of these files

Page 27: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

27

Example: Cooperative Web Cache

• Reading an older version is ok:– cat /wfs/cwc/.maxtime=250,bestversion/foo

• Writing conflicting versions is ok:– wget http://foo > /wfs/cwc/.lax,writemany/foo

if [ -f /wfs/cwc/.maxtime=250,bestversion/$URL ]; then if notexpired /wfs/cwc/.maxtime=250,bestversion/$URL; then cat /wfs/cwc/.maxtime=250,bestversion/$URL exit fifiwget $URL –O - | tee /wfs/cwc/.lax,writemany/$URL

Page 28: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

28

Discussion• Must break data up into files small enough

to fit on one disk

• Stuff we swept under the rug:– Security– Atomic renames across dirs– Unreferenced files

Page 29: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

29

Related Work

• Every FS paper ever written

• Specifically:– Cluster FS: Farsite, GFS, xFS, Ceph– Wide-area FS: JetFile, CFS, Shark– Grid: LegionFS, GridFTP, IBP– POSIX I/O High Performance Computing

Extensions

Page 30: Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU

30

Conclusion

• WheelFS: distributed storage layer for newly-written applications

• Performance by reading globally and writing locally

• Control through explicit semantic cues