Group A5 - 3rd paper presentation: Network File System designed for low-bandwidth networks. Group Members: Daniel Saenz, Gilbert Rahme, Sandeep George Mohan

Upload: carys

Post on 12-Jan-2016


Page 1: Group A5 - 3rd paper presentation

Network File System designed for low-bandwidth networks

Group Members:

Daniel Saenz

Gilbert Rahme

Sandeep George Mohan

Page 2: Presentation Outline

Introduction

Design

Indexing

Protocol

Implementation & Evaluation

References

Page 3: Introduction

LBFS, the low-bandwidth network file system, exploits similarities between files or versions of the same file.

Avoids sending redundant data over the network.

Can be used in conjunction with conventional compression and caching.

Focuses on reducing bandwidth without changing accepted consistency guarantees.

Page 4: Exploiting cross-file similarities

At the server, files are stored in chunks, which are indexed by hash value.

The client similarly indexes a large persistent file cache.

Assumes clients will have enough cache to contain a user’s entire working set of files.

If possible, reconstructs files using chunks of existing data in the file system and client cache.

[Diagram: an index table relating files to their file chunks.]

Page 5: File Transfer

[Diagram, two panels, each with a client and a server holding chunks A B C D. In the first panel only chunk B is transferred; in the second panel chunks A B C D are all transferred.]

Page 6: Close-to-open Consistency

After a client has written and closed a file, another client opening the same file will always see the new contents.

Once the file is successfully written and closed, the data resides safely on the server.

Clients see the server’s latest version when they open a file.

[Diagram: Server, Client 1 and Client 2, with file contents labelled A B C D.]

Page 7: Related Work

AFS – Andrew File System

Leases

NFS – Network File System

CODA

Page 8: AFS

Uses server callbacks to inform clients when other clients have modified cached files.

Users can often access cached AFS files without requiring any network traffic.

[Diagram: Server and two clients, with file contents labelled A B C D.]

Page 9: Leases

A modification of AFS callbacks in which the obligation of the server to inform a client of changes expires after a certain period of time.

Advantages:

Free the server from contacting clients who haven’t touched a file in a while.

Avoid problems when a client to which the server has promised a callback has crashed or gone off the network.

Page 10: NFS

Reduces network round trips by batching file system operations.

LBFS is based on NFS.

Page 11: CODA

Avoids transferring files to the server when they are deleted or overwritten quickly on the client.

LBFS does not support this; it simply reduces the bandwidth required for each transfer.

Page 12: Design

Indexing

Page 13: Indexing

LBFS indexes a set of files to recognize their data chunks.

It relies on the collision-resistant properties of the SHA-1 hash function to save chunk transfers.

If the client and server both have data chunks producing the same SHA-1 hash, they assume the two are really the same chunk and avoid transferring its contents over the network.
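
The hash comparison described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chunk contents, the `receiver_index` dictionary and the `missing_chunks` helper are hypothetical names, not part of LBFS.

```python
import hashlib


def chunk_hash(chunk: bytes) -> bytes:
    """SHA-1 digest used as the identity of a chunk."""
    return hashlib.sha1(chunk).digest()


def missing_chunks(file_chunks, receiver_index):
    """Return the chunks whose SHA-1 hash the receiver does not already hold."""
    return [c for c in file_chunks if chunk_hash(c) not in receiver_index]


# Hypothetical file made of four chunks A, B, C, D.
file_chunks = [b"A" * 10, b"B" * 10, b"C" * 10, b"D" * 10]

# Assume the receiver already holds A, C and D (indexed by hash).
receiver_index = {
    chunk_hash(c): c for c in (file_chunks[0], file_chunks[2], file_chunks[3])
}

# Only chunks with an unknown hash need to cross the network.
to_send = missing_chunks(file_chunks, receiver_index)
```

With these assumed inputs only chunk B has to be sent; the other three are recognized by their hashes.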

Page 14: Dividing files into data chunks

LBFS examines every (overlapping) 48-byte region of the file and, with probability 2^-13 over each region’s contents, considers it to be the end of a data chunk.

Boundary regions (breakpoints) are selected using Rabin fingerprints.

When the low-order 13 bits of a region’s fingerprint equal a chosen value, the region constitutes a breakpoint.

[Diagram: a file divided into data chunk 1 (8 KB) and data chunk 2 (6 KB), with a 48-byte region marking the boundary.]

Assuming random data, the expected chunk size is 2^13 bytes = 8 KB.
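
A minimal sketch of content-defined breakpoint selection follows. It is not the authors' code: instead of a true Rabin fingerprint it swaps in a simple polynomial rolling hash (Rabin-Karp style), and `BASE`, `MOD` and `MAGIC` are arbitrary values assumed for illustration. The 48-byte window and the 13-bit mask come from the slide.

```python
WINDOW = 48            # bytes per sliding region, as on the slide
MASK = (1 << 13) - 1   # low-order 13 bits of the window hash
MAGIC = 0x78           # hypothetical "chosen value" for breakpoints
BASE = 257             # assumed rolling-hash base (not a Rabin polynomial)
MOD = (1 << 61) - 1    # assumed modulus


def window_hash(window: bytes) -> int:
    """Hash one window from scratch."""
    h = 0
    for b in window:
        h = (h * BASE + b) % MOD
    return h


def breakpoints(data: bytes):
    """Return end offsets of every 48-byte window whose hash matches MAGIC."""
    if len(data) < WINDOW:
        return []
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = window_hash(data[:WINDOW])
    cuts = [WINDOW] if h & MASK == MAGIC else []
    for i in range(WINDOW, len(data)):
        # Slide the window one byte: drop data[i - WINDOW], append data[i].
        h = ((h - data[i - WINDOW] * top) * BASE + data[i]) % MOD
        if h & MASK == MAGIC:
            cuts.append(i + 1)
    return cuts
```

Because a breakpoint depends only on the 48 bytes inside its window, inserting data near the start of a file shifts later breakpoints without changing which regions qualify, which is what lets unchanged chunks keep their hashes.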

Page 15: Chunks of file before and after various edits

Original file: C1 C2 C3

After inserting data into C2: C1 C4 C3

After inserting data that contains a breakpoint: C1 C4 C5 C6

After a modification in which a breakpoint is eliminated: C1 C7 C6

Page 16: Requirements / Restrictions

LBFS imposes a minimum (2K) and maximum (64K) chunk size.

Any 48-byte region hashing to a magic value in the first 2K after a breakpoint does not constitute a new breakpoint.

If the file contents do not produce a breakpoint every 64K, an artificial chunk boundary will be inserted.
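
The two restrictions can be sketched as a filter over candidate breakpoints. This is a hedged illustration: `chunk_boundaries` and its argument names are hypothetical, and the candidate offsets are assumed to come from some fingerprinting pass over the file.

```python
MIN_CHUNK = 2 * 1024    # 2K minimum chunk size
MAX_CHUNK = 64 * 1024   # 64K maximum chunk size


def chunk_boundaries(candidates, file_len):
    """Turn candidate breakpoint offsets into final chunk end offsets.

    Candidates closer than MIN_CHUNK to the previous boundary are ignored,
    and an artificial boundary is forced whenever MAX_CHUNK bytes pass
    without an accepted breakpoint.
    """
    boundaries = []
    start = 0  # offset where the current chunk begins
    for cut in sorted(c for c in candidates if 0 < c < file_len):
        while cut - start > MAX_CHUNK:
            start += MAX_CHUNK
            boundaries.append(start)   # artificial boundary
        if cut - start >= MIN_CHUNK:
            boundaries.append(cut)     # accepted breakpoint
            start = cut
    while file_len - start > MAX_CHUNK:
        start += MAX_CHUNK
        boundaries.append(start)
    if file_len > start:
        boundaries.append(file_len)    # last chunk ends at end of file
    return boundaries


# Hypothetical candidates: two are too close to a previous boundary,
# and there is a long stretch with no candidate at all.
result = chunk_boundaries([1000, 3000, 3500, 200000], 210000)
```

In this sketch the final chunk may be shorter than 2K, since the end of the file always closes a chunk.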

Page 17: Chunk Database

Used to identify and locate duplicate data chunks.

Indexes each chunk by the first 64 bits of its SHA-1 hash.

The database maps these 64-bit keys to (file, offset, count) triples.

The mapping must be updated whenever a file is modified.

Page 18: Chunk Database

LBFS does not rely on database correctness. It recomputes the SHA-1 hash of any data chunk before using it to reconstruct a file.

The recomputed SHA-1 hash value is used to detect collisions in the database.

The worst a corrupt database can do is degrade performance.
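
A small sketch of such a database, assuming an in-memory dictionary and a caller-supplied `read_file` function (both hypothetical), shows why a stale or colliding entry can only cost performance: the chunk is re-hashed before it is trusted.

```python
import hashlib


class ChunkDB:
    """Maps the first 64 bits of a SHA-1 hash to a (file, offset, count) triple."""

    def __init__(self):
        self._index = {}

    @staticmethod
    def key(digest: bytes) -> int:
        # First 8 bytes (64 bits) of the 20-byte SHA-1 digest.
        return int.from_bytes(digest[:8], "big")

    def insert(self, digest: bytes, path: str, offset: int, count: int) -> None:
        self._index[self.key(digest)] = (path, offset, count)

    def lookup(self, digest: bytes, read_file):
        """Return the chunk data, or None if the entry is missing or stale."""
        entry = self._index.get(self.key(digest))
        if entry is None:
            return None
        path, offset, count = entry
        data = read_file(path)[offset:offset + count]
        # Recompute the full SHA-1 before using the data.
        if hashlib.sha1(data).digest() != digest:
            return None
        return data


# Hypothetical in-memory "file system" for the example.
files = {"a.txt": b"hello world"}
db = ChunkDB()
digest = hashlib.sha1(b"world").digest()
db.insert(digest, "a.txt", 6, 5)
found = db.lookup(digest, files.__getitem__)

# The file changes but the index is not updated: the entry is now stale.
files["a.txt"] = b"hello there"
stale = db.lookup(digest, files.__getitem__)
```

A failed lookup simply means the chunk has to be transferred as if it had never been indexed.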

Page 19: Protocol for low-bandwidth NFS

Page 20: The Protocol

The LBFS protocol is based on NFS version 3.

All files are named by server-chosen opaque handles.

Operations on handles include reading and writing data at specific offsets.

Page 21: Protocol issues

File Consistency

File Reads

File Writes

Page 22: File Consistency

The LBFS client currently performs whole-file caching.

When a user opens a file, if the file is not in the local cache or the cached version is not up to date, the client fetches a new version from the server.

Page 23: File Consistency, Cont.

How do you know whether the file is up to date?

LBFS uses a three-tiered scheme to determine if a file is up to date.

Whenever a client makes any RPC on a file in LBFS, it gets back a read lease on the file.

Page 24: File Consistency, Cont.

The lease is a commitment on the part of the server to notify the client of any modifications made to that file during the term of the lease.

When a user opens a file, if the lease on the file has not expired and the version of the file is up to date, then the open succeeds immediately.

Page 25: File Consistency, Cont.

What if that’s not the case?

If a user opens a file and the lease on it has expired, then the client asks the server for the file’s attributes.

This request gives the client a lease.

Page 26: File Consistency, Cont.

When the client gets the attributes, if the modification and inode change times are the same as when the file was stored in the cache, then the client uses its own version in the cache.

If the file times have changed, the server transfers the new contents to the client.
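
The open-time decision described over the last few slides can be sketched as one function. The `CachedFile` record, the `open_action` name, the `get_attrs` callback and the returned strings are all assumptions made for this illustration; they are not LBFS identifiers.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    mtime: float              # modification time when the file was cached
    ctime: float              # inode change time when the file was cached
    lease_expiry: float       # time at which the read lease runs out
    invalidated: bool = False # set when the server reports a modification


def open_action(cached, now, get_attrs):
    """Decide how to satisfy an open() on the client."""
    if cached is None:
        return "fetch"                      # file is not in the local cache
    if now < cached.lease_expiry:
        # Valid lease: the server would have notified us of any change.
        return "fetch" if cached.invalidated else "use_cache_lease"
    # Lease expired: ask the server for attributes (this renews the lease).
    mtime, ctime = get_attrs()
    if (mtime, ctime) == (cached.mtime, cached.ctime):
        return "use_cache_attrs"            # times unchanged, cache is current
    return "fetch"                          # times changed, get new contents


cached = CachedFile(mtime=1.0, ctime=1.0, lease_expiry=100.0)
with_lease = open_action(cached, 50.0, lambda: (2.0, 2.0))
same_times = open_action(cached, 150.0, lambda: (1.0, 1.0))
changed = open_action(cached, 150.0, lambda: (2.0, 1.0))
```

Only the first case avoids a round trip entirely; the second costs one attribute request, and the third falls through to a transfer of the file contents.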

Page 27: File Consistency, Cont.

Only close-to-open consistency is provided.

Hence no write leases are required.

Clashing writes are prevented by an atomic write operation at the server.

Page 28: File Consistency, Cont.

When multiple clients are writing the same file, LBFS writes back data whenever any of the processes closes the file.

Does that affect the processes that still have the file open? No. Those processes continue to see only their own version.

Page 29: File Reads

File reads use an RPC procedure that is not in the NFS protocol: GETHASH.

GETHASH retrieves the hashes of data chunks in a file, so as to identify any chunks that already exist in the client’s cache.

The arguments are a file handle, an offset and a size.

GETHASH returns a vector of (SHA-1 hash, size) pairs.

Page 30: File Reads

[Sequence diagram between CLIENT and SERVER]

CLIENT: File not in cache, send GETHASH.

CLIENT to SERVER: GETHASH(fh, offset, count)

SERVER: File broken into chunks @offset + count.

SERVER to CLIENT: (sha1, size1), (sha2, size2), eof = true

CLIENT: sha1 not in database, send READ. sha2 in database.

CLIENT to SERVER: READ(fh, sha1-off, size1)

SERVER: Return data associated with sha1.

SERVER to CLIENT: data of sha1

CLIENT: Put sha1 in database. File reconstructed. Return to user.
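
The read exchange can be sketched with an in-memory stand-in for the server. The `Server` class, its method signatures and the `fetch_file` helper are hypothetical simplifications (a real GETHASH takes a file handle, offset and count); only the GETHASH-then-READ pattern is taken from the slide.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


class Server:
    """Toy server holding one file as a list of pre-computed chunks."""

    def __init__(self, chunks):
        self.chunks = chunks

    def gethash(self):
        # Stand-in for GETHASH: a vector of (SHA-1 hash, size) pairs.
        return [(sha1(c), len(c)) for c in self.chunks]

    def read(self, offset, size):
        # Stand-in for READ: plain data at a byte offset.
        return b"".join(self.chunks)[offset:offset + size]


def fetch_file(server, cache):
    """Rebuild the file, reading only chunks missing from the client cache."""
    parts, offset, bytes_fetched = [], 0, 0
    for digest, size in server.gethash():
        data = cache.get(digest)
        if data is None:
            data = server.read(offset, size)   # chunk not known locally
            cache[digest] = data               # remember it for next time
            bytes_fetched += size
        parts.append(data)
        offset += size
    return b"".join(parts), bytes_fetched


server = Server([b"aaaa", b"bbbbbb", b"cc"])
cache = {sha1(b"bbbbbb"): b"bbbbbb"}           # client already holds one chunk
contents, fetched_first = fetch_file(server, cache)
_, fetched_second = fetch_file(server, cache)  # everything is cached now
```

On the second fetch every hash is found in the cache, so no chunk data is requested at all.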

Page 31: File Reads

For files larger than 1024 chunks, the client must issue multiple GETHASH calls and may incur multiple round trips.

However, network latency can be overlapped with transmission and disk I/O.

Page 32: File Writes

Files are updated atomically at file close time.

There are several reasons for keeping the old file until the end and then updating it atomically.

Keeping the old version helps to exploit commonality.

Files being written back may have confusing intermediate states, and atomic updates also avoid a mishmash from simultaneously writing processes.

Page 33: File Writes

LBFS uses temporary files to implement atomic updates.

Four RPCs implement this update protocol: MKTMPFILE, TMPWRITE, CONDWRITE, and COMMITTMP.

Page 34: File Writes

[Sequence diagram between CLIENT and SERVER]

CLIENT: User closes file. Pick fd. Break file into chunks. Send SHA-1 hashes to server.

CLIENT to SERVER: MKTMPFILE(fd, fhandle)

SERVER: Create tmp file, map (client, fd) to file. Reply: OK.

CLIENT to SERVER: CONDWRITE(fd, offset1, count1, sha1)

SERVER: sha1 in database, write data to tmp file. Reply: OK. (Client: server has sha1.)

CLIENT to SERVER: CONDWRITE(fd, offset2, count2, sha2)

SERVER: sha2 not in database. Reply: Hash not found. (Client: server needs sha2, send data.)

CLIENT to SERVER: CONDWRITE(fd, offset3, count3, sha3)

SERVER: sha3 in database, write data into tmp file. Reply: OK. (Client: server has sha3.)

SERVER, on receiving the data for sha2: Put sha2 into database. Write data into tmp file. Reply: OK.

CLIENT: Server has everything, commit.

SERVER: No error, copy data from tmp file into the target file. Reply: OK.

CLIENT: File closed, return to user.
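
The update protocol can be sketched with a toy in-memory server. Everything here is an assumption for illustration: the argument lists, the `"OK"` and `"HASHNOTFOUND"` status strings, and the strictly sequential request order (this sketch sends the data for a missing chunk immediately instead of batching requests). Only the four RPC names and their roles come from the slides.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


class Server:
    def __init__(self):
        self.db = {}      # digest -> chunk data already known to the server
        self.files = {}   # file handle -> committed contents
        self.tmp = {}     # fd -> (target handle, {offset: data})

    def mktmpfile(self, fd, fhandle):
        self.tmp[fd] = (fhandle, {})
        return "OK"

    def condwrite(self, fd, offset, count, digest):
        data = self.db.get(digest)
        if data is None or len(data) != count:
            return "HASHNOTFOUND"          # client must send the data itself
        self.tmp[fd][1][offset] = data     # reuse the chunk the server has
        return "OK"

    def tmpwrite(self, fd, offset, data):
        self.db[sha1(data)] = data         # index the new chunk
        self.tmp[fd][1][offset] = data
        return "OK"

    def committmp(self, fd):
        fhandle, parts = self.tmp.pop(fd)
        # Replace the target file in one step, so readers never see a mix.
        self.files[fhandle] = b"".join(parts[o] for o in sorted(parts))
        return "OK"


def write_back(server, fd, fhandle, chunks):
    """Write a chunked file back; return the number of data bytes sent."""
    server.mktmpfile(fd, fhandle)
    offset = sent = 0
    for chunk in chunks:
        if server.condwrite(fd, offset, len(chunk), sha1(chunk)) != "OK":
            server.tmpwrite(fd, offset, chunk)
            sent += len(chunk)
        offset += len(chunk)
    server.committmp(fd)
    return sent


server = Server()
server.db = {sha1(b"one"): b"one", sha1(b"three"): b"three"}
sent = write_back(server, 1, "fh", [b"one", b"two2", b"three"])
```

Until the commit step runs, the target file is untouched, which is how the old version stays available for matching chunks and how concurrent writers avoid interleaving.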

Page 35: Low-bandwidth Network File System

Implementation

Page 36: Implementation

Figure 1: Overview of the LBFS implementation

• Both the client and server run at user-level

• The client implements the file system using xfs

• The server accesses files through NFS

Page 37: Chunk Index

The LBFS client and server both maintain chunk indexes.

The two share the same indexing code.

LBFS never relies on chunk database correctness, nor is it concerned with crash recoverability.

LBFS avoids any synchronous database updates.

Page 38: Server Implementation

Main goal: build a system that can be installed on an already running file system.

Accesses the file system by pretending to be an NFS client, translating LBFS requests into NFS.

NFS advantages:

Simplifies the implementation.

No need to implement access control.

Chunk index is more resilient to outside file system changes.

Page 39: Client Implementation

Uses the xfs device driver.

xfs is suited to whole-file caching.

The client is responsible for fetching remote files and storing them in the local cache.

It informs xfs of the bindings between files users have opened and files in the local cache.

xfs then satisfies read and write requests directly from the cache.

Page 40: Low-bandwidth Network File System

Evaluation

Page 41: Repeated Data in Files

Data                      Given        Data size   New data   Overlap
emacs 20.7 source         emacs 20.6   52.1 MB     12.6 MB    76%
Tree of emacs 20.7        __           20.2 MB     12.5 MB    38%
emacs 20.7 + printf ex    emacs 20.7   6.4 MB      2.9 MB     55%
emacs 20.7 exec           emacs 20.6   6.4 MB      5.1 MB     21%
Inst. of emacs 20.7       emacs 20.6   43.8 MB     16.9 MB    61%
Elisp doc. + new page     Postscript   4.1 MB      0.4 MB     90%
MSWord doc. + edits       MSWord       1.4 MB      0.4 MB     68%

Table 1: Amount of new data in a file or directory, given an older version

Page 42: Application Performance

Figure 2: Performance over various bandwidths

Page 43: Conclusions

LBFS is a network file system that saves bandwidth.

LBFS breaks files into chunks based on contents.

It indexes file chunks by their hash values.

It looks up chunks to reconstruct files that contain the same data without sending that data over the network.

Page 44: Conclusions (cont)

LBFS consumes less bandwidth than traditional file systems

Practical for situations where other file systems cannot be used

Makes transparent remote file access a viable alternative to running interactive programs on remote machines

Page 45: References

FIPS 180-1. Secure Hash Standard. U.S. Department of Commerce/N.I.S.T., National Technical Information Service, Springfield, VA, April 1995.

Cary G. Gray and David R. Cheriton. “Leases: An efficient fault-tolerant mechanism for distributed file cache consistency”. In Proceedings of the 12th ACM Symposium on Operating Systems Principles, pages 202-210, Litchfield Park, AZ, December 1989.

John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West. “Scale and performance in a distributed file system”. ACM Transactions on Computer Systems, 6(1):51-81, February 1988.

James J. Kistler and M. Satyanarayanan. “Disconnected operation in the Coda file system”. ACM Transactions on Computer Systems, 10(1):3-25, February 1992.

Michael O. Rabin. “Fingerprinting by random polynomials”. Technical Report TR-15-81, Center for Research in Computing Technology, Harvard University, 1981.

Page 46: Questions?