november 2005distributed systems: general services1 distributed systems: general services

163
November 2005 Distributed systems: gene ral services 1 Distributed Systems: General Services

Upload: ashlie-burke

Post on 11-Jan-2016

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 1

Distributed Systems:

General Services

Page 2: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 2

Overview of chapters

• Introduction• Co-ordination models and languages• General services

– Ch 11 Time Services, 11.1-11.3– Ch 9 Name Services– Ch 8 Distributed File Systems– Ch 10 Peer-to-peer Systems

• Distributed algorithms

• Shared data

Page 3: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 3

This chapter: overview

• Introduction

• Time services

• Name services

• Distributed file systems

• Peer-to-peer systems

Page 4: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 4

Introduction • Objectives

– Study existing subsystems• Services offered

• Architecture

• Limitations

– Learn about non-functional requirements• Techniques used

• Results

• Limitations

Page 5: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 5

This chapter: overview

• Introduction

• Time services

• Name services

• Distributed file systems

• Peer-to-peer systems

Page 6: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 6

Time services: Overview

• What & Why

• Clock synchronization

• Logical clocks (see chapter on algorithms)

Page 7: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 7

Time services

• Definition of time– International Atomic Time (TAI)

A second is defined as 9.192.631.770 periods of transition between two hyperfine levels of the ground state of Caesium-133

– Astronomical time Based on the rotation of the earth on its axis and its rotation about the sun. But due to the tidal friction the earth’s rotation is getting longer

– Co-ordinated Universal Time (UTC)International standard based Atomic time, but a leap second is occasionally inserted or deleted to keep up with Astronomical time

Page 8: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 8

Time services (cont.)

• Why is time important?

– Accounting (e.g. logins, files,…)

– Performance measurement

– Resource management

– some algorithms depend on it:

• Transactions using timestamps

• Authentication (e.g. Kerberos)

• Example Make

Page 9: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 9

Time services (cont.) • An example: Unix tool “make”

– large program:• multiple source files: graphics.c

• multiple object files: graphics.o

• change of one file recompilation of all files

– use of make to handle changes:• change of graphics.c at 3h45

• make examines creation time of graphics.o

• e.g. 3h43: recompiles graphics.c to create an up-to-date graphics.o

Page 10: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 10

Time services (cont.) • Make in a distributed environment:

• source file on system A

• object file on system B

Clock B

3h.. 47 48 49 50

Graphics.o created

Clock A

3h.. 41 42 43 44

Graphics.c modified

time

Equal times

Make graphics.o

no effect!!

Page 11: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 11

Time services (cont.) • Hardware clocks

– each computer has a physical clock• H(t): value of hardware clock/counter• different clocks different values• e.g. crystal-based clocks drift due to

– different shapes, temperatures, …

• Typical drift rates: – 1sec/sec 1 sec in 11.6 days– High precision quartz clocks: 0.1-0.01 sec/sec

• Software clocks– C(t) = H(t) +

• Ex. nanosecs elapsed since reference time

Page 12: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 12

Time services (cont.) • Changing local time

– Hardware: influence H(t)– Software: change and/or – Forward:

• Set clock forward

• some clock ticks appear to have been missed

– Backward:• Set backward? unacceptable

• Let clock run slower for a short period of time

Page 13: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 13

Time services (cont.) • Properties

– External synchronization to UTC (S)

| S(t) - C(t) | < D

– Internal synchronization| Ci(t) – Cj(t)| < D

– CorrectnessDrift rate <

– Monotonicityt’ > t C(t’) > C(t)

Page 14: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 14

Time services: Overview

• What & Why

• Synchronising Physical clocks

– Synchronising with UTC

– Christian’s algorithm

– The Berkeley algorithm

– Network Time Protocol

• Logical clocks (see chapter on algorithms)

Page 15: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 15

Time services synchronising with UTC

• Broadcast by Radio-stations– WWV (USA) (accuracy 0.1 - 10 msec.)

• Broadcast by satellites– Geos (0.1 msec.)– GPS (1 msec)

• Receivers to workstations

Page 16: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 16

Time services Christian’s algorithm

• assumption: time server process Ts on a

computer with UTC receiver

• process P: request/reply with time server

P Ts

T-request

reply (t)

Page 17: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 17

Time services Christian’s algorithm

• how to set clock at P?

– time t inserted in Ts

– time for P: t + Ttrans

– Ttrans = min + x

• min = minimum transmission time

• x? depends on

– network delay

– process scheduling

Page 18: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 18

Time services Christian’s algorithm

• In a synchronous system– Upper bound on delay: max– P sets time to t + (max – min) / 2– Accuracy: (max – min) / 2

• In an asynchronous system?

Page 19: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 19

Time services Christian’s algorithm

• practical approach– P measures round-trip time: Tround

– P sets time to t + Tround/2– accuracy?

• time at Ts when message arrives at P is in range: [t + min, t + Tround - min]

• accuracy = ((Tround/2) - min)

• problems:– single point of failure– impostor time server

Page 20: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 20

Time services The Berkeley algorithm

• Used on computers running Berkeley Unix

• active master elected among computers

whose clocks have to be synchronised

– no central time server

– upon failure of master: re-election

Page 21: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 21

Time services The Berkeley algorithm

• algorithm of master:– periodically polls the clock value of other

computers– estimates its local clock time (based on round-

trip delays)– averages the obtained values– returns the amount by which each individual’s

slave clock needs adjustment– fault-tolerant average (strange values excluded)

may be used

Page 22: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 22

Time services Network Time Protocol

• Standard for the Internet

• Design aims and features:

– Accurate synchronization despite variable message

delays

• Statistical techniques to filter timing data from different

servers

– Reliable service that can survive losses of connectivity

• Redundant servers

• hierarchy of servers in synchronised subnets

Page 23: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 23

Time services Network Time Protocol

1

2 2

333

Level = stratumStratum1 have UTC receiver

synchronisation

Page 24: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 24

Time services Network Time Protocol

• Three working modes for NTP servers– Multicast mode

• used on high speed LAN

• slave sets clock assuming small delay

– Procedure-call mode• similar to Christian’s algorithm

– Symmetric Mode• used by master servers and higher levels

• association between servers

Page 25: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 25

Time services Network Time Protocol

• Procedure-call and Symmetric mode

– each message bears timestamps of recent

message events

Server B T i-2

Server A

T i-1

T i-3 T i

timem’m

Page 26: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 26

Time services Network Time Protocol

– Compute approximations of offset and delay

Server B T i-2

Server A

T i-1

T i-3 T i

o true offsetm’m

Transmission times t t’

T i-2 = T i-3 + t + o

T i = T i-1 + t’ - o

a = T i-2 - T i-3 = t + o

b = T i-1 - T i = o – t’

estimates

d i = a - b

o i = (a + b) / 2

o i - d i / 2 o o i + d i / 2

d delay

total transmission time

Page 27: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 27

Time services Network Time Protocol

• Procedure-call and Symmetric mode

– data filtering applied to successive pairs

– NTP has complex peer-selection algorithm; favoured

are peers with :

• a lower stratum number

• a lower synchronisation dispersion

– Accuracy:

• 10 msec on Internet

• 1 msec on LAN

Page 28: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 28

This chapter: overview

• Introduction

• Time services

• Name services

• Distributed file systems

• Peer-to-peer systems

Page 29: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 29

Name services• Overview

– Introduction

– Basic service

– Case study: Domain name system

– Directory services

Page 30: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 30

Name services Introduction

• Types of name:– users– files, devices– service names (e.g. RPC)– port, process, group identifiers

• Attributes associated with names– Email address, login name, password– disk, block number– network address of server– ...

Page 31: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 31

Name services Introduction

• Goal:– binding between name and attributes

• History– originally:

• quite simple problem• scope: a single LAN• naïve solution: all names + attributes in a single file

– now:• interconnected networks• different administrative domains in a single name

space

Page 32: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 32

Name services Introduction

• naming service separate from other services:– unification

• general naming scheme for different objects

• e.g. local + remote files in same scheme

• e.g. files + devices+ pipes in same scheme

• e.g. URL

– integration• scope of sharing difficult to predict

• merge different administrative domains

• requires extensible name space

Page 33: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 33

Name services Introduction

• General requirements:

– scalable

– long lifetime

– high availability

– fault isolation

– tolerance of mistrust

Page 34: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 34

Name services• Overview

– Introduction

– Basic service

– Domain name system

– Directory services

Page 35: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 35

Name services Basic service

• Operations:identifier lookup (name, context)

register (name, context, identifier)

delete (name, context)

• From a single server to multiple servers:– hierarchical names

– efficiency

– availability

Page 36: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 36

Name services Basic service

• Hierarchical name space– introduction of directories or subdomains

• /users/pv/edu/distri/services.ppt

• cs.kuleuven.ac.be

– easier administration• delegation of authority

• less name conflicts: unique per directory

• unit of distribution: server per directory + single root server

Page 37: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 37

Name services Basic service

• Hierarchical name space

– navigation

• process of locating naming data from more than

one server

• iterative or recursive

– result of lookup:

• attributes

• references to another name server

Page 38: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 38

Name services Basic service

root

srcusers

sys pv ann

Lookup(“/users/pv/edu/…”,…)P

Ref to server-users

Page 39: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 39

Name services Basic service

root

srcusers

sys pv ann

PLookup(“pv/edu/…”,…)

Ref to server-pv

Page 40: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 40

Name services Basic service

• Efficiency caching– hierarchy introduces multiple servers

more communication overhead more processing capacity

– caching at client side:• <name, attributes> are cached

• fewer request lookup performance higher

• inconsistency? – Data is rarely changed

– limit on validity

Page 41: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 41

Name services Basic service

• Availability replication– at least two failure independent servers

– primary server: accepts updates

– secondary server(s): • get copy of data of primary

• period check for updates at primary

– level of consistency?

Page 42: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 42

Name services• Overview

– Introduction

– Basic service

– Case study: Domain name system

– Directory services

Page 43: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 43

Name services Domain name system

• DNS = Internet name service• originally: an ftp-able file

– not scalable– centralised administration

• a general name service for computers and domains– partitioning– delegation– replication and caching

Page 44: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 44

Name services Domain name system

• Domain name– hierarchy

nix.cs.kuleuven.ac.be

country (België)

univ. (academic)

K.U.Leuven

dept. computer science

name of computersystem

Page 45: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 45

Name services Domain name system

• 3 kinds of Top Level Domains (TLD)– 2-letter country codes (ISO 3166)– generic names (similar organisations)

• com commercial organisations• org non-profit organisations (bv. vzw)• int international organisations (nato, EU, …)• net network providers

– USA oriented names • edu universities• gov American government• mil American army

– new generic names• biz, info, name, aero, museum, ...

Page 46: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 46

Name services Domain name system

• For each TLD:– administrator (assign names within domain)

– “be”: DNS BE vzw (previous: dept. computer science)

• Each organisation with a name :– responsible for new (sub)names

– e.g. cs.kuleuven.ac.be

• Hierarchical naming + delegation workable structure

Page 47: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 47

Name services Domain name system

• Name servers– root name server +

– server per (group of) subdomains

+ replication high availability

+ caching acceptable performance– time-to-live to limit inconsistencies

Page 48: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 48

Name services Domain name system

Name server cs.kuleuven.ac.be

Systems/subdomainsSystems/subdomains typetype IP-adresIP-adresof cs.kuleuven.ac.beof cs.kuleuven.ac.be

nixnix AA 134.58.42.36134.58.42.36

idefixidefix AA 134.58.41.7134.58.41.7

droopydroopy AA 134.58.41.10134.58.41.10

stevinstevin AA 134.58.41.16134.58.41.16

......

A = AddressA = Address

Page 49: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 49

Name services Domain name system

Name server kuleuven.ac.be

Machines/subdomeinenMachines/subdomeinen typetype IP-adresIP-adresvan kuleuven.ac.bevan kuleuven.ac.be

cscs NSNS 134.58.39.1134.58.39.1

esatesat NSNS ……

wwwwww AA ……

......

NS = NameServerNS = NameServer

Page 50: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 50

Name services Domain name system

Example : www.cs.vu.nl

Lokale NSLokale NS

(cs.kuleuven.ac.be(cs.kuleuven.ac.be))

www.www.cs.vu.nlcs.vu.nl

Root-NSRoot-NS

NS (nl)NS (nl)

NS (vu.nl)NS (vu.nl)

NS (cs.vu.nl)NS (cs.vu.nl)130.37.24.1130.37.24.111

130.37.24.1130.37.24.111

Page 51: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 51

Name services Domain name system

• Extensions:– mail host location

• used by electronic mail software

• requests where mail for a domain should be delivered

– reverse resolution (IP address -> domain name)– host information

• Weak points:– security

Page 52: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 52

Name services Domain name system

• Good system design: – partitioning of data

• multiple servers

– replication of servers • high availability• limited inconsistencies• NO load balancing; NO server preference

– caching• acceptable performance• limited inconsistencies

Page 53: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 53

Name services• Overview

– Introduction

– Basic service

– Case study: Domain name system

– Directory services

Page 54: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 54

Name services Directory services

• Name serviceexact name attributes

• Directory servicesome attributes of object all attributes

• Examples:– request: E-mail address of Janssen at KULeuven

– result: list of names of all persons called Janssen

– select person and read attributes

Page 55: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 55

Name services Directory services

• Directory information base– list of entries– each entry

• set of type/list-of-values pairs

• optional and mandatory pairs

• some pairs determine name

– distributed implementation

Page 56: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 56

Name services Directory services

• X500– CCITT standard

• LDAP– light weight directory access protocol– Internet standard

Page 57: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 57

This chapter: overview

• Introduction

• Time services

• Name services

• Distributed file systems

• Peer-to-peer systems

Page 58: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 58

Distributed file systems• Overview

– File service architecture

– case study: NFS

– case study AFS

– comparison NFS <> AFS

Page 59: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 59

Distributed file systemsFile service architecture

• Definitions:– file– directory – file system

cf. Operating systems

Page 60: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 60

Distributed file systemsFile service architecture

• Requirements:– addressed by most file systems:

• access transparency• location transparency• failure transparency• performance transparency

– related to distribution:• concurrent updates• hardware and operating system heterogeneity• scalability

Page 61: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 61

Distributed file systemsFile service architecture

• Requirements (cont.) :– scalable to a very large number of nodes

• replication transparency

• migration transparency

– future:• support for fine-grained distribution of data

• tolerance to network partitioning

Page 62: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 62

Distributed file systemsFile service architecture

• File service components

Flat file service

Directory service

Network

Client module: API

User program User program

UFID

names

Page 63: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 63

Distributed file systemsFile service architecture

• Flat file service– file = data + attributes

– data = sequence of items

– operations: simple read & write

– attributes: in flat file & directory service

– UFID: unique file identifier

Page 64: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 64

Distributed file systemsFile service architecture

• Flat file service: operations

– Read (FileID, Position, n) Data

– Write (FileID, Position, Data)

– Create () FileID

– Delete (FileID)

– GetAttributes (FileId) Attr

– SetAttributes (FileID, Attr)

Page 65: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 65

Distributed file systemsFile service architecture

• Flat file service: fault tolerance

– straightforward for simple servers

– idempotent operations

– stateless servers

Page 66: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 66

Distributed file systemsFile service architecture

• Directory service

– translate file name in UFID

– substitute for open

– responsible for access control

Page 67: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 67

Distributed file systemsFile service architecture

• Directory service: operations

– Lookup (Dir, Name, AccessMode, UserID)

UFID

– AddName (Dir, Name, FileID, UserID)

– UnName (Dir, Name)

– ReName (Dir, OldName, NewName)

– GetNames (Dir, Pattern) list-of-names

Page 68: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 68

Distributed file systemsFile service architecture

• Implementation techniques:– known techniques from OS experience– remain important– distributed file service: comparable in

• performance

• reliability

Page 69: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 69

Distributed file systemsFile service architecture

• Implementation techniques: overview– file groups– space leaks– capabilities and access control– access modes– file representation– file location– group location– file addressing– caching

Page 70: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 70

Distributed file systemsFile service architecture

• Implementation techniques: file groups– (similar: file system, partition)= collection of files mounted on a server

computer– unit of distribution over servers– transparent migration of file groups– once created file is locked in file group– UFID = file group identifier + file identifier

Page 71: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 71

Distributed file systemsFile service architecture

• Implementation techniques: space leaks– 2 steps for creating a file

• create (empty) file and get new UFID• enter name + UFID in directory

– failure after step 1:• file exists in file server• unreachable: UFID not in any directory

lost space on disk– detection requires co-operation between

• file server• directory server

Page 72: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 72

Distributed file systemsFile service architecture

• Implementation techniques: capabilities= digital key: access to resource granted on

presentation of the capability– request to directory server: file name + user id

+ mode of accessUFID including permitted access modes – construction of UFID– unique– encode access– unforgeable

Page 73: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 73

Distributed file systemsFile service architecture

• Implementation techniques: capabilities

File group id File nr

File group id File nr Random nr

File group id File nr Random nr

Access bits

File group id File nr

Access bits Encrypted (access bits + random number)

Reuse?

Access check?

Forgeable?

Page 74: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 74

Distributed file systemsFile service architecture

• Implementation techniques: file location– from UFID location of file server– use of replicated group location database file group id, PortId

– why replication?– why location not encoded in UFID?

Page 75: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 75

Distributed file systemsFile service architecture

• Implementation techniques: caching– server cache: reduce delay for disk I/O

• selecting blocks for release

• coherence: – dirty flags

– write-through caching

– client cache: reduce network delay• always use write-through

• synchronisation problems with multiple caches

Page 76: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 76

Distributed file systems• Overview

– File service model

– case study: Network File System -- NFS

– case study: AFS

– comparison NFS <> AFS

Page 77: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 77

Distributed file systemsNFS

• Background and aims– first file service product– emulate UNIX file system interface– de facto standard

• key interfaces published in public domain

• source code available for reference implementation

– supports diskless workstations• not important anymore

Page 78: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 78

Distributed file systemsNFS

• Design characteristics– client and server modules can be in any node

Large installations include a few servers– Clients:

• On Unix: emulation of standard UNIX file system

• for MS/DOS, Windows, Apple, …

– Integrated file and directory service– Integration of remote file systems in a local

one: mount remote mount

Page 79: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 79

Distributed file systemsNFS: configuration

• Unix mount system call– each disk partition contains hierarchical FS– how integrate?

• Name partitions a:/usr/students/john

• glue partitions together– invisible for user

– partitions remain useful for system managers

Page 80: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 80

Distributed file systemsNFS: configuration

• Unix mount system call

/ (root)

vmunix usr...

students staffx

Root partition

/ (root)

pierre ann...

network proje

Partition c

Directory staff root of c: /usr/staff/ann/network

Page 81: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 81

Distributed file systemsNFS: configuration

• Remote mount

/ (root)

vmunix usr...

students staffx

client

/ (root)

users ......

ann pierre

Server 1

Directory staff users: /usr/staff/ann/...

Page 82: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 82

Distributed file systemsNFS: configuration

• Mount service on server– enables clients to integrate (part of) remote file

system in the local name space– exported file systems in /etc/exports + access

list (= hosts permitted to mount; secure?)

• On client side– file systems to mount enumerated in /etc/rc– typically mount at start up time

Page 83: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 83

Distributed file systemsNFS: configuration

• Mounting semantics– hard

• client waits until request for a remote file succeeds

• eventually forever

– soft• failure returned if request does not succeed after n

retries

• breaks Unix failure semantics

Page 84: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 84

Distributed file systemsNFS: configuration

• Automounter– principle:

• empty mount points on clients

• mount on first request for remote file

– acts as a server for a local client• gets references to empty mount points

• maps mount points to remote file systems

• referenced file system mounted on mount point via a symbolic link, to avoid redundant requests to automounter

Page 85: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 85

Distributed file systemsNFS: implementation

• In UNIX: client and server modules

implemented in kernel

• virtual file system:

– internal key interface, based on file handles for

remote files

Page 86: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 86

Distributed file systemsNFS: implementation

UNIX kernel

Virtual file system

UNIX file

system

NFSclient

User client

process System call

Client computer

Net-work

UNIX kernel

Virtual file system

UNIX file

system

NFSserver

server computer

NFS protocol

Page 87: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 87

Distributed file systemsNFS: implementation

• Virtual file system– Added to UNIX kernel to make distinction

• Local files

• Remote files

– File handles: file ID in NFS• Base: inode number

– File ID in partition on UNIX system

• Extended with:– File system identifier

– inode generation number (to enable reuse)

Page 88: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 88

Distributed file systemsNFS: implementation

• Client integration:– NFS client module integrated in kernel

• offers standard UNIX interface

• no client recompilation/reloading

• single client module for all user level processes

• encryption in kernel

• server integration– only for performance reasons– user level = 80% of kernel level version

Page 89: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 89

Distributed file systemsNFS: implementation

• Directory service– name resolution co-ordinated in client– step-by-step process for multi-part file names– mapping tables in server: high overhead

reduced by caching• Access control and authentication

– based on UNIX user ID and group ID– included and checked for every NFS request– secure NFS 4.0 thanks to use of DES

encryption

Page 90: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 90

Distributed file systemsNFS: implementation

• Caching– Unix caching

• based on disk blocks• delayed write (why?)• read ahead (why?)• periodically sync to flush buffers in cache

– Caching in NFS• Server caching• Client caching

Page 91: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 91

Distributed file systemsNFS: implementation

• Server caching in NFS– based on standard UNIX caching: 2 modes– write-through (instead of delayed write)

• failure semantics • performance

– delayed write• Data stored in cache, till commit operation is

received– Close on client commit operation on server

• Failure semantics?• Performance

Page 92: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 92

Distributed file systemsNFS: implementation

• Client caching– cached are results of

• read, write, getattr, lookup, readdir

– problem: multiple copies of same data at different NFS clients

– NFS clients use read-ahead and delayed write

Page 93: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 93

Distributed file systemsNFS: implementation

• Client caching (cont.)– handling of writes

• block of file is fetched and updated• changed block is marked as dirty• dirty pages of files are flushed to server

asynchronously– on close of file– sync operation on client– by bio-daemon (when block is filled)

• dirty pages of directories are flushed to server– by bio-daemon without further delay

Page 94: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 94

Distributed file systemsNFS: implementation

• Client caching (cont.)

– consistency checks• based on time-stamps indicating last modification

of file on server

• validation checks– when file is opened

– when a new block is fetched

– assumed to remain valid for a fixed time (3 sec for file, 30 sec for directory)

– next operation causes another check

• costly procedure

Page 95: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 95

Distributed file systemsNFS: implementation

• Caching – Cached entry is valid

(T – Tc) < t or (Tmclient = Tmserver)– T: current time– Tc: time when cache entry was last validated– t: freshness interval (3 .. 30 secs in Solaris)– Tm: time when block was last modified at …

– consistency level• acceptable

• most UNIX applications do not depend critically on synchronisation of file updates

Page 96: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 96

Distributed file systemsNFS: implementation

• Performance– reasonable performance

• remote files on fast disk better than local files on slow disk

• RPC packets are 9 Kb to contain 8Kb disk blocks– Lookup operation covers about 50% of server

calls– Drawbacks:

• frequent getattr calls for time-stamps (cache validation)

• poor performance of (relative infrequent) writes (because of write-through)

Page 97: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 97

Distributed file systems• Overview

– File service model

– case study: NFS

– case study: Andrew File System -- AFS

– comparison NFS <> AFS

Page 98: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 98

Distributed file systemsAFS

• Background and aims– Base: observation of UNIX file systems

• files are small ( < 10 Kb)• read is more common than write• sequential access is common, random access is not• most files are not shared• shared files are often modified by one user• file references come in bursts

– Aim: combine best of personal computers and time-sharing systems

Page 99: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 99

Distributed file systemsAFS

• Background and aims (cont.)

– Assumptions about environment• secured file servers

• public workstations

• workstations with local disk

• no private files on local disk

– Key targets: scalability and security• CMU 1991: 800 workstations, 40 servers

Page 100: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 100

Distributed file systemsAFS

• Design characteristics– whole-file serving– whole-file cachingentire files are transmitted, not blocks– client cache realised on local disk (relatively

large)lower number of open requests on the network– separation between file and directory service

Page 101: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 101

Distributed file systemsAFS

• Configuration– single (global) name space– local files

• temporary files

• system files for start-up

– volume = unit of configuration and management

– each server maintains a replicated location database (volume-server mappings)

Page 102: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 102

Distributed file systemsAFS

• File name space

/ (root)

bin afstmp

local

/ (root)

users bin...

ann pierre

shared

Directory afs root of shared file system

–Symbolic link

Page 103: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 103

Distributed file systemsAFS

• Implementation– Vice = file server

• secured systems, controlled by system management• understands only file identifiers• runs in user space

– Venus = user/client software• runs on workstations• workstations keep autonomy• implements directory services

– kernel modifications for open and close

Page 104: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 104

Distributed file systemsAFS

User

program

Unix kernel

Venus

User

program

Unix kernel

Venus

Unix kernel

Vice

Unix kernel

Vice

workstations servers

Page 105: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 105

Distributed file systemsAFS

User

program

Unix kernel

Venus

Unix file

system call Non-local file operations

Unix file system

Page 106: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 106

Distributed file systemsAFS

• Implementation of file system calls– open, close:

• UNIX kernel of workstation

• Venus (on workstation)

• VICE (on server)

– read, write:• UNIX kernel of workstation

Page 107: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 107

place copy of file in local file system and store FileName

open local file and return descriptor to user process

Distributed file systemsAFS

• Open system call open( FileName,…)

if FileName in shared space /afs then pass request to Venus kernel

if file not present in local cache OR file present with

invalid callback then pass request to Vice server venus

network

transfer copy of file and valid callback vice

Page 108: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 108

Distributed file systemsAFS

• Read Write system call

perform normal UNIX read op local copy of file kernel

read( FileDescriptor,…)

….

Page 109: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 109

Distributed file systemsAFS

• Close system call

close local copy and inform Venus about close kernel

if local copy is changed then send copy to Vice server venus network

Store copy of file and send callback to other

clients holding callback promise vice

close( FileDescriptor,…)

...

Page 110: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 110

Distributed file systemsAFS

• Caching– callback principle

• service supplies callback promise at open

• promise is stored with file in client cache

– callback promise can be valid or cancelled• initially valid

• server sends message to cancel callback promise – to all clients that cache the file

– whenever file is updated

Page 111: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 111

Distributed file systemsAFS

• Caching maintenance– when client workstation reboots

• cache validation necessary because of missed messages

• cache validation requests are sent for each valid promise

– valid callback promises are renewed • on open

• when no communication has occurred with the server during a period T

Page 112: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 112

Distributed file systemsAFS

• Update semantics– guarantee after successful open of file F on server S:

latest(F, S, 0)orlostcallback(S, T) and incache(F) and latest(F, S, T)

– no other concurrency control

• 2 copies can be updated at different workstations

• all updates except from last close are (silently) lost

• <> normal UNIX operation

Page 113: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 113

Distributed file systemsAFS

• Performance: impressive <> NFS– benchmark: load on server

• 40% AFS• 100% NFS

– whole-file caching: • reduction of load on servers• minimises effect of network latency

– read-only volumes are replicated (master copy for occasional updates)

Andrew optimised for a specific pattern of use!!

Page 114: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 114

Distributed file systems• Overview

– File service model

– case study: NFS

– case study AFS

– comparison NFS <> AFS

Page 115: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 115

Distributed file systemsNFS <>AFS

• Access transparency– both in NFS and AFS– Unix file system interface is offered

• Location transparency– uniform view on shared files in AFS– in NFS

• mounting freedom

• same view possible if same mounting; discipline!

Page 116: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 116

Distributed file systemsNFS <>AFS

• Failure transparency– NFS

• no state of clients stored in servers

• idempotent operations

• transparency limited by soft mounting

– AFS• state about clients stored in servers

• cache maintenance protocol handles server crashes

• limitations?

Page 117: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 117

Distributed file systemsNFS <>AFS

• Performance transparency– NFS

• acceptable performance degradation

– AFS• only delay for first open operation on file

• better than NFS for small files

Page 118: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 118

Distributed file systemsNFS <>AFS

• Migration transparency– limited: update of locations required– NFS

• update configuration files on all clients

– AFS• update of replicated database

Page 119: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 119

Distributed file systemsNFS <>AFS

• Replication transparency– NFS

• not supported

– AFS• limited support; for read-only volumes

• one master for exceptional updates; manual procedure for propagating changes to other volumes

Page 120: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 120

Distributed file systemsNFS <>AFS

• Concurrency transparency– not supported (in UNIX)

• Scalability transparency– AFS better than NFS

Page 121: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 121

Distributed file systems• Overview

– File service architecture

– case study: NFS

– case study AFS

– comparison NFS <> AFS

Page 122: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 122

Distributed file systems

• Conclusions– Standards <> quality– evolution to standards and common key

interface• AFS-3 incorporates Kerberos, vnode interface,

large messages for file blocks (64Kb)

– network (mainly access) transparency causes inheritance of weakness: e.g. no concurrency control

– evolution mainly performance driven

Page 123: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 123

Distributed file systems

• Good system design: – partitioning

• different storage volumes– replication of servers

• high availability• limited inconsistencies• load balancing?• server preference?

– caching• acceptable performance• limited inconsistencies

Page 124: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 124

This chapter: overview

• Introduction

• Time services

• Name services

• Distributed file systems

• Peer-to-peer systems

Page 125: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 125

Peer-to-peer systems• Overview

– Introduction – Napster– Middleware– Routing algorithms– OceanStore –Pond file store

Page 126: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 126

Peer-to-peer systems Introduction

• Definition 1:– Support useful distributed services– Using data and computer resources in the PCs

and workstations available on the Internet

• Definition 2:– Applications exploiting resources available at

the edges of the Internet

Page 127: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 127

Peer-to-peer systems Introduction

• Characteristics:– Each user contributes resources to the system

– All nodes have the same functional capabilities and responsibilities

– Correct operation does not depend on the existence of any centrally administered systems

– Can offer limited degree of anonymity

– Efficient algorithms for the placement of data across many hosts and subsequent access

– Efficient algorithms for load balancing and availability of data

Availability of partic

ipating computers is unpredictabel

Page 128: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 128

Peer-to-peer systems Introduction

• 3 generations of peer-to-peer systems and application development:– Napster: music exchange service– File-sharing applications offering greater

scalability, anonymity and fault tolerance: Freenet, Gnutella, Kazaa

– Middleware for the application-independent management of distributed resources on a global scale: Pastry, Tapestry,…

Page 129: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 129

Peer-to-peer systems Napster

• Download digital music files

• Architecture:– Centralised indices– Files stored and accessed on PCs

• Method of operation (next slide)

Page 130: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 130

Peer-to-peer systems Napster

Napster serverIndex1. File location

2. List of peers

request

offering the file

peers

3. File request

4. File delivered5. Index update

Napster serverIndex

– step 5: file sharing expected!

Page 131: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 131

Peer-to-peer systems Napster

• Conclusion:– Feasibility demonstrated– Simple load balancing techniques– Replicated unified index of all available music

files– No updates of files– Availability not a hard concern– Legal issues: anonymity

Page 132: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 132

Peer-to-peer systems• Overview

– Introduction – Napster– Middleware– Routing algorithms– OceanStore –Pond file store

Page 133: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 133

Peer-to-peer systems middleware

• Goal:– Automatic placement– Subsequent location of distributed objects

• Functional requirements:– Locate and communicate with any resource– Add and remove resources– Add and remove hosts– API independent of type of resource

}

Page 134: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 134

Peer-to-peer systems middleware

• Non-functional requirements:– Global scalability (millions of object on hundreds of thousands of

hosts)

– Optimization for local interactions between

neighbouring peers

– Accommodating to highly dynamic host availability

– Security of data in an environment with heterogeneous

trust

– Anonymity, deniability and resistance to censorship

Page 135: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 135

Peer-to-peer systems middleware

• Approach:– Knowledge of locations of objects

• Partitioned

• Distributed

• Replicated (x16)

throughout the network═Routing overlay:

• Distributed algorithm for locating objects and nodes

Page 136: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 136

Peer-to-peer systems middleware

Object:

Node:

D

CÕs routing knowledge

DÕs routing knowledgeAÕs routing knowledge

BÕs routing knowledge

C

A

B

Page 137: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 137

Peer-to-peer systems middleware

• Resource identification:– GUID = globally unique identifier– Secure hash from

• all state – self certifying

– for immutable objects

• part of state– e.g. name + …

Page 138: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 138

Peer-to-peer systems middleware

• API for a DHT in Pastry

DHT = distributed hash table

put(GUID, data) The data is stored in replicas at all nodes responsible for the object identified by GUID.remove(GUID)Deletes all references to GUID and the associated data.value = get(GUID)The data associated with GUID is retrieved from one of the nodes responsible it.

Page 139: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 139

Peer-to-peer systems middleware

• API for a DOLR in Tapestry

DOLR = distributed object location and routing

publish(GUID ) This function makes the node performing a publish operation the host for the object corresponding to GUID.

unpublish(GUID)Makes the object corresponding to GUID inaccessible.

sendToObj(msg, GUID, [n])an invocation message is sent to an object in order to access it. The optional parameter [n], requests the delivery of the same message to n replicas of the object.

Page 140: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 140

Peer-to-peer systems• Overview

– Introduction – Napster– Middleware– Routing algorithms– OceanStore –Pond file store

Page 141: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 141

Peer-to-peer systems Routing algorithms

• prefix routing – Used in Pastry, Tapestry

• GUID: 128-bit– Host: secure hash on public key– Object: secure hash on name or part of state

• N participating nodes– O(log N) steps to route a message to any GUID– O(log N) messages to integrate a new node

Page 142: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 142

Peer-to-peer systems Routing algorithms

• Simplified algorithm: circular routing– Leaf set in each active node– Element in leaf set:

GUID – IP address of nodes whose GUIDS are numerically closest

on either side of its own GIUD– Size L = 2*l– Leaf sets are updated when nodes

• Join• Leave

Page 143: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 143

Peer-to-peer systems Routing algorithms

• Circular space for GUIDs

• Dot = live node

• Leaf size = 8

• Shown: routing of a message from node 65A1FC to D46A1C

0 FFFFF....F (2128-1)

65A1FC

D13DA3

D471F1

D467C4D46A1C

Page 144: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 144

Peer-to-peer systems Routing algorithms

• Full algorithm (Pastry)– Tree-structured routing table

• GUID – IP pairs spread throughout entire range of GUIDs

• Increased density of coverage for GUIDs numerically close to GUID of local node

• GUIDs viewed as hexadecimal values

• Table classifies GUIDs based on hexadecimal prefices

• As many rows as hexadecimal digits

Page 145: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 145

Peer-to-peer systems Routing algorithms

• Full algorithm (Pastry)

Page 146: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 146

Peer-to-peer systems Routing algorithms

• Routing a message

from 65A1FC

to D46A1C

• Well-populated table:

~log16(N) hops

0 FFFFF....F (2128-1)

65A1FC

D13DA3

D4213F

D462BA

D471F1

D467C4D46A1C

Page 147: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 147

Peer-to-peer systems Routing algorithms

• Full algorithm (Pastry)To handle a message M addressed to a node D (where R[p,i] is the element at column i,row p of the routing table):

1. If (L-l < D < Ll) { // the destination is within the leaf set or is the current node.2. Forward M to the element Li of the leaf set with GUID closest to D or the current

node A.3. } else { // use the routing table to despatch M to a node with a closer GUID4. find p, the length of the longest common prefix of D and A. and i, the (p+1)th

hexadecimal digit of D .5. If (R[p,i] ° null) forward M to R[p,i] // route M to a node with a longer common

prefix.6. else { // there is no entry in the routing table7. Forward M to any node in L or R with a common prefix of length i, but a

GUID that is numerically closer.}

}

Page 148: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 148

Peer-to-peer systems Routing algorithms

• Full algorithm (Pastry): host integration– New node X:

• computes GUIDX

• X should know at least one nearby Pastry node A with GUIDA

• X sends special join message to A (destination GUIDX)

– A despatches the join message via Pastry

• Route of message: A, B, ….Z (GUIDZ GUIDX)

– Each node sends relevant part of its routing table and

leaf sets to X

Page 149: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 149

Peer-to-peer systems Routing algorithms

• Full algorithm (Pastry): host integration– Route of message: A, B, ….Z

– Routing table of X:• Row 0 of A row 0 of X

• B and X share hexadecimal digits: so row i of B row I of X

• …

• Choice for entry? proximity neighbour selection algorithm (metric: IP hops, measured latency,…)

– Leaf set of Z leaf set of X

– X sends its leaf set and routing table to all nodes in its routing table and leaf set

Page 150: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 150

Peer-to-peer systems Routing algorithms

• Full algorithm (Pastry): host failure– Repair actions to update leafsets: leaf set requested

from node(s) close to failed node

– Routing table: repairs on a “when discovered” basis

Page 151: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 151

Peer-to-peer systems Routing algorithms

• Full algorithm (Pastry): Fault tolerance– All nodes in leaf set life?

• Each node sends heartbeat messages to nodes in leaf set

– Malicious nodes?• Clients can use at least once delivery mechanism

• Use limited randomness in node selection in routing algorithm

Page 152: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 152

Peer-to-peer systems Routing algorithms

• Full algorithm (Pastry): dependability– Include acks in routing algorithm: No ack

• alternative route• node: suspected failure

– Heartbeat messages– Suspected failed nodes in routing tables

• Probed• Fail: alternative nodes

– Simple Gossip protocol to exchange routing table info between nodes

Page 153: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 153

Peer-to-peer systems Routing algorithms

• Full algorithm (Pastry): evaluation– Message loss

– Performance: Relative Delay Penalty (RDP)= Pastry delivery / IP/UDP delivery

IP loss Pastry loss Pastry wrong node RDP

100. 000 messages

0% 1.5 0 1.8

5% 3.3 1.6 2.2

Page 154: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 154

Peer-to-peer systems• Overview

– Introduction – Napster– Middleware– Routing algorithms– OceanStore –Pond file store

Page 155: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 155

Distributed file systems OceanStore

• Objectives:– Very large scale, incrementally-scalable,

persistent storage facility – Mutable data objects with long-term

persistence and reliability– Environment of network and computing

resources constantly changing

• Intended use

• Mechanisms used

Page 156: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 156

Distributed file systems OceanStore

• Intended use:– NFS-like file system– Electronic mail hosting

• Mechanism needed:– Consistency between replicas

• Tailored to application needs by a Bayou like system

– Privacy and integrity• Encryption of data• Byzantine agreement protocol for updates

Page 157: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 157

Distributed file systems OceanStore

• Storage organization

d1 d2 d3 d5d4

root block

version i indirection blocks

d2

version i+1

d1 d3

certificate VGUID of currentversion

VGUID ofversion i

AGUID

VGUID of version i-1

data blocks

BG

UID

(co

py

on

wri

te)

Version i+1 has been updated in blocks d1, d2 and d3. The certificate and the root blocks include some metadata not shown. All unlabelled arrows are BGUIDs.

Page 158: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 158

Distributed file systems OceanStore

• Storage organization (cont)

Name Meaning Description

BGUID block GUID Secure hash of a data block

VGUID version GUID BGUID of the root block of a version

AGUID active GUID Uniquely identifies all the versions of an object

Page 159: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 159

Distributed file systems OceanStore

• Storage organization (cont)

– New object AGUID• small set of hosts act as inner ring

• publish (AGUID)

• Refers to signed certificate recording the sequence of versions of the object

– Update of object• Byzantine agreement between host in inner ring

(primary copy)

• Result disseminated to secondary replicas

Page 160: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 160

Distributed file systems OceanStore

• PerformanceLAN WAN Predominant

operations inbenchmarkPhase Linux NFS Pond Linux NFS Pond

1 0.0 1.9 0.9 2.8 Read and write

2 0.3 11.0 9.4 16.8 Read and write

3 1.1 1.8 8.3 1.8 Read

4 0.5 1.5 6.9 1.5 Read

5 2.6 21.0 21.5 32.0 Read and write

Total 4.5 37.2 47.0 54.9

Page 161: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 161

Distributed file systems OceanStore

• Performance (cont)

– Benchmarks:• Create subdirectories recursively

• Copy a source tree

• Examine the status of all files in the tree

• Examine every byte of data in all files

• Compile and link the files

Page 162: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 162

This chapter: overview

• Introduction

• Time services

• Name services

• Distributed file systems

• Peer-to-peer systems

Page 163: November 2005Distributed systems: general services1 Distributed Systems: General Services

November 2005 Distributed systems: general services 163

Distributed Systems:

General Services