University of Cambridge · 2017-01-15


1/13/17

Distributed systems
Lecture 2: The Network File System (NFS) and Object Oriented Middleware (OOM)

Dr. Robert N. M. Watson

Last time
• Distributed systems are everywhere
  – Challenges including concurrency, delays, failures
  – The importance of transparency
• Simplest distributed systems are client/server
  – Client sends request as message
  – Server gets message, performs operation, and replies
  – Some care required handling retry semantics, timeouts
• One popular model is Remote Procedure Call (RPC)
  – Client calls functions on the server via network
  – Middleware generates stub code which can marshal / unmarshal arguments / return values (e.g. SunRPC/XDR)
  – Transparency for the programmer, not just the user
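
To make the stub idea concrete, here is a minimal sketch of the kind of marshalling code that middleware generates. It assumes an XDR-like encoding (big-endian 4-byte integers, length-prefixed strings padded to a 4-byte boundary). The procedure number and all function names are hypothetical, chosen for illustration only.

```python
import struct

PROC_LOOKUP = 4  # hypothetical procedure number, for illustration only


def pack_string(s: str) -> bytes:
    """Encode a string: 4-byte length, the bytes, zero padding to 4 bytes."""
    data = s.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def unpack_string(buf: bytes, pos: int) -> tuple[str, int]:
    """Decode a string at pos; return it and the position after its padding."""
    (length,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    text = buf[start:start + length].decode("utf-8")
    return text, start + length + (4 - length % 4) % 4


def marshal_request(proc: int, name: str) -> bytes:
    """Client stub: flatten a call into a message."""
    return struct.pack(">I", proc) + pack_string(name)


def unmarshal_request(msg: bytes) -> tuple[int, str]:
    """Server stub: recover the procedure number and its argument."""
    (proc,) = struct.unpack_from(">I", msg, 0)
    name, _ = unpack_string(msg, 4)
    return proc, name


message = marshal_request(PROC_LOOKUP, "notes.txt")
print(len(message), unmarshal_request(message))
```

The programmer only ever calls something like `marshal_request` indirectly, through a generated function with the same signature as the remote procedure; that is the sense in which RPC is transparent to the programmer.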


First case study: NFS
• NFS = Networked File System (developed by Sun)
  – Aimed to provide distributed filing by remote access
• Key design decisions:
  – Distributed filesystem vs. remote disks
  – Client-server model
  – High degree of transparency
  – Tolerant of node crashes or network failure
• First public version, NFSv2 (1989), did this via:
  – Unix filesystem semantics (or almost)
  – Integration into kernel (including mount)
  – Simple stateless client/server architecture
• A set of RPC “programs”: mountd, nfsd, lockd, statd, ...

Transparency for users and applications, but also NFS programmers: hence SunRPC

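NFS: Client/Server Architecture

[Figure: on the client side, a user program calls through the syscall level and VFS layer to either the local FS or the NFS client. The NFS client sends an RPC request to the NFS server on the server side, which goes through the server's VFS layer to its local FS and returns an RPC response. Steps are numbered 1 to 5.]

• Client uses opaque file handles to refer to files
• Server translates these to local inode numbers
• SunRPC with XDR running over UDP (originally)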


NFS: mounting remote filesystems

[Figure: directory trees of the NFS client and the NFS server, with nodes /, /tmp, /home, /bin and entries x, y, foo, bar]

• NFS RPCs are methods on files identified by file handle(s)
• Bootstrap via dedicated mount RPC ‘program’ that:
  – Performs authentication (if any);
  – Negotiates any optional session parameters; and
  – Returns root file handle
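
A minimal sketch of the bootstrap: the client obtains a root file handle from mount and then walks a path one component at a time with lookup. The class, the inode numbers, and the handle encoding (the inode number as ASCII digits) are all assumptions made for illustration; authentication and session parameters are omitted.

```python
class ToyServer:
    """Toy server whose handles are derived from local inode numbers."""

    def __init__(self, inodes: dict, root_ino: int):
        self.inodes = inodes      # inode number -> {name: inode number}
        self.root_ino = root_ino

    def mount(self) -> bytes:
        """Mount 'program': return the root file handle."""
        return self._handle(self.root_ino)

    def lookup(self, dir_fh: bytes, name: str) -> bytes:
        """Return the handle of `name` inside the directory `dir_fh`."""
        directory = self.inodes[int(dir_fh.decode())]
        return self._handle(directory[name])

    @staticmethod
    def _handle(ino: int) -> bytes:
        return str(ino).encode()


def resolve(server: ToyServer, path: str) -> bytes:
    """Client side: one lookup RPC per path component."""
    fh = server.mount()
    for component in path.strip("/").split("/"):
        fh = server.lookup(fh, component)
    return fh


# Hypothetical exported tree: / contains home and bin.
inodes = {
    2: {"home": 3, "bin": 4},
    3: {"x": 5, "y": 6},
    4: {"foo": 7, "bar": 8},
}
server = ToyServer(inodes, root_ino=2)
print(resolve(server, "/home/x"))
```

Because each handle is computed from the inode number, the server keeps no per-client table of handles it has issued.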


NFS file handles and scoping
• Pure names expose no visible semantics (e.g., NFS handle)
• Impure names expose semantics (e.g., file paths)

[Figure: layer stack (user program, syscall level, VFS layer, NFS client; VFS layer, NFS server, local FS) annotated with the name types passed between layers: int, file *, vnode *, nfsnode *, fhandle_t, inode *. Inset: example server-defined fhandle_t with fields fsid, len, pad, ino, gen, labelled “NFS” and “Local filesystem”.]

• Arguments at each layer have specific scopes
  – Layers translate between namespaces for encapsulation
  – Contents of names between layers often opaque
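
The following sketch shows how a server might build and interpret a handle with the fields named on the slide, while the client treats it as opaque bytes. The field widths, the inode table, and the use of the generation number to reject a handle after an inode is reused are assumptions for illustration; the slide names the fields but not their sizes.

```python
import struct

# Assumed layout: fsid (8 bytes), len (2), pad (2), ino (4), gen (4).
FH_FORMAT = ">QHHII"


def make_handle(fsid: int, ino: int, gen: int) -> bytes:
    """Server side: build a handle from local file-system identifiers."""
    length = struct.calcsize(">II")  # bytes of file-system-specific data
    return struct.pack(FH_FORMAT, fsid, length, 0, ino, gen)


def resolve_handle(handle: bytes, inode_table: dict) -> str:
    """Server side: map a handle back to a local file, or fail if stale."""
    fsid, _length, _pad, ino, gen = struct.unpack(FH_FORMAT, handle)
    entry = inode_table.get((fsid, ino))
    if entry is None or entry["gen"] != gen:
        raise FileNotFoundError("stale file handle")
    return entry["path"]


# Hypothetical server state: one file with inode 42, generation 1.
inode_table = {(7, 42): {"gen": 1, "path": "/export/home/x"}}
fh = make_handle(fsid=7, ino=42, gen=1)

# The client only stores fh and sends it back; it never looks inside.
print(len(fh), resolve_handle(fh, inode_table))
```

Only the server can interpret the bytes, which is what makes the handle a pure name from the client's point of view.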


NFS is stateless
• Key NFS design decision to ease fault recovery
  – Obviously, filesystems aren’t stateless, so…
• Stateless means the protocol doesn’t require:
  – Keeping any record of current clients
  – Keeping any record of current open files
• Server can crash + reboot, and clients do not have to do anything (except wait!)
• Clients can crash, and servers do not need to do anything (no cleanup etc)

Implications of stateless-ness
• No “open” or “close” operations
  – fh = lookup(<directory fh>, <filename>)
  – All file operations are via per-file handles
• No implied state linking multiple RPCs; e.g.,
  – UNIX file descriptor has “current offset” for I/O:
      read(fd, buf, 2048)
  – NFS file handle has no offset; operations are explicit:
      read(fh, buf, offset, 2048)
• This makes many operations idempotent
  – This use of SunRPC gives at-least-once semantics
  – Tolerate message duplication in network, RPC retries
• Challenges in providing Unix FS semantics…
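
A small sketch of why the explicit offset matters under at-least-once delivery. The class and function names are hypothetical, and both reads operate on an in-memory byte string rather than a real file.

```python
class StatefulFile:
    """UNIX-style descriptor: the object remembers a current offset."""

    def __init__(self, data: bytes):
        self.data = data
        self.offset = 0

    def read(self, count: int) -> bytes:
        chunk = self.data[self.offset:self.offset + count]
        self.offset += len(chunk)
        return chunk


def stateless_read(data: bytes, offset: int, count: int) -> bytes:
    """NFS-style read: everything needed is carried in the request."""
    return data[offset:offset + count]


contents = b"abcdefgh"

# A retried (duplicated) request against the stateful interface
# returns different bytes the second time.
f = StatefulFile(contents)
first, retry = f.read(4), f.read(4)
print(first, retry)

# The same retry against the stateless interface is harmless.
first_sl = stateless_read(contents, 0, 4)
retry_sl = stateless_read(contents, 0, 4)
print(first_sl, retry_sl)
```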


Semantic tricks (and messes)
• rename(<old filename>, <new filename>)
  – Fundamentally non-idempotent
  – Strong expectation of atomicity
  – Server-side “cache” of recent RPC replies for replay
• unlink(<old filename>)
  – UNIX requires open files to persist after unlink()
  – What if the server removes a file that is open on a client?
  – Silly rename: clients translate unlink() to rename()
  – Only within client (not server delete, nor for other clients)
  – Other clients will have a stale file handle: ESTALE
• Stateless file locking seems impossible
  – Problem avoided (?): separate RPC protocols
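
The reply cache for non-idempotent operations can be sketched as follows. Keying the cache by client name and transaction id is an assumption of this sketch, as are the class name and the string reply codes; a real cache would also be bounded in size.

```python
class RenameServer:
    """Toy server with a reply cache keyed by (client, transaction id)."""

    def __init__(self, names):
        self.names = set(names)
        self.reply_cache = {}

    def rename(self, client: str, xid: int, old: str, new: str) -> str:
        key = (client, xid)
        if key in self.reply_cache:
            # Retransmission: replay the earlier reply, do not re-execute.
            return self.reply_cache[key]
        if old in self.names:
            self.names.remove(old)
            self.names.add(new)
            reply = "OK"
        else:
            reply = "ENOENT"
        self.reply_cache[key] = reply
        return reply


server = RenameServer({"a.txt"})
original = server.rename("client1", 17, "a.txt", "b.txt")
retried = server.rename("client1", 17, "a.txt", "b.txt")   # same xid
new_call = server.rename("client1", 18, "a.txt", "b.txt")  # different xid
print(original, retried, new_call)
```

Without the cache, the retried request would be executed again and fail with ENOENT even though the rename succeeded.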


Performance problems
• Neither side knows if other is alive or dead
  – All writes must be synchronously committed on server before it returns success
• Very limited client caching…
  – Risk of inconsistent updates if multiple clients have file open for writing at the same time
• These two facts alone meant that NFSv2 had truly dreadful performance

NFSv3 (1995)
• Mostly minor protocol enhancements
  – Scalability
    • Remove limits on path- and file-name lengths
    • Allow 64-bit offsets for large files
    • Allow large (>8KB) transfer-size negotiation
  – Explicit asynchrony
    • Server can do asynchronous writes (write-back)
    • Client sends explicit commit after some # writes
    • File timestamps piggybacked on server replies allow clients to manage cache: close-to-open consistency
  – Optimized RPCs (readdirplus, symlink)
• But had major impact on performance
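
A rough sketch of write-back plus explicit commit. The server buffers writes in memory and only makes them durable on commit, so the client keeps its own copy until the commit succeeds. The boot identifier used to detect a reboot is an assumption of this sketch (the slide does not say how clients notice lost writes), as are all names.

```python
class AsyncServer:
    def __init__(self):
        self.stable = {}     # offset -> data, survives a crash
        self.unstable = {}   # write-back buffer, lost on crash
        self.boot_id = 1     # assumed: changes on every reboot

    def write(self, offset: int, data: bytes) -> int:
        self.unstable[offset] = data   # asynchronous write: not yet durable
        return self.boot_id

    def commit(self) -> int:
        self.stable.update(self.unstable)
        self.unstable.clear()
        return self.boot_id

    def crash_and_reboot(self) -> None:
        self.unstable.clear()
        self.boot_id += 1


class WritingClient:
    def __init__(self, server: AsyncServer):
        self.server = server
        self.uncommitted = {}  # offset -> (data, boot id seen at write time)

    def write(self, offset: int, data: bytes) -> None:
        self.uncommitted[offset] = (data, self.server.write(offset, data))

    def commit(self) -> None:
        boot_id = self.server.commit()
        # Writes acknowledged by an earlier boot may have been lost: resend.
        lost = {off: data for off, (data, seen) in self.uncommitted.items()
                if seen != boot_id}
        for off, data in lost.items():
            self.server.write(off, data)
        if lost:
            self.server.commit()
        self.uncommitted.clear()


server = AsyncServer()
client = WritingClient(server)
client.write(0, b"aa")
server.crash_and_reboot()   # the buffered write at offset 0 is lost
client.write(2, b"bb")
client.commit()
print(server.stable)
```

The sketch ignores a crash between the resend and the second commit; handling that would need a retry loop.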


NFSv3 readdirplus
• NFSv2 behaviour for “ls -l”
  – readdir() triggers NFS_READDIR to request names and handles
  – stat() on each file triggers one NFS_GETATTR RPC
• NFS3_READDIRPLUS returns names, handles, and attributes
  – Eliminates a vast number of round-trip times
• Principle: mask network latency by batching synchronous operations

[Figure: message sequences. NFSv2 client and server exchange READDIR followed by three GETATTR RPCs: 4 x RTT (Round-Trip Time). NFSv3 client and server exchange a single READDIRPLUS: 1 x RTT.]

drwxr-xr-x  55 al565 al565 12288 Feb  8 15:47 al565/
drwxr-xr-x 115 am21  am21  49152 Feb 10 18:19 am21/
drwxr-xr-x 214 atm26 atm26 36864 Feb  1 17:09 atm26/
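
The round-trip arithmetic can be checked with a toy server that counts RPCs, treating each call as one round trip. The class and method names are hypothetical; the three directory entries and sizes are taken from the listing above.

```python
class CountingServer:
    """Toy server that counts RPCs; each call stands for one round trip."""

    def __init__(self, attrs: dict):
        self.attrs = attrs
        self.rpcs = 0

    def readdir(self) -> list:
        self.rpcs += 1
        return sorted(self.attrs)

    def get_attr(self, name: str) -> dict:
        self.rpcs += 1
        return self.attrs[name]

    def readdirplus(self) -> dict:
        self.rpcs += 1
        return dict(self.attrs)


def ls_l_v2(server: CountingServer) -> dict:
    """One READDIR, then one GETATTR per entry."""
    return {name: server.get_attr(name) for name in server.readdir()}


def ls_l_v3(server: CountingServer) -> dict:
    """A single READDIRPLUS returns names and attributes together."""
    return server.readdirplus()


entries = {
    "al565": {"size": 12288},
    "am21": {"size": 49152},
    "atm26": {"size": 36864},
}
v2, v3 = CountingServer(entries), CountingServer(entries)
same_listing = ls_l_v2(v2) == ls_l_v3(v3)
print(v2.rpcs, v3.rpcs, same_listing)
```

For a directory with n entries the NFSv2 pattern costs n + 1 round trips, against one for the batched call.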


Distributed filesystem consistency
• Can a distributed application expect data written on client A to be visible to client B?
  – After write() on A, will a read() on B see it?
  – What if a process on A writes to a file, and then sends a message to a process on B to read the file?
• In NFSv3, no!
  – A may have freshly written data in its cache that it has not yet sent to the server via a write RPC
  – The server will return stale data to B’s read RPC
• Or:
  – B may return stale data in its cache from a prior read
• This problem is known as inconsistency:
  – Clients may see different versions of the same shared object

NFS close-to-open consistency (1)
• Guaranteeing global visibility for every write() required synchronous RPCs and prevented caching
• NFSv3 implements close-to-open consistency, which reduces synchronous RPCs and permits caching
  1. For each file it stores, the server maintains a timestamp of the last write performed
  2. When a file is opened, the client receives the timestamp; if the timestamp has changed since data was cached, the client invalidates its read cache, forcing fresh read RPCs
  3. While the file is open, data reads/writes for the file can be cached on the client, and write RPCs can be deferred
  4. When the file is closed, pending writes must be sent to the server (and ack’d) before close() can return
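
The four steps above can be sketched for a single file as a small simulation. It is a simplification under stated assumptions: the timestamp is a counter, a write replaces the whole file, and the class names are hypothetical.

```python
class FileServer:
    """Holds one file's contents and a last-write timestamp (a counter)."""

    def __init__(self):
        self.data = b""
        self.mtime = 0          # step 1: timestamp of the last write

    def write(self, data: bytes) -> None:
        self.data = data
        self.mtime += 1


class Client:
    def __init__(self, server: FileServer):
        self.server = server
        self.cached_data = None
        self.cached_mtime = None
        self.pending = None

    def open(self) -> None:
        # Step 2: compare the server timestamp with the cached one.
        if self.cached_mtime != self.server.mtime:
            self.cached_data = None          # invalidate the read cache

    def read(self) -> bytes:
        if self.pending is not None:
            return self.pending
        if self.cached_data is None:
            self.cached_data = self.server.data   # fresh read RPC
            self.cached_mtime = self.server.mtime
        return self.cached_data

    def write(self, data: bytes) -> None:
        self.pending = data                  # step 3: write RPC deferred

    def close(self) -> None:
        # Step 4: flush pending writes before close() returns.
        if self.pending is not None:
            self.server.write(self.pending)
            self.cached_data = self.pending
            self.cached_mtime = self.server.mtime
            self.pending = None


server = FileServer()
server.write(b"v1")
a, b = Client(server), Client(server)

b.open()
before = b.read()        # B caches v1
a.open()
a.write(b"v2")
a.close()                # A's write reaches the server here
still_open = b.read()    # B has not reopened: served from its cache
b.close()
b.open()                 # timestamp changed, cache invalidated
after_reopen = b.read()
print(before, still_open, after_reopen)
```

B sees A's write only after A has closed the file and B has opened it again.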


NFS close-to-open consistency (2)
• We now have a consistency model that programmers can use to reason about when writes will be visible in NFS:
  – If a program on host A needs writes to a file to be visible to a program on host B, it must close() the file
  – If a program on host B needs reads from a file to include those writes, it must open() it after the corresponding close()
• This works quite well for some applications
  – E.g., distributed builds: inputs/outputs are whole files
  – E.g., UNIX maildir format (each email in its own file)
• It works very badly for others
  – E.g., long-running databases that modify records within a file
  – E.g., UNIX mbox format (all emails in one large file)
• Applications using NFS to share data must be designed for these semantics, or they will behave very badly!

NFSv4 (2003)
• Time for a major rethink
  – Single stateful protocol (including mount, lock)
  – TCP (or at least reliable transport) only
  – Explicit open and close operations
  – Share reservations
  – Delegation
  – Arbitrary compound operations
  – Many lessons learned from AFS (later in term)
• Now seeing widespread deployment

Improving over SunRPC
• SunRPC (now “ONC RPC”) very successful but
  – Clunky (manual program, procedure numbers, etc)
  – Limited type information (even with XDR)
  – Hard to scale beyond simple client/server
• One improvement was OSF DCE (early 90’s)
  – Another project that learned from AFS
  – DCE = “Distributed Computing Environment”
  – Larger middleware system including a distributed file system, a directory service, and DCE RPC
  – Deals with a collection of machines (a cell) rather than just with individual clients and servers

DCE RPC versus SunRPC
• Quite similar in many ways
  – Interfaces written in Interface Definition Notation (IDN), and compiled to skeletons and stubs
  – NDR wire format: little-endian by default!
  – Can operate over various transport protocols
• Better security, and location transparency
  – Services identified by 128-bit “Universally” Unique Identifiers (UUIDs), generated by uuidgen
  – Server registers UUID with cell-wide directory service
  – Client contacts directory service to locate server… which supports service move, or replication
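
A minimal sketch of location transparency through a directory keyed by service UUID. The directory class and the host:port address strings are hypothetical; Python's `uuid.uuid4()` stands in for uuidgen, and the directory is a local object rather than a cell-wide service.

```python
import uuid


class ServiceDirectory:
    """Toy directory: maps a service UUID to the servers offering it."""

    def __init__(self):
        self.entries = {}

    def register(self, service_id: uuid.UUID, address: str) -> None:
        self.entries.setdefault(service_id, []).append(address)

    def unregister(self, service_id: uuid.UUID, address: str) -> None:
        self.entries[service_id].remove(address)

    def locate(self, service_id: uuid.UUID) -> str:
        """Return one server for the service (here simply the first)."""
        return self.entries[service_id][0]


FILE_SERVICE = uuid.uuid4()   # a 128-bit identifier for the interface

directory = ServiceDirectory()
directory.register(FILE_SERVICE, "hostA:5000")
before_move = directory.locate(FILE_SERVICE)

# The service moves: the client still asks for the same UUID.
directory.register(FILE_SERVICE, "hostB:5000")
directory.unregister(FILE_SERVICE, "hostA:5000")
after_move = directory.locate(FILE_SERVICE)
print(before_move, after_move)
```

Registering several addresses under one UUID is also how this sketch would represent replication.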


Object-Oriented Middleware
• SunRPC / DCE RPC forward functions, and do not have support for more complex types, exceptions, or polymorphism
• Object-Oriented Middleware (OOM) arose in the early 90s to address this
  – Assume programmer is writing in OO-style
  – ‘Remote objects’ will behave like local objects, but their methods will be forwarded over the network a la RPC
  – References to objects can be passed as arguments or return values, e.g., passing a directory object reference
• Makes it much easier to program, especially if your program is object oriented!
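
A minimal sketch of a remote object: a client-side proxy that looks like the object and forwards each method call to a server-side skeleton. JSON stands in for the wire format and a direct function call stands in for the network; both are assumptions of this sketch, as are the class names.

```python
import json


class DirectoryObject:
    """The real object, living on the 'server'."""

    def __init__(self):
        self.names = []

    def add(self, name: str) -> int:
        self.names.append(name)
        return len(self.names)

    def listing(self) -> list:
        return list(self.names)


class Skeleton:
    """Server side: unmarshal a request and call the real object."""

    def __init__(self, target):
        self.target = target

    def handle(self, request: bytes) -> bytes:
        call = json.loads(request)
        result = getattr(self.target, call["method"])(*call["args"])
        return json.dumps({"result": result}).encode()


class Proxy:
    """Client side: forwards every method call over the transport."""

    def __init__(self, transport):
        self._transport = transport

    def __getattr__(self, method):
        # Called only for attributes not found normally, i.e. remote methods.
        def remote_call(*args):
            request = json.dumps({"method": method, "args": args}).encode()
            return json.loads(self._transport(request))["result"]
        return remote_call


skeleton = Skeleton(DirectoryObject())
remote_dir = Proxy(skeleton.handle)   # in-process stand-in for the network

count = remote_dir.add("foo")
count = remote_dir.add("bar")
print(count, remote_dir.listing())
```

The calling code uses `remote_dir` exactly as it would a local `DirectoryObject`.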


CORBA (1989)
• First OOM system was CORBA
  – Common Object Request Broker Architecture
  – Specified by the OMG: Object Management Group
• OMA (Object Management Architecture) is the general model of how objects interoperate
  – Objects provide services.
  – Clients make a request to an object for a service.
  – Client doesn’t need to know where the object is, or anything about how the object is implemented!
  – Object interface must be known (public)

Object Request Broker (ORB)
• The ORB is the core of the architecture
  – Connects clients to object implementations
  – Conceptually spans multiple machines (in practice, ORB software runs on each machine)

[Figure: the client, via generated stub code, and the object implementation, via generated skeleton code, both connect to the ORB]

Invoking Objects
• Clients obtain an object reference
  – Typically via the naming service or trading service
  – (Object references can also be saved for use later)
• Interfaces defined by CORBA IDL
• Clients can call remote methods in 2 ways:
  1. Static Invocation: using stubs built at compile time (just like with RPC)
  2. Dynamic Invocation: actual method call is created on the fly. It is possible for a client to discover new objects at run time and access the object methods

CORBA IDL
• Definition of language-independent remote interfaces
  – Language mappings to C++, Java, Smalltalk, …
  – Translation by IDL compiler
• Type system
  – basic types: long (32 bit), long long (64 bit), short, float, char, boolean, octet, any, …
  – constructed types: struct, union, sequence, array, enum
  – objects (common supertype Object)
• Parameter passing
  – in, out, inout (= send remote, modify, update)
  – basic & constructed types passed by value
  – objects passed by reference

CORBA Pros and Cons
• CORBA has some unique advantages
  – Industry standard (OMG)
  – Language & OS agnostic: mix and match
  – Richer than simple RPC (e.g. interface repository, implementation repository, DII support, …)
  – Many additional services (trading & naming, events & notifications, security, transactions, …)
• However:
  – Really, really complicated / ugly / buzzwordy
  – Poor interoperability, at least at first
  – Generally to be avoided unless you need it!

Summary + next time
• NFS as an RPC, distributed-filesystem case study
  – Retry semantics vs. RPC semantics
  – Scoping, pure vs. impure names
  – Close-to-open consistency
  – Batching to mask network latency
• DCE RPC
• Object-Oriented Middleware (OOM)
• CORBA

• Java remote method invocation (RMI)
• XML-RPC, SOAP, etc, etc, etc.
• Starting to talk about distributed time