distributed systems - university of cambridge · recommended reading • “distributed systems:...
![Page 1: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”](https://reader033.vdocuments.site/reader033/viewer/2022042307/5ed3824f62e058372d439bcf/html5/thumbnails/1.jpg)
Distributed systems
Lecture 1: Introduction to distributed systems; RPC
Lent 2016
Dr Robert N. M. Watson
(With thanks to Dr Steven Hand)
Recommended Reading
• “Distributed Systems: Concepts and Design” (5th Ed), Coulouris et al, Addison-Wesley 2012
• “Distributed Systems: Principles and Paradigms” (2nd Ed), Tanenbaum et al, Prentice Hall, 2006
• “Operating Systems, Concurrent and Distributed S/W Design”, Bacon & Harris, Addison-Wesley 2003
  – or “Concurrent Systems” (2nd Ed), Jean Bacon, Addison-Wesley 1997
What are Distributed Systems?
• A set of discrete computers (“nodes”) that cooperate to perform a computation
  – Operates “as if” it were a single computing system
• Examples include:
  – Compute clusters (e.g. CERN, HPCF)
  – BOINC (aka SETI@Home and friends)
  – Distributed storage systems (e.g. NFS, Dropbox, …)
  – The Web (client/server; CDNs; and back-end too!)
  – Peer-to-peer systems such as Tor
  – Vehicles, factories, buildings (?)
Concurrent systems reminder
• Foundations of concurrency: processor(s), ISAs, threads
• Mutual exclusion: locks, semaphores, monitors, etc.
• Producer-consumer, active objects, message passing
• Races, deadlock, livelock, starvation, priority inversion
• Transactions, ACID, isolation, serialisability, schedules
• 2-phase locking, rollback, time-stamp ordering (TSO), optimistic concurrency control (OCC)
• Durability, write-ahead logging, crash recovery
• Lock-free algorithms, transactional memory
• Operating-system case study
These problems were not difficult enough – distributed systems add: loss of global visibility; loss of global ordering; new failure modes
Distributed Systems: Advantages
• Scale and performance
  – Cheaper to buy 100 PCs than a supercomputer…
  – …and easier to incrementally scale up too!
• Sharing and Communication
  – Allow access to shared resources (e.g. a printer) and information (e.g. distributed FS or DBMS)
  – Enable explicit communication between machines (e.g. EDI, CDNs) or people (e.g. email, twitter)
• Reliability
  – Can hopefully continue to operate even if some parts of the system are inaccessible, or simply crash
Distributed Systems: Challenges
• Distributed Systems are Concurrent Systems
  – Need to coordinate independent execution at each node (cf. first part of course)
• Failure of any components (nodes, network)
  – At any time, for any reason
• Network delays
  – Can’t distinguish congestion from crash/partition
• No global time
  – Tricky to coordinate, or even agree on ordering!
Middleware
• Middleware helps application authors write software intended to run on more than one machine at a time.
[Figure: the same layered stack on each machine (Machine A, Machine B), joined by a network. Layers, bottom to top: Network (e.g., TCP/IP, Ethernet); Kernel (e.g., Linux, BSD, Windows); Local network/OS services (e.g., Java runtime); Middleware services (e.g., Java RMI); Distributed applications (what you actually wanted to do!).]
Transparency & Middleware
• Recall a distributed system should appear “as if” it were executing on a single computer
• We often call this transparency:
  – User is unaware of multiple machines
  – Programmer is unaware of multiple machines
• How “unaware” can vary quite a bit
  – e.g. web user aware that there’s network communication... but not the number or location of the machines involved
  – e.g. programmer may explicitly code communication, or may have layers of abstraction: middleware
Classical types of Transparency

| Transparency | Description |
|---|---|
| Access | Hide differences in data representation and how a resource is accessed |
| Location | Hide where a resource is located |
| Migration | Hide that a resource may move to another location |
| Relocation | Hide that a resource may be moved to another location while in use |
| Replication | Hide that a resource may be provided by multiple cooperating systems |
| Concurrency | Hide that a resource may be simultaneously shared by several competitive users |
| Failure | Hide the failure and recovery of a resource |
| Persistence | Hide whether a (software) resource is in memory or on disk |

Scalability increasingly important – “performance transparency”?
In this Course
• We will look at techniques, protocols & algorithms used in distributed systems
  – in many cases, these will be provided for you by a middleware software suite
  – but knowing how things work will still be useful!
• Assume OS & networking support
  – processes, threads, synchronization
  – basic communication via messages
  – (will see later how assumptions about messages will influence the systems we [can] build)
• Let’s start with a simple client-server system
Client-Server Model
• 1970s: development of Local Area Networks (LANs)
• 1980s: standard deployment involves small number of servers, plus many workstations
  – Servers: always-on, powerful machines
  – Workstations: personal computers
• Workstations request ‘service’ from servers over the network, e.g. access to a shared file-system
Request-Reply Protocols
• Basic scheme:
  – Client issues a request message
  – Server performs operation, and sends reply
• Simplest version is synchronous:
  – client blocks awaiting reply
• Example: HTTP 1.0
  – Client (browser) sends “GET /index.html”
  – Web server fetches file and returns it
  – Browser displays HTML web page
• Later we will talk about asynchronous models:
  – Clients can continue work without blocking awaiting reply
Handling Errors & Failures
• Errors are application-level things => easy ;-)
  – E.g. client requests non-existent web page
  – Need special reply (e.g. “404 Not Found”)
• Failures are system-level things, e.g.:
  – lost message, client/server crash, network down, …
• To handle failure, client must time out if it doesn’t receive a reply within a certain time T
  – On timeout, client can retry request
  – (Q: what should we set T to?)
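The timeout-and-retry behaviour described on this slide might be sketched as follows. This is only an illustration, not code from the lecture: `Timeout`, `make_lossy_server`, and `call_with_retry` are hypothetical names, and the expiry of T is simulated by raising an exception instead of using a real network timer.

```python
import random

class Timeout(Exception):
    """Simulates 'no reply received within time T'."""

def make_lossy_server(loss_rate, rng):
    """Return a hypothetical send_request() that loses some messages."""
    def send_request(payload):
        if rng.random() < loss_rate:
            raise Timeout()            # request or reply lost: client hears nothing
        return "reply to " + payload
    return send_request

def call_with_retry(send_request, payload, max_attempts):
    """Send the request, retrying on timeout, up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request(payload), attempt
        except Timeout:
            continue                   # T expired: retry the request
    raise Timeout("no reply after %d attempts" % max_attempts)

if __name__ == "__main__":
    send = make_lossy_server(loss_rate=0.5, rng=random.Random(0))
    reply, attempts = call_with_retry(send, "GET /index.html", max_attempts=10)
    print(reply, "after", attempts, "attempt(s)")
```

The client cannot tell from a timeout whether the request or the reply was lost, so the sketch retries blindly in both cases.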
Retry Semantics
• Client could time out because:
  1. Request was lost
  2. Request was sent, but server crashed on receipt
  3. Request was sent & received, and server performed operation (or some of it?), but crashed before replying
  4. Request was sent & received, and server performed operation correctly, and sent reply… which was then lost
  5. As #4, but reply has just been delayed for longer than T
• For read-only stateless requests (like HTTP GET), can retry in all cases, but what if request was an order with Amazon?
  – In case #1, we probably want to re-order… and in case #5 we want to wait for a little bit longer, and otherwise we… erm?
• Worse: we don’t know what case it actually was!
Ideal Semantics
• What we want is exactly-once semantics:
  – Our request occurs once no matter how many times we retry (or if the network duplicates our messages)
• E.g. add a unique ID to every request
  – Server remembers IDs, and associated responses
  – If sees a duplicate, just returns old response
  – Client ignores duplicate responses
• Pretty tricky to ensure exactly-once in practice
  – e.g. if server explodes ;-)
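The unique-ID scheme above could look roughly like this on the server side. It is a minimal sketch under stated assumptions: `DedupServer`, its `handle` method, and the "deposit" operation are invented for illustration, and the table of seen IDs lives only in memory.

```python
class DedupServer:
    """Hypothetical server that remembers request IDs and their responses."""

    def __init__(self):
        self.balance = 0
        self.seen = {}                  # request ID -> response already sent

    def handle(self, request_id, amount):
        if request_id in self.seen:     # duplicate: just return the old response
            return self.seen[request_id]
        self.balance += amount          # perform the operation exactly once
        response = "balance=%d" % self.balance
        self.seen[request_id] = response
        return response

if __name__ == "__main__":
    server = DedupServer()
    print(server.handle(1, 10))         # first delivery of request 1
    print(server.handle(1, 10))         # retry of request 1: not applied again
    print(server.handle(2, 5))          # a new request
```

Because `seen` is held in memory, a server crash loses it and a later retry would be applied twice, which is one reason exactly-once is hard to guarantee in practice.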
Practical Semantics
• In practice, protocols guarantee one of:
• All-or-nothing (atomic) semantics
  – Use scheme on previous page; persistent log
  – (similar idea to transaction processing).
• At-most-once semantics
  – Request carried out once, or not at all
  – If no reply, we don’t know which outcome it was
  – e.g. send one request; give up on timeout
• At-least-once semantics
  – Retry on timeout; risk operation occurring again
  – OK if the operation is read-only, or idempotent
• Note: Assumption of no network duplication
Server state not required
Server state required to suppress retries
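A small simulation can show why at-least-once semantics is safe only for read-only or idempotent operations. The names `Store` and `at_least_once` are hypothetical, and lost replies are modelled by a simple counter, not a real network.

```python
class Store:
    """Toy server state with one idempotent and one non-idempotent operation."""

    def __init__(self):
        self.value = 0

    def set_value(self, v):             # idempotent: repeating it changes nothing
        self.value = v

    def increment(self, n):             # not idempotent: every repeat adds again
        self.value += n

def at_least_once(operation, arg, lost_replies):
    """Re-send until a reply gets through; the server executes every copy."""
    executions = 0
    while True:
        operation(arg)                  # server performs the request
        executions += 1
        if lost_replies == 0:
            return executions           # reply finally reaches the client
        lost_replies -= 1               # reply lost: client times out, retries

if __name__ == "__main__":
    a, b = Store(), Store()
    at_least_once(a.set_value, 5, lost_replies=2)
    at_least_once(b.increment, 5, lost_replies=2)
    print(a.value, b.value)
```

With two lost replies each operation runs three times: the idempotent `set_value` leaves the state as intended, while `increment` is applied three times over.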
Remote Procedure Call (RPC)
• Request/response protocols are useful – and widely used – but rather clunky to use
  – e.g. need to define the set of requests, including how they are represented in network messages
• A nicer abstraction is Remote Procedure Call (RPC)
  – Programmer simply invokes a procedure…
  – …but it executes on a remote machine (the server)
  – RPC subsystem handles message formats, sending & receiving, handling timeouts, etc
• Aim is to make distribution (mostly) transparent
  – Certain failure cases wouldn’t happen locally
  – Distributed and local function call performance different
Marshalling Arguments
• RPC is integrated with the programming language
  – Some additional magic to specify things are remote
• RPC layer marshals parameters to the call, as well as any return value(s), e.g.
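[Figure: message sequence between Caller, RPC Service (caller side), RPC Service (remote side), and Remote Function]
• Caller invokes call(…)
• Caller-side RPC service: 1) Marshal args, 2) Generate ID, 3) Send message, 4) Start timer
• Remote-side RPC service: 5) Unmarshal args, 6) Record ID, then invokes fun(…)
• Remote-side RPC service: 7) Marshal return values, 8) Send reply, 9) Set timer
• Caller-side RPC service: 10) Unmarshal return values, 11) Acknowledge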
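The marshal and unmarshal steps in this sequence (1, 5, 7 and 10) might be sketched with Python's `struct` module. The wire layout here, a 32-bit request ID followed by 32-bit big-endian integers, is an assumption made for the example, not the format of any particular RPC system, and all function names are hypothetical.

```python
import struct

REQUEST = struct.Struct(">Iii")   # request ID, then two signed 32-bit arguments
REPLY = struct.Struct(">Ii")      # request ID, then one signed 32-bit result

def marshal_call(request_id, a, b):
    """Caller side: step 1 (marshal args), tagged with the ID from step 2."""
    return REQUEST.pack(request_id, a, b)

def serve(message, fun):
    """Remote side: step 5 (unmarshal), call fun, step 7 (marshal result)."""
    request_id, a, b = REQUEST.unpack(message)
    result = fun(a, b)
    return REPLY.pack(request_id, result)

def unmarshal_reply(message, expected_id):
    """Caller side: step 10 (unmarshal), checking the reply matches the request."""
    request_id, result = REPLY.unpack(message)
    if request_id != expected_id:
        raise ValueError("reply does not match request")
    return result

def add(a, b):
    return a + b

if __name__ == "__main__":
    wire = marshal_call(7, 2, 3)
    print(unmarshal_reply(serve(wire, add), expected_id=7))
```

Sending, timers, and acknowledgements (steps 3, 4, 8, 9, 11) are left out; here the "network" is a direct function call.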
IDLs and Stubs
• To marshal, the RPC layer (on both sides!) must know:
  – how many arguments the procedure has,
  – how many results are expected, and
  – the types of all of the above
• The programmer must specify this by describing things in an Interface Definition Language (IDL)
  – In higher-level languages, this may already be included as standard (e.g. C#, Java)
  – In others (e.g. C), IDL is part of the middleware
• The RPC layer can then automatically generate stubs
  – Small pieces of code at client and server (see previous)
  – May also provide authentication, encryption
  – Provides integrity, confidentiality
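To illustrate how stubs can be derived from an interface description, the sketch below generates client stubs and a server dispatcher from a small table. The table `INTERFACE` stands in for an IDL file; its layout, the procedure numbers, and every function name are assumptions for this example only.

```python
import struct

# Hypothetical interface description:
# procedure name -> (procedure number, argument format, result format)
INTERFACE = {
    "add": (1, ">ii", ">i"),
    "neg": (2, ">i", ">i"),
}

def make_client_stub(name, transport):
    """Build a local function that marshals its args and calls the transport."""
    proc, arg_fmt, res_fmt = INTERFACE[name]
    def stub(*args):
        request = struct.pack(">I", proc) + struct.pack(arg_fmt, *args)
        reply = transport(request)
        return struct.unpack(res_fmt, reply)[0]
    return stub

def make_server_dispatcher(implementations):
    """Build the server stub: unmarshal, call the real procedure, marshal."""
    by_number = {proc: (name, arg_fmt, res_fmt)
                 for name, (proc, arg_fmt, res_fmt) in INTERFACE.items()}
    def dispatch(request):
        (proc,) = struct.unpack(">I", request[:4])
        name, arg_fmt, res_fmt = by_number[proc]
        args = struct.unpack(arg_fmt, request[4:])
        return struct.pack(res_fmt, implementations[name](*args))
    return dispatch

# The transport is a direct function call here, standing in for the network.
dispatch = make_server_dispatcher({"add": lambda a, b: a + b,
                                   "neg": lambda a: -a})
add = make_client_stub("add", dispatch)
neg = make_client_stub("neg", dispatch)

if __name__ == "__main__":
    print(add(2, 3), neg(4))
```

Both sides consult the same description, which is the point of the slide: argument counts and types must be known at the client and at the server.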
Example: SunRPC
• Developed mid-80s for Sun Unix systems
• Simple request/response protocol:
  – Server registers one or more “programs” (services)
  – Client issues requests to invoke specific procedures within a specific service
• Messages can be sent over any transport protocol (most commonly UDP/IP and later TCP/IP)
  – Requests have a unique transaction ID that can be used to detect & handle retransmissions
  – At-least-once semantics
  – Various types of access transparency including byte-order
XDR: External Data Representation
• SunRPC used XDR for describing interfaces:

    // file: test.x
    program test {
        version testver {
            int get(getargs) = 1;   // procedure number
            int put(putargs) = 2;   // procedure number
        } = 1;                      // version number
    } = 0x12345678;                 // program number

• rpcgen generates [un]marshaling code, stubs
• Single arguments… but recursively convert values
• Some support for following pointers too
• Data on the wire always in big-endian format (oops!)
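As a rough illustration of XDR's wire format, the sketch below encodes an integer and a string by hand with Python's `struct` module: integers are 4 bytes, big-endian, and strings carry a 4-byte length and are zero-padded to a multiple of 4 bytes. The helper names `xdr_int` and `xdr_string` are hypothetical and this is not the code that rpcgen generates.

```python
import struct

def xdr_int(value):
    """Encode a signed 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">i", value)

def xdr_string(text):
    """Encode a string: 4-byte length, the bytes, zero padding to 4-byte units."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

if __name__ == "__main__":
    print(xdr_int(1))                  # b'\x00\x00\x00\x01'
    print(struct.pack("<i", 1))        # b'\x01\x00\x00\x00' (little-endian layout)
    print(xdr_string("hello"))         # b'\x00\x00\x00\x05hello\x00\x00\x00'
```

The second line of output shows the little-endian layout of the same integer, so a little-endian machine has to byte-swap every value it sends or receives, which is presumably the cost the slide's “oops!” refers to.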
Using SunRPC
1. Write XDR, and use rpcgen to generate skeleton code
2. Fill in blanks (i.e. write client/server parts), compile code
3. Run server program & register with portmapper (now: rpcbind)
  – Mappings from {prog#, ver#, proto} -> port
  – (on Linux/UNIX, try “/usr/sbin/rpcinfo -p”)
  – Portmapper is itself an RPC service on a well-known port
4. Server process will then listen(), awaiting clients
5. When a client starts, client stub calls clnt_create()
  – Sends {prog#, ver#, proto} to portmapper on server, receives appropriate port number to use for actual RPC connection
  – Client invokes remote procedures as needed
6. Recently: GSS authentication/encryption – e.g., Kerberos
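The portmapper's role in steps 3 and 5 can be pictured as a lookup table keyed by {prog#, ver#, proto}. The class below is an in-memory stand-in written for illustration: `Portmapper`, `register`, and `lookup` are hypothetical names (not the rpcbind API), the port number is made up, and returning 0 for an unregistered program is a convention chosen for this sketch.

```python
class Portmapper:
    """Toy registry mapping (program, version, protocol) to a port number."""

    def __init__(self):
        self.table = {}

    def register(self, prog, vers, proto, port):
        """Step 3: a server announces where its service is listening."""
        self.table[(prog, vers, proto)] = port

    def lookup(self, prog, vers, proto):
        """Step 5: a client asks which port to use; 0 means 'not registered'."""
        return self.table.get((prog, vers, proto), 0)

if __name__ == "__main__":
    pm = Portmapper()
    pm.register(0x12345678, 1, "udp", 40001)    # 40001 is an arbitrary port
    print(pm.lookup(0x12345678, 1, "udp"))
    print(pm.lookup(0x12345678, 2, "udp"))
```

The program number 0x12345678 and version 1 are the ones from the test.x example; the client only needs to know the portmapper's own well-known port in advance.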
Summary + next time
• About this course
• Advantages and challenges of distributed systems
• Types of transparency (+ scalability)
• Middleware, the client-server model
• Errors and retry semantics
• RPC, marshalling, SunRPC, and XDR
Next time:
• Sun’s Network File System (NFS)
• Object-Oriented Middleware (OOM)