![Page 1: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/1.jpg)
Distributed Systems
Communication and models
Rik Sarkar, Spring 2018
University of Edinburgh
![Page 2: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/2.jpg)
Models
• Expectations/assumptions about things
• Every idea or action anywhere is based on a model
• Determines what can or cannot happen
![Page 3: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/3.jpg)
Communication & modeling
• Modeling distributed systems:
  – How we can think about them
• Communication between nearby nodes
• Communication between distant nodes
• Communication with many nodes
Some terminology:
• One to “all”: Broadcast
  – All: in some set of interest
• One to one: point to point
![Page 4: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/4.jpg)
Packets
• Networks communicate data in messages of fixed (bounded) size, called packets
• More data requires more packets
• Number of messages or packets transmitted is a measure of communication used
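As a rough illustration of "more data requires more packets", the sketch below counts and builds the packets needed for a block of data. The function names and the 1460-byte payload size are assumptions chosen for the example, not values from the slides.

```python
def packet_count(data_bytes, payload_bytes):
    """Number of packets needed to carry data_bytes, at most payload_bytes each."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    # Ceiling division: a final, partly filled packet still counts as one packet.
    return -(-data_bytes // payload_bytes)


def split_into_packets(data, payload_bytes):
    """Cut a bytes object into consecutive chunks of bounded size."""
    return [data[i:i + payload_bytes] for i in range(0, len(data), payload_bytes)]


# Hypothetical payload size, for illustration only.
PAYLOAD = 1460
packets = split_into_packets(b"x" * 3000, PAYLOAD)
```

Here 3000 bytes need three packets: two full ones and one carrying the remaining 80 bytes.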
![Page 5: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/5.jpg)
Local area networks
• Medium: Broadcast
  – Message goes from one computer to all other computers (restricted to some set)
    • For example, all other computers in the LAN, or some other system in consideration
  – Ethernet LAN is a broadcast medium
    • All computers are connected to a wire. They transmit messages on the wire and all can receive
  – Wireless LAN (WiFi) is a broadcast medium
    • Electromagnetic waves are the common medium
![Page 6: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/6.jpg)
Local area networks
Advantages:
• Sending a common message to everyone is easy
• Finding destination is easy
  – Message goes to everyone
  – Just have a “destination” field
Main issue: Medium access
• Since the medium is broadcast, two people transmitting at the same time garbles the message
![Page 7: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/7.jpg)
Medium access
• Only one transmission at a time can be allowed
  – Mutual exclusion problem (shared resource of communication wire)
  – We cannot use messages to solve it
  – Protocols:
    • TDMA: Everyone has a periodic slot
    • CSMA: See if anyone else is transmitting. If so, defer.
    • Usually acks are also used to ensure transmission
      – Retransmit if necessary
  – Bandwidth reduces with number of nodes trying to transmit.
    • One LAN should not have too many nodes
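The TDMA idea of a periodic slot per node can be sketched as a round-robin schedule. This is a minimal illustration, assuming slots are numbered from 0 and every node knows the same ordered node list; `tdma_owner` and `may_transmit` are hypothetical names.

```python
def tdma_owner(slot, nodes):
    """Return the node that owns the given time slot in a round-robin schedule."""
    return nodes[slot % len(nodes)]


def may_transmit(node, slot, nodes):
    """A node may transmit only in its own slot, so transmissions never overlap."""
    return tdma_owner(slot, nodes) == node


# Hypothetical LAN with three nodes.
NODES = ["A", "B", "C"]
schedule = [tdma_owner(t, NODES) for t in range(6)]
```

With n nodes, each node owns one slot in every n, which is one way to see why the bandwidth available per node shrinks as the LAN grows.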
![Page 8: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/8.jpg)
Medium access
• Wireless: more complicated
• Hidden terminal problem
• More complex protocol using acks
![Page 9: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/9.jpg)
Our models of LAN
• Graph: Every node has an edge to every other
• We often assume that to send a message (packet) to a node on the same network takes one unit of time (or, at most a constant)
• This may not be true if there can be many nodes in the same LAN
  – But usually the number is not very large
![Page 10: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/10.jpg)
Real life networks
• LANs connected by routers
[Figure: LANs (Ethernet/Wifi) connected through routers by point-to-point links]
![Page 11: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/11.jpg)
Routing
• Finding a path in the network
• Every node has a routing table
• Equivalent to a BFS tree for every node
[Figure: example network of five nodes (1 to 5), each annotated with its routing table of (destination, next hop) pairs]
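The link between routing tables and BFS trees can be sketched as follows: a breadth-first search from a node records, for every destination, the first hop on a shortest path. The five-node topology below is an assumption made for the example, and `routing_table` is a hypothetical helper name.

```python
from collections import deque


def routing_table(graph, source):
    """Map each destination to the next hop on a shortest (BFS) path from source."""
    next_hop = {}
    visited = {source}
    queue = deque()
    # Direct neighbors are their own next hop.
    for nbr in sorted(graph[source]):
        visited.add(nbr)
        next_hop[nbr] = nbr
        queue.append(nbr)
    # Every node discovered later inherits the next hop of its BFS parent.
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):
            if nbr not in visited:
                visited.add(nbr)
                next_hop[nbr] = next_hop[node]
                queue.append(nbr)
    return next_hop


# Assumed topology with edges 1-2, 1-3, 2-3, 3-4, 4-5 (illustration only).
GRAPH = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
table_for_1 = routing_table(GRAPH, 1)
```

For node 1 this gives next hop 2 for destination 2 and next hop 3 for destinations 3, 4 and 5.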
![Page 12: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/12.jpg)
Routing: Distributed search for a path
• Smaller routing tables by combining addresses
• Used in IP (Internet) routing
• Smaller routing tables are preferable
[Figure: five-node network (1 to 5) with routing tables compressed into address ranges, e.g. destinations 3-5 sharing next hop 3]
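Combining addresses can be sketched as merging consecutive destinations that share a next hop into one range entry. This is a simplified illustration with integer addresses; real IP routing aggregates by address prefix rather than by arbitrary ranges. The name `compress` and the sample table are assumptions for the example.

```python
def compress(table):
    """Merge consecutive destinations with the same next hop into (low, high, next_hop) ranges."""
    ranges = []
    for dest in sorted(table):
        hop = table[dest]
        # Extend the last range only if it is adjacent and uses the same next hop.
        if ranges and ranges[-1][2] == hop and ranges[-1][1] == dest - 1:
            low, _, _ = ranges[-1]
            ranges[-1] = (low, dest, hop)
        else:
            ranges.append((dest, dest, hop))
    return ranges


# Hypothetical full routing table for one node: destination -> next hop.
full_table = {2: 2, 3: 3, 4: 3, 5: 3}
small_table = compress(full_table)
```

The four-entry table shrinks to two entries: destination 2 via 2, and destinations 3-5 via 3.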
![Page 13: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/13.jpg)
Routing
• Real routing is more complicated
• With more than one path to a destination, backups etc.
![Page 14: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/14.jpg)
Related: Location based routing
• Geographical routing uses a node’s location to discover path to that node.
[Figure: greedy routing example, with labels x and y]
Greedy Routing: Forward to the neighbor that is nearest to the destination
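Greedy routing can be sketched in a few lines, assuming every node knows its own coordinates, those of its neighbors, and those of the destination. The function name, node names and coordinates are hypothetical. The sketch also stops when no neighbor is closer to the destination than the current node, a case the simple greedy rule cannot resolve by itself.

```python
import math


def greedy_route(positions, neighbors, source, dest):
    """Forward hop by hop to the neighbor nearest to dest; return (path, reached)."""
    path = [source]
    current = source
    while current != dest:
        best = min(
            neighbors[current],
            key=lambda n: math.dist(positions[n], positions[dest]),
            default=None,
        )
        here = math.dist(positions[current], positions[dest])
        # Stop if there is no neighbor, or no neighbor makes progress (local minimum).
        if best is None or math.dist(positions[best], positions[dest]) >= here:
            return path, False
        current = best
        path.append(current)
    return path, True


# Hypothetical layout, for illustration only.
POS = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (1, 2)}
NBRS = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b"], "d": ["a"]}
route, ok = greedy_route(POS, NBRS, "a", "c")
```

Because every hop strictly reduces the distance to the destination, the loop always terminates.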
![Page 15: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/15.jpg)
Large networks
• Communication is typically point-to-point using routing
• Broadcast is not automatic
  – If we need broadcast, we will have to arrange a flood (or some other method)
[Figure: LANs (Ethernet/Wifi) connected through routers by point-to-point links]
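A flood can be sketched as follows: each node, the first time it hears the message, forwards one copy to every neighbor. This is a simple variant in which a node also sends a copy back to the neighbor it heard from; the graph and the name `flood` are assumptions for the example.

```python
from collections import deque


def flood(graph, source):
    """Flood a message from source; return (nodes reached, messages sent)."""
    seen = {source}
    queue = deque([source])
    messages = 0
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            messages += 1  # one copy per neighbor, including the one we heard from
            if nbr not in seen:
                seen.add(nbr)  # first time this neighbor hears the message
                queue.append(nbr)
    return seen, messages


# Assumed topology with 5 edges (illustration only).
GRAPH = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
reached, sent = flood(GRAPH, 1)
```

In this variant every node sends once on each of its edges, so a connected graph with |E| edges costs 2|E| messages, here 10.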
![Page 16: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/16.jpg)
Transport management
• UDP:
  – Send a packet, hope that the network routes and delivers it, in time
  – No sequence number
    • Not necessarily FIFO
  – Useful in streaming audio/video. Not for important data.
• TCP:
  – Send a packet (or a few packets)
  – Packets have sequence numbers
    • FIFO
  – If no acks arrive, resend packets
  – If no acks are found after many tries, return error
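The resend-until-acked behaviour can be sketched with a toy stop-and-wait loop over a lossy channel. This is not how real TCP is implemented (no windows, timers or congestion control). It only illustrates sequence numbers, retries and the final error. All names and parameters are hypothetical, and a single random draw stands in for losing either the packet or its ack.

```python
import random


def send_reliably(packets, loss_prob, max_tries, rng):
    """Send packets in order, retrying each until acked or max_tries is exhausted."""
    delivered = []
    for seq, payload in enumerate(packets):
        for _attempt in range(max_tries):
            if rng.random() >= loss_prob:  # packet and its ack both got through
                delivered.append((seq, payload))
                break
        else:
            # No ack after many tries: give up and report an error to the caller.
            raise ConnectionError(f"no ack for packet {seq} after {max_tries} tries")
    return delivered


# A lossless channel delivers everything in sequence order.
result = send_reliably(["a", "b", "c"], loss_prob=0.0, max_tries=3, rng=random.Random(0))
```

Because the sender moves to packet seq+1 only after packet seq is acknowledged, the receiver sees the packets in FIFO order.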
![Page 17: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/17.jpg)
TCP
• Does distributed congestion control
  – When packets don’t get delivered, TCP slows down the stream
    • Assumption: routers drop packets when there are too many
• Difficulty
  – Acks may not arrive due to other factors
    • Some connection failed temporarily
    • User moved from one network to another
![Page 18: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/18.jpg)
Network stack
• Each layer solves a different distributed problem
(Figure from Wikipedia)
![Page 19: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/19.jpg)
Communication: Overlay network
• We may sometimes ignore parts of the network
  – Nodes that carry messages but do not directly participate, e.g. routers
  – Or edges that exist but we are not using
  – Or we don’t know about
• Often used in peer-to-peer networks
  – Not every node knows all other nodes in the network
  – But communicates to known nodes through routing
![Page 20: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/20.jpg)
Blocking and non-blocking communication
• Blocking communication
  – Sender sends message, waits until receiver replies
  – Does not do anything in the meantime
• Non-blocking communication
  – Sender sends message, then continues its own work without waiting
  – When receiver replies, or some other message arrives, node interrupts current work to handle message
• Sometimes these are called synchronous/asynchronous, but we will try not to use that
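The difference can be sketched on a single machine with a thread pool standing in for the remote receiver. This is only an analogy: the callback plays the role of the handler that runs when the reply arrives, and `receiver`, `blocking_send` and `non_blocking_send` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def receiver(message):
    """Stand-in for a remote node that answers a message."""
    return f"reply to {message}"


def blocking_send(pool, message):
    """Send and wait: nothing else happens until the reply is available."""
    return pool.submit(receiver, message).result()


def non_blocking_send(pool, message, on_reply):
    """Send and return at once; on_reply is called when the reply arrives."""
    future = pool.submit(receiver, message)
    future.add_done_callback(lambda f: on_reply(f.result()))
    return future


replies = []
with ThreadPoolExecutor(max_workers=1) as pool:
    first = blocking_send(pool, "ping")                 # sender is idle while waiting
    non_blocking_send(pool, "status?", replies.append)  # returns immediately
    local_work = sum(range(1000))                       # sender continues its own work
```

Leaving the `with` block waits for outstanding work, so by then the reply to the non-blocking send has been handled.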
![Page 21: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/21.jpg)
Computation
• Synchronous:
  – Operation in rounds
  – In a round, a node performs some computation, and then sends some messages
  – All messages sent at the end of round x are available to recipients at start of round x+1
    • But not earlier
[Figure: timeline of Round 1, Round 2, Round 3, each divided into computation and communication]
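The round structure can be sketched with a small simulator in which every node repeatedly takes the maximum of its own value and the values it received, then sends the result to its neighbors. The task (spreading the maximum), the graph and the name `run_rounds` are assumptions chosen for illustration. The point is that messages placed in the outbox in round x are read only in round x+1.

```python
def run_rounds(graph, values, rounds):
    """Simulate synchronous rounds; return each node's value after the last round."""
    state = dict(values)
    inbox = {node: [] for node in graph}
    for _ in range(rounds):
        outbox = {node: [] for node in graph}
        for node in graph:
            # Computation: uses only messages delivered at the start of this round.
            state[node] = max([state[node]] + inbox[node])
            # Communication: messages go to the outbox, not directly to neighbors.
            for nbr in graph[node]:
                outbox[nbr].append(state[node])
        inbox = outbox  # delivered at the start of the next round, not earlier
    return state


# Hypothetical path network 1 - 2 - 3 with initial values.
PATH = {1: [2], 2: [1, 3], 3: [2]}
START = {1: 5, 2: 1, 3: 0}
after_two = run_rounds(PATH, START, 2)
after_three = run_rounds(PATH, START, 3)
```

The value 5 travels one hop per round: after two rounds node 3 still holds 1, and only after three rounds do all nodes hold 5.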
![Page 22: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/22.jpg)
Communication
• Synchronous
  – Can be implemented if message transmission time is bounded by some constant, say m
  – Computation times for all nodes are bounded by some constant c
  – Clocks are synchronized (sufficiently)
  – Then set each round to be m+c in duration
![Page 23: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/23.jpg)
Asynchronous Communication
• No synchronization or rounds
  – Nodes compute at different and arbitrary speeds
  – Messages proceed at different speeds: may be arbitrarily delayed, may be received at any time
• Worst case model
  – No assumption about speeds of processes or channel
  – (But does not include communication/computation errors)
![Page 24: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/24.jpg)
Asynchronous Communication
• Harder to manage
  – Message can arrive at any time after being sent, must be handled suitably
  – Possible to make some simplifying assumptions. E.g.:
    • Channels are FIFO: order of messages on a channel is preserved
    • Some code blocks are atomic (not interrupted by messages)
    • Either communication or computation times bounded
![Page 25: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/25.jpg)
Synchronous communication in real systems
• Synchronous communication can be a fair model
• Modern computers and networks are fast
  – (though not arbitrarily fast)
• Easier to design algorithms and analyze
• Well designed algorithms are faster and more efficient
• Often can be adapted to asynchronous systems
  – Often a starting point for design
![Page 26: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/26.jpg)
Failures
• Nodes may fail
  – Hardware failure
  – Run out of energy or power failure
  – Software failure (crash)
  – Permanent
  – Temporary (what happens when it restarts? Recovers the state? Starts from initial state?)
  – Model depends on system. E.g. different types of failures occur with corresponding probabilities
![Page 27: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/27.jpg)
Node failures
• Common abstract models
  – Stopping failure: node just stops working
    • May need assumptions about which computation/communication it finishes before stopping
    • May need assumption about neighbors knowing of failure
  – Byzantine failure: node behaves as an adversary
    • Imagine your enemy has taken control of the node
    • Is trying to spoil your computation
• Nodes may fail individually
  – E.g. each node fails with probability p
• Nodes may have correlated failure
  – E.g. all nodes fail in a region (data center, sensor field)
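For the individual-failure model, a short calculation shows how quickly failures become likely as the system grows. The sketch assumes each of n nodes fails independently with probability p. That independence is an assumption of the example and does not hold under correlated failure. The function name is hypothetical.

```python
def prob_any_failure(p, n):
    """Probability that at least one of n nodes fails, each independently with probability p."""
    # All n nodes survive with probability (1 - p)^n; the complement is "some node fails".
    return 1.0 - (1.0 - p) ** n


# Hypothetical numbers: 1000 nodes, each failing with probability 0.001.
large_system = prob_any_failure(0.001, 1000)
```

Even with a small per-node probability, a system of many nodes should expect some node to fail: with these example numbers the chance is above 60%.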
![Page 28: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/28.jpg)
Link/communication failure
• May be temporary/permanent
• May happen due to
  – Hardware failure
  – Noise: electronic devices (microwaves etc.) may transmit radio waves at similar frequencies and disrupt communication
  – Interference: Other communicating nodes nearby may disrupt communication
• Effects
  – Channel silent and unusable (hardware failure)
  – Channel active, but unusable due to noise and interference
  – Channel active, but may contain erroneous message (may be detected by error correcting codes)
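Detecting an erroneous message can be sketched with a checksum appended to each frame. Note the swap: the example uses a CRC-32 from Python's standard `zlib` module, which only detects errors and cannot correct them, so it is simpler than the error correcting codes mentioned above. The frame layout and helper names are assumptions for illustration.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload to form a frame."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def frame_is_intact(frame: bytes) -> bool:
    """Recompute the CRC-32 at the receiver and compare it with the transmitted one."""
    if len(frame) < 4:
        return False
    payload, checksum = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == checksum


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate channel noise by flipping a single bit of the frame."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


frame = make_frame(b"hello")
noisy = flip_bit(frame, 3)
```

A receiver that finds a mismatch can discard the frame and rely on acks and retransmission to recover the data.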
![Page 29: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/29.jpg)
Security
• Issues:
  – Unauthorized access, modification. Making systems unavailable (DoS)
  – Attack on one or more nodes
    • Causing it to fail
    • Read data
    • Taking control to read future data, disrupt operation
  – Attack on communication links/channel
    • Block communication
    • Read data in the channel (easy in wireless without encryption)
    • Corrupt data in the channel
![Page 30: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/30.jpg)
Security
• Solutions usually have specific assumptions of what the adversary can do
• E.g. If adversary has access to channel
  – Cryptography may be able to prevent reading/corrupting data
![Page 31: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/31.jpg)
Mobility
• Movement makes it harder to design distributed systems
  – Communication is difficult
    • Delays, lost messages
    • Edge weights can change
  – Applications that depend on location must adapt to movement
• How do people move? What is a model of movement?
  – Not yet well understood
![Page 32: Distributed Systems Communicaon and models · E.g. different types of failures occur with corresponding probabili3es 26 Node failures • Common abstract models – Stopping failure:](https://reader033.vdocuments.site/reader033/viewer/2022050104/5f4284773e501f4dd355ed42/html5/thumbnails/32.jpg)
Modeling distributed systems
• Many possibilities
• Choose your assumptions carefully for your problem
• Pay close attention to what is known about communication/network
• Start with simpler models
  – Usually more assumptions, fewer parameters
  – See what can be achieved
  – Then try to drop/relax assumptions