Distributed Systems - Brown University (cs.brown.edu/courses/cs138/s19/lectures/day6_2019.pdf)

Post on 15-Mar-2020


Distributed Systems Day 6: Distributed Hash Tables [Part 3]

Agenda

• Tapestry
  • Lookups (routing) in Tapestry
  • Adding a node
  • Deleting a node
  • Dealing with failures

Tapestry Node

• Object Store
  • Local key-value
  • I.e., keys where this node is "Root"
• Route Table
  • A table of neighbors
• Backpointers
  • A list of nodes (who have this node in their route table)

[Figure: a Tapestry node, made up of a Route Table, Backpointers, and an Object store (key, value store)]

If Node A is in Node B's Route Table, then Node B is in Node A's Backpointers.

[Figure: Node A and Node B, each with a Route Table, Backpointers, and an Object store (key, value store)]

Backpointers are useful during:
* Graceful exit
* Adding a new node
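The relationship between route tables and backpointers can be illustrated with a minimal Python sketch. The `Node` class, its field names, and `add_neighbor` are hypothetical names chosen for illustration, and the route table is flattened to a set of IDs to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory model of one Tapestry node."""
    node_id: str
    route_table: set = field(default_factory=set)     # IDs of neighbors (flattened)
    backpointers: set = field(default_factory=set)    # IDs of nodes that list us
    object_store: dict = field(default_factory=dict)  # keys this node is root for


def add_neighbor(b: Node, a: Node) -> None:
    """Put A in B's route table and record B in A's backpointers."""
    b.route_table.add(a.node_id)
    a.backpointers.add(b.node_id)


node_a, node_b = Node("3312"), Node("3320")
add_neighbor(node_b, node_a)
```

Updating both sides in one helper is one way to keep the invariant from drifting; a real deployment would need a message exchange between the two nodes.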

Route Table

• Route Table is B by B
  • Each row identifies a prefix
  • Each column is a subset of the prefix
• Each cell is a node
  • A node may show up in multiple cells
  • There are multiple options
    • 2303 and 2111 are both valid
    • Pick the "best" option
      • Best == closest node.

Route Table for Node: 3312

          0      1      2      3
XXXX   0331   1332   2302   3312
3XXX   ----   3111   ----   3312
33XX   ----   3311   3320   ----
331X   ----   3311   3312   ----
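To make the "multiple options per cell" point concrete, here is a small Python sketch that lists every node eligible for a given cell. The node list is the example network from these slides; the function name `candidates` and the row/column convention (row = shared prefix length, column = next digit) are assumptions made for illustration.

```python
BASE = 4    # digits 0-3, as in the example IDs
DIGITS = 4  # four-digit IDs

NODES = ["2130", "3122", "3312", "2302", "3111", "1332", "3120",
         "0121", "0331", "1331", "1001", "3311", "3320"]


def candidates(owner: str, row: int, col: int, nodes=NODES) -> list:
    """Nodes that may fill cell (row, col) of the owner's route table.

    A candidate shares the first `row` digits with the owner and has
    digit `col` at position `row`.
    """
    prefix = owner[:row] + str(col)
    return sorted(n for n in nodes if n.startswith(prefix))
```

For Node 3312, `candidates("3312", 1, 1)` returns three nodes (3111, 3120, 3122), which is why an entry-selection rule such as "closest node" is needed.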

[Figure: the Route Table for Node 3312 filled in one column (digit 0, 1, 2, 3) at a time, beside the example network of nodes 2130, 3122, 3312, 2302, 3111, 1332, 3120, 0121, 0331, 1331, 1001, 3311, and 3320]

Row determines prefix length; column determines digit.

Route Table for Node: 3320

          0      1      2      3
XXXX   0121   1001   2130   3311
3XXX   ----   3120   ----   3320
33XX   ----   3311   3320   ----
332X   3320   ----   ----   ----

[Figure: the example network and the Route Table for Node 3312, repeated; in this copy the 33XX row lists 3312 in column 1]

2 = Prefix(3312, 3320): Node 3312 and Node 3320 share the two-digit prefix 33.
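The Prefix(...) value used here is the number of leading digits two IDs share. A minimal Python sketch, with the hypothetical name `shared_prefix_len`:

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

This value is also the row index at which one node would place the other in its route table.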

Tapestry: ID Lookup

How to Route? Using Prefix Lookup

[Figure: the example network with the path of lookup: 3122, which resolves one more digit per hop: 3XXX, 31XX, 312X]

Routing Algorithm

[Figure: the example network with lookup: 3122; first hop 3XXX]

Route Table for Node: 2130

          0      1      2      3
XXXX   0331   1332   2302   3312
2XXX   ----   2130   ----   2302
21XX   ----   ----   ----   2130
213X   2130   ----   ----   ----

// executed at each node in route to destination
NextHop(targetHash, step) {
    nextDigit = digit(targetHash, step)
    return (table[step, nextDigit])
}
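The pseudocode translates almost line for line into Python. The sketch below assumes IDs are strings of base-4 digits and that a route table is a list of rows with `None` for empty cells; the table for Node 2130 is transcribed from the slide as best it can be read, and the names `digit` and `next_hop` mirror the pseudocode.

```python
# Route table for Node 2130 as read off the slide; None marks an empty cell.
TABLE_2130 = [
    ["0331", "1332", "2302", "3312"],  # row 0: prefix XXXX
    [None,   "2130", None,   "2302"],  # row 1: prefix 2XXX
    [None,   None,   None,   "2130"],  # row 2: prefix 21XX
    ["2130", None,   None,   None],    # row 3: prefix 213X
]


def digit(target_hash: str, step: int) -> int:
    """The digit of the target ID that is resolved at this step."""
    return int(target_hash[step])


def next_hop(table, target_hash: str, step: int):
    """Executed at each node on the route to the destination."""
    next_digit = digit(target_hash, step)
    return table[step][next_digit]
```

Calling `next_hop(TABLE_2130, "3122", 0)` reads Table[0, 3], the first step of the lookup walked through on these slides. This version does not yet handle an empty cell.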

Step 0: 3122.at(0) gives NextDigit = 3, so the next hop is Table[0, 3].

Step 1, at Node 3312 (path so far: 3XXX, 31XX): 3122.at(1) gives NextDigit = 1, so the next hop is Table[1, 1] in the Route Table for Node 3312.

Steps 2 and 3 (path: 3XXX, 31XX, 312X, 3122): 3122.at(2) gives NextDigit = 2, so the next hop is Table[2, 2]; 3122.at(3) gives NextDigit = 2, so the final hop is Table[3, 2], which reaches 3122.

What Happens if You are Missing an Entry: Surrogate Routing

[Figure: lookup: 3021 reaches 3XXX at Node 3312; the Route Table for Node 3312 has no entry for the next digit (30XX), marked ????]

How?

• If no next hop exists, try the next larger digit, mod base
• Each neighbor-table row must have at least one entry
  • Why?
• If any two neighbor-table rows (of different nodes) share the same prefix, they must agree on which entries are null
  • Why?
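The "next larger digit, mod base" rule can be sketched in Python as follows. The function name `surrogate_next_hop` and the row representation (`None` for a null entry) are assumptions; the two sample rows are transcribed from the route tables for Node 3312 and Node 3122 in these slides.

```python
BASE = 4


def surrogate_next_hop(row, wanted_digit: int, base: int = BASE):
    """Return (digit, node) for the first non-empty cell at or after
    wanted_digit, wrapping around mod base."""
    for offset in range(base):
        d = (wanted_digit + offset) % base
        if row[d] is not None:
            return d, row[d]
    # Unreachable if every row has at least one entry, as the slide requires.
    raise LookupError("row has no entries")


ROW_3XXX_OF_3312 = [None, "3111", None, "3312"]
ROW_312X_OF_3122 = ["3120", None, "3122", None]
```

For lookup 3021, the second digit 0 has no entry in the 3XXX row, so the rule falls through to digit 1 (31XX). Because all nodes agree on which entries are null, every node makes the same substitution and the lookup converges on one surrogate root.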

[Figure: the route tables for Node 3320, Node 3312, and Node 3122 side by side]

2 = Prefix(3312, 3320)
1 = Prefix(3320, 3122)

Route Table for Node: 3122

          0      1      2      3
XXXX   0121   1332   2302   3312
3XXX   ----   3122   ----   3311
31XX   ----   3111   3122   ----
312X   3120   ----   3122   ----

Route table for Node 3122, level by level:

0xxx: 0121   1xxx: 1332   2xxx: 2302   3xxx: 3312
30xx: ----   31xx: 3122   32xx: ----   33xx: 3311
310x: ----   311x: 3111   312x: 3122   313x: ----
3120: 3120   3121: ----   3122: 3122   3123: ----

Route table for Node 3320, level by level:

0xxx: 0121   1xxx: 1332   2xxx: 2302   3xxx: 3312
30xx: ----   31xx: 3122   32xx: ----   33xx: 3311
330x: ----   331x: 3311   332x: 3320   333x: ----
3320: 3320   3321: ----   3322: ----   3323: ----

[Figure: the same two route tables, for Node 3320 and Node 3122, in grid form]

Surrogate Routing for 3021

[Figure: the example network with the surrogate lookup path for 3021: 3xxx, 30xx? (no entry), 31xx, 312x, 3121? (no entry), 3122]

Tapestry: Deleting A Node

What Needs to be Done during Deletion?

• Two versions of Node Delete
  • Graceful: node willingly leaves
    • Can do clean-up
  • Ungraceful: node crashes
    • No opportunity to do clean-up

[Figure: Node A with its Route Table, Backpointers, and Local <K,V>]

• Local K,V will disappear → must inform owners to republish
• I'm in other people's routing table → must tell them I'm leaving

[Figure: Node A, Node B, and Node C, each with a Route Table, Backpointers, and Local <K,V>]

Graceful (node willingly leaves):
• Remote routing table: use backpointers
• Local K/V: inform client to republish

Ungraceful (node crashes):
• Remote routing table: nodes will detect failure
• Local K/V: client will republish after timeout!!
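The graceful case can be sketched in Python under simplifying assumptions: nodes live in one process, the route table is flattened to a set, and "inform client to republish" is modeled by returning the affected keys. The names `Node`, `link`, and `graceful_leave` are hypothetical.

```python
class Node:
    """Hypothetical in-memory node; sets hold other Node objects."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.route_table = set()
        self.backpointers = set()
        self.local_kv = {}


def link(b: Node, a: Node) -> None:
    """Put A in B's route table and B in A's backpointers."""
    b.route_table.add(a)
    a.backpointers.add(b)


def graceful_leave(node: Node) -> list:
    """Clean up before leaving; return the keys that must be republished."""
    # Tell everyone who routes through us (found via backpointers).
    for other in list(node.backpointers):
        other.route_table.discard(node)
    # Stop appearing in our neighbors' backpointer lists.
    for neighbor in list(node.route_table):
        neighbor.backpointers.discard(node)
    keys = list(node.local_kv)
    node.route_table.clear()
    node.backpointers.clear()
    node.local_kv.clear()
    return keys
```

In the ungraceful case none of this runs, which is why the slide falls back on failure detection and client timeouts.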

Tapestry: Adding A Node

Adding A New Node

• Step 1: Find Root for Server ID
• Step 2: Move subset of objects from Root to Node
• Step 3: Make Node routing table
• Step 4: Update Routing Table for other nodes

[Figure: a Tapestry node (Route Table, Backpointers, Object store), the example network, and the route tables for Node 3320, Node 3312, and Node 3122]

Adding Node 3121
• Step 1: find Root

Make New Node's Routing Table

• Nodes at each stage of lookup share prefix
• You can build a new routing table by:
  • Option 1: Taking subset of routing tables at each step
  • Option 2: Take back-pointers --- for each backpointer ask for backpointers

Adding Node 3121
• Step 1: find Root
• Step 2: Make route table for 3121

Option 1

[Figure: nodes along the lookup path for the new node (labels 2130, 3xxx, 3111, 3120, 3122, and 3123?), each asked for one row:]
• Give me entries in row 0
• Give me entries in row 1
• Give me entries in row 2
• Give me entries in row 3

3123 Route Table
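Option 1 can be sketched in Python as copying row i from the i-th node on the lookup path. This is an illustration under assumptions: the function name `build_table_from_path` is made up, the three tables are transcribed from earlier slides, and the path for new node 3121 is hypothetical (Node 3122 serves both row 2 and row 3 because it shares three digits with 3121).

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_table_from_path(new_id: str, path):
    """Copy row i from the i-th (node_id, route_table) pair on the path.

    The i-th node must share at least i leading digits with new_id.
    """
    table = []
    for row, (node_id, node_table) in enumerate(path):
        if shared_prefix_len(new_id, node_id) < row:
            raise ValueError(f"{node_id} does not share {row} digits with {new_id}")
        table.append(list(node_table[row]))
    return table


# Route tables as read off the slides; None marks an empty cell.
TABLE_2130 = [["0331", "1332", "2302", "3312"], [None, "2130", None, "2302"],
              [None, None, None, "2130"], ["2130", None, None, None]]
TABLE_3312 = [["0331", "1332", "2302", "3312"], [None, "3111", None, "3312"],
              [None, "3311", "3320", None], [None, "3311", "3312", None]]
TABLE_3122 = [["0121", "1332", "2302", "3312"], [None, "3122", None, "3311"],
              [None, "3111", "3122", None], ["3120", None, "3122", None]]

# Hypothetical lookup path for new node 3121.
PATH = [("2130", TABLE_2130), ("3312", TABLE_3312),
        ("3122", TABLE_3122), ("3122", TABLE_3122)]
NEW_TABLE = build_table_from_path("3121", PATH)
```

The copied rows are only a starting point: the new node would still need to insert itself and, if desired, swap entries for closer candidates.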

[Figure: the route tables for Node 3320, Node 3312, and Node 3122, with 2 = Prefix(3312, 3320) and 1 = Prefix(3320, 3122)]

Option 2

[Figure: starting from root 3122, the new node (3121?) follows back-pointers, then the back-pointers of those nodes, out to the 3xxx level, asking in turn:]
• Give me pointers with overlap 0
• Give me pointers with overlap 1
• Give me pointers with overlap 2
• Give me pointers with overlap 3

3121 Route Table

Step 1: Ask root for backpointers with overlap p = prefixlen(newNodeID, RootID)
Step 2: For each backpointer
• Ask for its backpointers with overlap p--
Step 3: Repeat step 2 until p = 0
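The three steps can be sketched as a loop in Python. Two assumptions are made here: "overlap p" is read as "shares at least p leading digits with the new node's ID", and the backpointer map in the example is invented purely for illustration. The function name `gather_candidates` is hypothetical.

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def gather_candidates(new_id: str, root_id: str, backpointers: dict) -> set:
    """Collect nodes for the new node's route table by walking backpointers.

    backpointers maps a node ID to the IDs of nodes that list it
    in their route tables.
    """
    p = shared_prefix_len(new_id, root_id)
    found = {root_id}
    while p >= 0:
        # Ask every node found so far for its backpointers with overlap >= p.
        for node in list(found):
            for bp in backpointers.get(node, ()):
                if shared_prefix_len(bp, new_id) >= p:
                    found.add(bp)
        p -= 1  # widen the search by one digit
    return found


# Made-up backpointer map, for illustration only.
BACKPOINTERS = {
    "3122": {"3120", "3111", "3312"},
    "3120": {"3122"},
    "3111": {"3312", "2130"},
    "3312": {"2130", "3320"},
}
FOUND = gather_candidates("3121", "3122", BACKPOINTERS)
```

Each pass loosens the prefix requirement by one digit, so the early passes find the nodes for the deepest rows of the new table and the last pass finds the nodes for row 0.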

• Every node in 3121 Routing table
  • Send "hello" message
  • They should add you to their backpointers

Adding Node 3123
• Step 1: find Root
• Step 2: Make route table for 3123
• Step 2.1: update backpointers

[Figure: Node 3121 and a node in 3121's Routing Table, each with a Route Table, Backpointers, and an Object store (local KV)]

• Problem: new node can fill in empty slot in a few routing tables

• Solutions
  • Option 1: send a message to every node
  • Option 2: send a message to "need-to-know" nodes
    • i.e., nodes that we know have an empty slot where the new node's ID goes

Adding Node 3121
• Step 1: find Root
• Step 2: Make route table for 3121
• Step 2.1: update backpointers
• Step 3: update "need-to-know" nodes
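One way to identify "need-to-know" nodes, given global knowledge of the route tables, is sketched below in Python. A real node has no such global view, so this only illustrates the definition; the function name `need_to_know` is hypothetical and the tables are transcribed from earlier slides.

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def need_to_know(new_id: str, tables: dict) -> list:
    """IDs of nodes whose slot for new_id is currently empty.

    tables maps node ID -> route table (rows of cells, None = empty).
    """
    result = []
    for node_id, table in tables.items():
        row = shared_prefix_len(node_id, new_id)
        if row == len(new_id):
            continue  # same ID, nothing to fill
        col = int(new_id[row])
        if table[row][col] is None:
            result.append(node_id)
    return sorted(result)


# Route tables as read off the slides; None marks an empty cell.
TABLES = {
    "2130": [["0331", "1332", "2302", "3312"], [None, "2130", None, "2302"],
             [None, None, None, "2130"], ["2130", None, None, None]],
    "3312": [["0331", "1332", "2302", "3312"], [None, "3111", None, "3312"],
             [None, "3311", "3320", None], [None, "3311", "3312", None]],
    "3122": [["0121", "1332", "2302", "3312"], [None, "3122", None, "3311"],
             [None, "3111", "3122", None], ["3120", None, "3122", None]],
}
```

With these three tables, a new node 3121 fills an empty slot only at Node 3122 (row 312X, column 1), while a new node 3021 would fill the empty 30XX slot at both Node 3122 and Node 3312.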

Surrogate Routing for 3021

[Figure: the example network with the surrogate lookup path for 3021: 3xxx, 30xx? (no entry), 31xx, 312x, 3121? (no entry), 3122]

Any node with 30XX in its routing table is a need-to-know node.

Surrogate Root contacts nodes

Updating need-to-know nodes

[Figure: the Root, the new node (3121?), and the Root's Neighbors]

• Send to NewNode to neighbors with p = prefix(root, NewNode)
• Send to NewNode to neighbors with prefix p--

Note: in this example, since p = 1 (3***),

Tapestry: Optimizations

Implications of Node Failure

• Problem: when a node crashes, all objects stored on the node are lost
• Naïve solution: clients republish objects periodically
  • I.e., you (as a client) need to republish your Facebook pictures.
  • Drawback 1: clients need to store and republish
  • Drawback 2: objects are unavailable until the client republishes

• What are some alternative solutions?
  • "Salt" the hash and publish several copies of the object
  • Recover from failure through redundancy
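Salting can be sketched in Python as hashing the object name together with a small counter, which yields several IDs and therefore (usually) several root nodes. The salt format, the use of SHA-1, and the reduction to four base-4 digits are all assumptions chosen to match the example IDs in these slides; the name `salted_keys` is hypothetical.

```python
import hashlib


def salted_keys(name: str, copies: int = 3, digits: int = 4, base: int = 4) -> list:
    """Derive `copies` IDs for one object by salting its name before hashing."""
    keys = []
    for salt in range(copies):
        digest = hashlib.sha1(f"{name}#{salt}".encode()).digest()
        value = int.from_bytes(digest, "big") % (base ** digits)
        # Render the value as a fixed-width string of base-`base` digits.
        out = ""
        for _ in range(digits):
            out = str(value % base) + out
            value //= base
        keys.append(out)
    return keys
```

A reader who cannot reach one root can recompute the same salted IDs and try the next copy. With an ID space this small, two salts can collide, so a real deployment would use much longer IDs.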

Latency Optimization: Entry Selection

• Problem: multiple options for each routing table entry
  • How do you select one?
• Shortest distance
  • Benefit: lowers lookup latency

[Figure: the Route Table for Node 3312]

Options:
• 3312
• 3320
• 3311
• 3312
• 3111
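Choosing the shortest-distance option for a cell reduces to taking a minimum. The Python sketch below assumes each node has some latency measurement for every candidate; the latency figures in the usage note are invented for illustration and the name `pick_entry` is hypothetical.

```python
def pick_entry(candidates, latency_ms):
    """Return the candidate with the lowest measured latency, or None if empty."""
    if not candidates:
        return None
    return min(candidates, key=lambda node_id: latency_ms[node_id])
```

For example, with made-up latencies `{"3111": 40, "3120": 12, "3122": 25}`, the 31XX cell would be filled with 3120.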

Distributed Hash Table Recap

• Consistent hash (Chash)
  • Benefit of consistent hashing over traditional key allocation
  • How to map keys to servers
• Chord (practical use of Chash)
  • Terms: successor, routing table (finger table)
  • Building a routing table
  • Performing look-ups
• Tapestry
  • Terms: root node, surrogate node, backpointers, publishing
  • Building routing table
  • Performing lookup (regular routing, surrogate routing)
  • Adding/deleting a node ("need-to-know")
  • Optimizations ("salting", entry selection)

[Figure: a hash table of entries (k0, v0) through (k5, v5) partitioned across nodes with hexadecimal IDs; each node holds a Route Table, Backpointers, and Local <K,V>]
