Post on 11-Jan-2016


PIC: Practical Internet Coordinates for Distance Estimation

Manuel Costa

joint work with

Miguel Castro, Ant Rowstron, Peter Key

Microsoft Research Cambridge

Why estimate distances?

• Distance estimation can be used to optimize large-scale distributed systems:
  – Server selection
  – Locality-aware peer-to-peer overlay networks
  – Application-level multicast

• Problems with on-demand measurement:
  – Slow
  – High overhead

PIC

• Maps the Internet into a geometric space

• Allows very low cost distance estimation

• Fully decentralized

• Tolerates malicious nodes

Outline

• Estimating distances with coordinates

• Securing the coordinate computation process

• Application to peer-to-peer overlays

• Conclusion

Internet as a geometric space

• Map each node to a position in the geometric space

• Compute distances based on coordinates

• Any node can compute the distance between any other two nodes

• Proposed by GNP (Global Network Positioning)

[Figure: nodes placed at coordinates (x1,y1), (x2,y2) and (x3,y3) in a two-dimensional x-y space]
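Once coordinates exist, estimating a distance costs no network traffic at all. A minimal sketch, assuming coordinates are stored as tuples of floats in the same unit as the measured RTTs (for example milliseconds); `estimate_distance` is a hypothetical helper name:

```python
import math

def estimate_distance(a: tuple, b: tuple) -> float:
    """Estimate the network distance between two nodes as the Euclidean
    distance between their coordinates (any number of dimensions)."""
    if len(a) != len(b):
        raise ValueError("coordinates must have the same dimensionality")
    return math.dist(a, b)

# Any node holding the coordinates of two other nodes can estimate
# their distance without sending a probe.
print(estimate_distance((0.0, 0.0), (30.0, 40.0)))  # 50.0
```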

GNP – computing coordinates

• Measure distance to fixed landmarks

• Assign coordinates by solving a multi-dimensional global minimization problem

• There is no exact solution:
  – The Internet is not Euclidean
  – Measurements have errors

[Figure: points (x1,y1) to (x4,y4) in the x-y plane, with measured distances d1, d2 and d3]
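The minimization step can be sketched as follows. The objective here (sum of squared relative errors between measured and coordinate distances) is one common choice and is an assumption, not taken from these slides; plain gradient descent is swapped in for a derivative-free solver such as Simplex Downhill, and `fit_coordinates`, `steps` and `lr` are hypothetical names tuned only for this toy example:

```python
import math

def fit_coordinates(landmarks, distances, steps=2000, lr=1.0):
    """Find coordinates whose Euclidean distances to the landmarks best match
    the measured distances, by minimizing the sum of squared relative errors
    with plain gradient descent."""
    dims = len(landmarks[0])
    # Start at the landmarks' centroid: an arbitrary but deterministic choice.
    x = [sum(lm[k] for lm in landmarks) / len(landmarks) for k in range(dims)]
    for _ in range(steps):
        grad = [0.0] * dims
        for lm, d in zip(landmarks, distances):
            r = math.dist(x, lm)
            if r == 0.0:
                continue  # gradient undefined exactly on top of a landmark
            err = (r - d) / d  # relative error for this landmark
            for k in range(dims):
                # d/dx_k of err**2 = 2 * err / d * (x_k - lm_k) / r
                grad[k] += 2.0 * err / d * (x[k] - lm[k]) / r
        x = [xk - lr * gk for xk, gk in zip(x, grad)]
    return tuple(x)

landmarks = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [5.0, math.sqrt(65.0), math.sqrt(45.0)]  # true position is (3, 4)
print(fit_coordinates(landmarks, distances))  # approximately (3.0, 4.0)
```

With real RTTs (tens to hundreds of milliseconds) and noisy, non-Euclidean measurements, the step size and iteration count would need retuning, and the residual error would not reach zero.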

PIC – computing coordinates

• Any node in the system can act as a landmark

• Strategies for choosing landmarks include:
  – Random nodes
  – Close nodes
  – Hybrid

[Figure: points (x1,y1) to (x5,y5) in the x-y plane, with measured distances d1, d2 and d3]
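The three landmark strategies can be sketched as a single selection function. The slides do not say how a joining node discovers close nodes or how the hybrid mix is split, so this sketch assumes the node already knows distances to a candidate pool and uses a hypothetical half-closest, half-random split; `choose_landmarks` is an illustrative name:

```python
import random

def choose_landmarks(candidates, distance_to, k, strategy, rng=None):
    """Pick k landmark ids from `candidates`.

    distance_to maps each candidate id to its distance from the joining node;
    only the 'closest' and 'hybrid' strategies consult it."""
    if rng is None:
        rng = random.Random()
    if strategy == "random":
        return rng.sample(candidates, k)
    by_distance = sorted(candidates, key=lambda c: distance_to[c])
    if strategy == "closest":
        return by_distance[:k]
    if strategy == "hybrid":
        n_close = k // 2  # hypothetical split: half closest, half random
        close = by_distance[:n_close]
        rest = [c for c in candidates if c not in close]
        return close + rng.sample(rest, k - n_close)
    raise ValueError(f"unknown strategy: {strategy!r}")

nodes = ["a", "b", "c", "d", "e", "f"]
rtt = {"a": 40.0, "b": 5.0, "c": 25.0, "d": 12.0, "e": 90.0, "f": 60.0}
print(choose_landmarks(nodes, rtt, 2, "closest"))  # ['b', 'd']
print(choose_landmarks(nodes, rtt, 4, "hybrid", random.Random(1)))
```

Mixing close and random landmarks is meant to combine good accuracy for short distances with good coverage of long ones; the exact proportions are a tuning choice.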

PIC – any node can act as a landmark

PIC – advantages

• Self-organizing - no provisioning of servers needed

• Scalable - load distributed among all the peers

• Resilient - avoids centralized points of failure

Experimental evaluation

• 40 000-node networks on three topologies: Georgia Tech, Mercator, CorpNet

• Compare predicted distance to real distance for 100 000 node pairs

• Euclidean space with 8 dimensions, 16 landmarks
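The accuracy plots report, for each relative error on the x-axis, the fraction of node pairs whose error is at most that value. Assuming relative error is defined as |predicted - actual| / actual (the exact definition is not given on these slides), one point of such a curve can be computed as follows; the function names and sample pairs are hypothetical:

```python
def relative_error(predicted: float, actual: float) -> float:
    """Relative error in percent, assuming |predicted - actual| / actual."""
    return abs(predicted - actual) / actual * 100.0

def fraction_within(errors: list, threshold: float) -> float:
    """One point of the accuracy CDF: fraction of pairs with error <= threshold."""
    return sum(1 for e in errors if e <= threshold) / len(errors)

# (predicted, actual) distances for four hypothetical node pairs
pairs = [(110.0, 100.0), (50.0, 50.0), (30.0, 60.0), (90.0, 100.0)]
errors = [relative_error(p, a) for p, a in pairs]
print(fraction_within(errors, 20.0))  # 0.75
```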

Accuracy: Georgia Tech

[Figure: cumulative fraction of distances (0 to 1) vs. relative error (%, 0 to 200) for GNP, random, closest and hybrid]

Accuracy over short distances

[Figure: cumulative fraction of distances (0 to 1) vs. relative error (%, 0 to 200) for random, GNP, closest and hybrid]

Accuracy: CorpNet

[Figure: cumulative fraction of distances (0 to 1) vs. relative error (%, 0 to 200) for GNP, random, closest and hybrid]

Accuracy: Mercator

[Figure: cumulative fraction of distances (0 to 1) vs. relative error (%, 0 to 200) for GNP, random, closest and hybrid]

PIC – security

• Problem: Malicious/compromised nodes can provide incorrect coordinates or fake distances

• Solution:
  – Incorrect coordinates and distances are likely to violate the triangle inequality
  – Remove landmarks that violate the triangle inequality

PIC – security

• Remove the landmarks with the highest sum of deviations from the triangle-inequality bounds

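• When testing landmark i against landmark j, check:

$$d_{n,i} \le d_{i,j} + d_{n,j}$$
$$d_{n,i} \ge d_{i,j} - d_{n,j}$$
$$d_{n,i} \ge d_{n,j} - d_{i,j}$$

[Figure: triangle formed by joining node n, landmark i (under test) and landmark j, with sides d_{n,i}, d_{n,j} and d_{i,j}]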
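The three inequalities say that d_{n,i} must lie in the interval [|d_{i,j} - d_{n,j}|, d_{i,j} + d_{n,j}]. The sketch below scores each landmark by its summed deviation from that interval over all other landmarks and repeatedly drops the worst one. It is an illustration under assumptions: the slides do not say whether d_{i,j} comes from claimed coordinates or from measurements (the sketch just takes all distances as inputs), and the stopping rule and `max_removed` cap are hypothetical:

```python
import math

def deviation(d_ni: float, d_nj: float, d_ij: float) -> float:
    """How far d_ni lies outside [|d_ij - d_nj|, d_ij + d_nj]; 0.0 if inside."""
    upper = d_ij + d_nj
    lower = abs(d_ij - d_nj)
    if d_ni > upper:
        return d_ni - upper
    if d_ni < lower:
        return lower - d_ni
    return 0.0

def filter_landmarks(landmarks, d_n, d_ll, max_removed):
    """Repeatedly drop the landmark with the highest sum of deviations.

    d_n[i]: distance from the joining node n to landmark i
    d_ll[frozenset((i, j))]: distance between landmarks i and j
    Stops early once no remaining landmark violates any bound."""
    kept = list(landmarks)
    for _ in range(max_removed):
        scores = {
            i: sum(deviation(d_n[i], d_n[j], d_ll[frozenset((i, j))])
                   for j in kept if j != i)
            for i in kept
        }
        worst = max(kept, key=lambda i: scores[i])
        if scores[worst] == 0.0:
            break
        kept.remove(worst)
    return kept

pos = {"A": (0, 0), "B": (10, 0), "C": (0, 10), "D": (10, 10)}
node = (3, 4)
d_n = {lm: math.dist(node, p) for lm, p in pos.items()}
d_n["D"] = 40.0  # landmark D reports a fake distance
d_ll = {frozenset((i, j)): math.dist(pos[i], pos[j]) for i in pos for j in pos if i != j}
print(filter_landmarks(list(pos), d_n, d_ll, max_removed=2))  # ['A', 'B', 'C']
```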

Security evaluation

• Fraction f of colluding attackers
  – Know everything

• When a node joins, attackers collude to provide a set of fake coordinates and distances that maximize the distance to the correct position

• This is a very powerful attack

Accuracy under attack

[Figure: cumulative fraction of distances (0 to 1) vs. relative error (%, 0 to 140) for no attackers (security on), 10% colluding attackers and 20% colluding attackers]

Application to peer-to-peer overlays

• Structured overlays:
  – Nodes have nodeIds
  – A message sent to a key is delivered to the node with the closest nodeId

Structured overlays: Mapping keys to nodes

• large id space (128-bit integers)

• nodeIds picked randomly from space

• keys picked randomly from space

• key is managed by its root node:
  – live node with id closest to the key

[Figure: id space showing nodeIds, a key, and the root node for the key]
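Mapping a key to its root node can be sketched as a nearest-id search. This assumes, as in Pastry, that the 128-bit id space wraps around as a ring, so closeness is measured modulo 2^128; tie-breaking between two equally close nodes is ignored, and `circular_distance` and `root_node` are hypothetical names:

```python
ID_BITS = 128
ID_SPACE = 1 << ID_BITS

def circular_distance(a: int, b: int) -> int:
    """Distance between two ids on a ring of size 2**128."""
    diff = abs(a - b) % ID_SPACE
    return min(diff, ID_SPACE - diff)

def root_node(key: int, live_node_ids: list) -> int:
    """The live node whose id is numerically closest to the key."""
    return min(live_node_ids, key=lambda n: circular_distance(n, key))

print(root_node(100, [10, 90, 200]))          # 90
print(root_node(ID_SPACE - 1, [5, 1000]))     # 5, thanks to wrap-around
```

Because nodeIds and keys are both drawn uniformly from the space, each live node ends up as root for roughly the same share of keys.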

Pastry: Node routing state

[Figure: routing table of node 203231 with one row per prefix length (0* 1* 2* 3* / 20* 21* 22* 23* / 200* 201* 202* 203* / 2030* 2031* 2032* 2033*), next to its leaf set]

• topology-aware routing table
• nodeIds and keys in some base 2^b (e.g., 4)
• prefix constraints on nodeIds for each slot
• pick the closest node satisfying the slot constraints
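The slot rule can be sketched as follows: a candidate belongs in the row equal to the length of the prefix it shares with the local nodeId, and in the column given by its next digit; among candidates for the same slot, the closer one is kept. The distance could be a probed RTT or a PIC estimate. `RoutingTable`, `slot_for` and `consider` are hypothetical names, and ids are assumed to be strings of base-4 digits as in the figure:

```python
from __future__ import annotations

def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits the two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class RoutingTable:
    """Prefix routing table for ids written as strings of base-2**b digits.

    Row r holds nodes that share the first r digits with the local id; the
    column is the candidate's digit at position r."""

    def __init__(self, local_id: str, base: int = 4):
        self.local_id = local_id
        self.base = base
        # (row, column) -> (node_id, distance) of the current occupant
        self.slots: dict[tuple[int, int], tuple[str, float]] = {}

    def slot_for(self, node_id: str) -> tuple[int, int] | None:
        row = shared_prefix_len(self.local_id, node_id)
        if row >= len(self.local_id):
            return None  # same id as the local node: no slot
        return row, int(node_id[row], self.base)

    def consider(self, node_id: str, distance: float) -> bool:
        """Install node_id if its slot is empty or it is closer than the occupant."""
        slot = self.slot_for(node_id)
        if slot is None:
            return False
        occupant = self.slots.get(slot)
        if occupant is None or distance < occupant[1]:
            self.slots[slot] = (node_id, distance)
            return True
        return False

rt = RoutingTable("203231")
print(rt.slot_for("201000"))        # (2, 1), i.e. the 201* slot
print(rt.consider("201000", 30.0))  # True: slot was empty
print(rt.consider("201122", 50.0))  # False: farther than the occupant
print(rt.consider("201333", 10.0))  # True: closer, replaces 201000
```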

Pastry: routing

• prefix matching: each hop resolves an extra key digit

[Figure: route(m,323310) forwarded from node 103231 via 313221, 322021 and 323211 to node 323310]
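Checking the route in the figure hop by hop shows the prefix shared with the key growing at every step. This small sketch assumes the hop order 103231, 313221, 322021, 323211, 323310; the last node's id equals the key here, so the final hop resolves the remaining digits at once:

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits the two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

route = ["103231", "313221", "322021", "323211", "323310"]
key = "323310"
print([shared_prefix_len(node, key) for node in route])  # [0, 1, 2, 3, 6]
```

Since each hop fixes at least one more base-2^b digit, a route takes on the order of log base 2^b of N hops in an overlay of N nodes.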

Proximity neighbour selection

• Select close nodes for use in routing

• Important for achieving low-delay routes

• PIC can replace network distance probes

Pastry: prefix-based routing

• Prefix matching: each hop resolves an extra key digit

• Proximity neighbour selection: use the closest known node that matches an extra digit

[Figure: route(m,323310) forwarded from node 103231 via 313221, 322021 and 323211 to node 323310]
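Proximity neighbour selection with coordinates can be sketched as a next-hop rule: among known nodes that match at least one more key digit than the current node, pick the one whose coordinates are closest. This is a simplification under assumptions: the leaf set used for the final hops in Pastry is omitted, the coordinates are hypothetical, and `next_hop` and `route` are illustrative names:

```python
import math

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current, key, known, coords):
    """Among known nodes matching at least one more key digit than `current`,
    return the one closest to `current` by coordinate distance, or None."""
    matched = shared_prefix_len(current, key)
    candidates = [n for n in known if shared_prefix_len(n, key) > matched]
    if not candidates:
        return None
    return min(candidates, key=lambda n: math.dist(coords[current], coords[n]))

def route(source, key, known, coords):
    path = [source]
    while (hop := next_hop(path[-1], key, known, coords)) is not None:
        path.append(hop)
    return path

# Hypothetical coordinates: the nodes lie on a line, one unit apart.
coords = {
    "103231": (0.0, 0.0),
    "313221": (1.0, 0.0),
    "322021": (2.0, 0.0),
    "323211": (3.0, 0.0),
    "323310": (4.0, 0.0),
}
print(route("103231", "323310", list(coords), coords))
# ['103231', '313221', '322021', '323211', '323310']
```

Replacing probes with `math.dist` on coordinates is exactly where PIC saves network traffic: the candidate ranking needs no measurements.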

Proximity test variants

• Full probing
  – RTT measured by taking the minimum of three probes

• PIC
  – RTT estimated with coordinates

• Filtered probing
  – Use coordinates to filter out bad candidates; always probe before replacing a neighbour
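Filtered probing can be sketched as a two-stage test. The rule used to filter (skip a candidate whose estimated RTT is not below the current neighbour's RTT) is an assumption, and `probe`, `estimate` and `maybe_replace` are hypothetical functions:

```python
def min_rtt(probe, node, probes=3):
    """Full probing: RTT taken as the minimum of several probes."""
    return min(probe(node) for _ in range(probes))

def maybe_replace(current, current_rtt, candidate, estimate, probe):
    """Filtered probing: discard candidates whose coordinate-based estimate is
    not better than the current neighbour's RTT; otherwise probe before
    replacing. Returns the (neighbour, rtt) pair to keep."""
    if estimate(candidate) >= current_rtt:
        return current, current_rtt  # filtered out without sending any probe
    rtt = min_rtt(probe, candidate)
    if rtt < current_rtt:
        return candidate, rtt
    return current, current_rtt

calls = []

def probe(node):
    """Hypothetical probe: records the call and returns a fixed RTT."""
    calls.append(node)
    return {"x": 20.0, "y": 80.0}[node]

estimates = {"x": 25.0, "y": 90.0}  # hypothetical coordinate-based estimates
print(maybe_replace("cur", 50.0, "y", estimates.get, probe))  # ('cur', 50.0), no probe sent
print(maybe_replace("cur", 50.0, "x", estimates.get, probe))  # ('x', 20.0)
print(len(calls))  # 3
```

Probing only the candidates that pass the coordinate filter keeps the final decision based on real measurements while avoiding most of the probe traffic.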

Trace-driven evaluation

• Dynamic node arrival and failure generated from the UW Gnutella study
  – 60-hour trace
  – Average session time 2.3 hours
  – Number of active nodes varies from 1300 to 2700

• Georgia Tech topology

Distance probes

[Figure: probes per second per node (0 to 0.3) over time (0 to 60 hours) for full probing, PIC and filtered probing]

Relative delay penalty

[Figure: relative delay penalty (RDP, 0 to 3.5) for no locality, full probing, filtered probing and PIC]
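Relative delay penalty is commonly defined as the delay of the overlay route divided by the delay of the direct network path between the same two nodes, so an RDP of 1 means overlay routing adds no delay. A minimal sketch assuming that definition, with averaging over sampled routes as an assumed aggregation; the function names are hypothetical:

```python
def relative_delay_penalty(hop_delays: list, direct_delay: float) -> float:
    """Overlay route delay divided by the direct network delay."""
    return sum(hop_delays) / direct_delay

def mean_rdp(routes) -> float:
    """Average RDP over (hop_delays, direct_delay) samples."""
    return sum(relative_delay_penalty(h, d) for h, d in routes) / len(routes)

print(relative_delay_penalty([10.0, 20.0, 30.0], 40.0))  # 1.5
```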

Related Work

• GNP: maps the Internet into a geometric space using centralized landmarks

• Lighthouses: uses decentralized random landmarks

• Mithos: uses closest nodes as landmarks

• Virtual landmarks: partitions nodes into sets, maps coordinates between sets

• Vivaldi: computes coordinates continuously by passively monitoring RPC delays

Conclusion

• PIC enables practical distance estimation in large distributed systems:
  – Accurate
  – Self-organizing
  – Scalable
  – Secure

• Future Work:
  – Deployment and evaluation on the Internet
  – Different distance metrics (e.g. bandwidth)

Questions?
