Infrastructure-based Resilient Routing
Ben Y. Zhao, Ling Huang, Jeremy Stribling, Anthony Joseph, and John Kubiatowicz
University of California, Berkeley
ICSI Lunch Seminar, January 2004


Page 1: Infrastructure-based Resilient Routing

Infrastructure-based Resilient Routing

Ben Y. Zhao, Ling Huang, Jeremy Stribling, Anthony Joseph and John Kubiatowicz

University of California, Berkeley
ICSI Lunch Seminar, January 2004

Page 2: Infrastructure-based Resilient Routing

April 22, 2023 [email protected] Lunch Seminar, Jan. 2004

Motivation
- Network connectivity is not reliable
  - Disconnections are frequent in the Internet (UMichTR98, IMC02)
    - 50% of backbone links have MTBF < 10 days
    - 20% of faults last longer than 10 minutes
  - IP-level repair is relatively slow
    - Wide-area: BGP takes ~3 minutes
    - Local-area: IS-IS takes ~5 seconds
- Next-generation wide-area network applications
  - Streaming media, VoIP, B2B transactions
  - Low tolerance for delay, jitter, and faults

Page 3: Infrastructure-based Resilient Routing


The Challenge
- Routing failures are diverse
  - Many causes: misconfigurations, cut fiber, planned downtime, software bugs
  - They occur anywhere, with local or global impact: a single fiber cut can disconnect AS pairs
  - One event can lead to complex protocol interactions
- Isolating failures is difficult
  - End-user symptoms are often dynamic or intermittent
  - WAN measurement research is ongoing (Rocketfuel, etc.)
- Observations:
  - Fault detection needs multiple distributed vantage points
  - In-network decision making is necessary for timely responses

Page 4: Infrastructure-based Resilient Routing


Talk Overview
- Motivation
- A structured overlay approach
- Mechanisms and policy
- Evaluation
- Some questions

Page 5: Infrastructure-based Resilient Routing


An Infrastructure Approach
- Our goals
  - A resilient overlay that routes around failures
  - Respond in milliseconds (not seconds)
- Our approach (data & control plane)
  - Nodes are observation points (similar to Plato's NEWS service)
  - Nodes are also points of traffic redirection (forwarding path determination and data forwarding)
- No edge node involvement
  - Fast response time; security focused on the infrastructure
  - Fully transparent; no application awareness necessary

Page 6: Infrastructure-based Resilient Routing


Why Structured Overlays
- Resilient Overlay Networks (MIT)
  - Fully connected mesh; each node has full knowledge of the network
  - Fast, independent calculation of routes
  - Nodes can construct any path: maximum flexibility
- Cost of flexibility
  - Protocol needs to choose the "right" route/nodes
  - Per-node O(n) state; each node monitors n - 1 paths
  - O(n²) total path monitoring is expensive

[Figure: fully connected mesh between source S and destination D]

Page 7: Infrastructure-based Resilient Routing


The Big Picture
- Locate a nearby overlay proxy
- Establish an overlay path to the destination host
- The overlay routes traffic resiliently

[Figure: an overlay layer of nodes above the Internet]

Page 8: Infrastructure-based Resilient Routing


Traffic Tunneling
- Legacy nodes A and B (A, B are IP addresses) each register with a nearby proxy
- Each proxy publishes its mapping into the structured peer-to-peer overlay: put(hash(A), P'(A)), put(hash(B), P'(B))
- To reach B, A's proxy performs get(hash(B)), which returns P'(B)
- Stores the mapping from an end host's IP to its proxy's overlay ID
- Similar to the approach in Internet Indirection Infrastructure (I3)
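The registration and lookup steps above can be sketched over a generic put/get interface. This is a minimal illustration, not the actual Tapestry code: the `Overlay` class is a dict-backed stand-in for the structured overlay, and the function names are hypothetical.

```python
import hashlib

class Overlay:
    """Stand-in for a structured peer-to-peer overlay's put/get API."""
    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

def overlay_key(ip):
    # hash(B): map an end-host IP address into the overlay's ID space
    return hashlib.sha1(ip.encode()).hexdigest()

def register_endhost(overlay, ip, proxy_id):
    # put(hash(B), P'(B)): store the mapping from end-host IP
    # to its proxy's overlay ID
    overlay.put(overlay_key(ip), proxy_id)

def lookup_proxy(overlay, ip):
    # get(hash(B)) -> P'(B): find the proxy serving a legacy node
    return overlay.get(overlay_key(ip))

overlay = Overlay()
register_endhost(overlay, "10.0.0.2", "proxy-B")  # legacy node B registers
print(lookup_proxy(overlay, "10.0.0.2"))          # -> proxy-B
```

In the real system the put/get would be routed through the overlay itself rather than a local table, but the key derivation and mapping are the same idea.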

Page 9: Infrastructure-based Resilient Routing


Pros and Cons
- Leverage small neighbor sets
  - Fewer neighbor paths to monitor: O(n) → O(log n)
  - Reduction in probing bandwidth
  - Faster fault detection
- Actively maintain static route redundancy
  - Manageable for a "small" number of paths
  - Redirect traffic immediately when a failure is detected
  - Eliminates on-the-fly calculation of new routes
  - Restore redundancy in the background after a failure
- Fast fault detection + precomputed paths = more responsiveness
- Cons: the overlay imposes routing stretch (mostly < 2)
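The monitoring-cost argument can be made concrete with a back-of-the-envelope calculation: a fully connected mesh has every node probing every other node, while a structured overlay probes only a logarithmic number of neighbors. The per-node routing-table estimate below (base-16 prefix levels) is a rough approximation for illustration, not the exact Tapestry state size.

```python
import math

def mesh_paths(n):
    # Full mesh: every node monitors the other n - 1 nodes,
    # so the system monitors n * (n - 1) directed paths in total
    return n * (n - 1)

def structured_paths(n, base=16):
    # Prefix-routing approximation: about log_base(n) routing levels,
    # with up to (base - 1) neighbors per level
    per_node = math.ceil(math.log(n, base)) * (base - 1)
    return n * per_node

for n in (100, 1000, 10000):
    print(n, mesh_paths(n), structured_paths(n))
```

Even at a few hundred nodes the mesh is monitoring orders of magnitude more paths, which is the bandwidth reduction the slide refers to.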

Page 10: Infrastructure-based Resilient Routing


In-network Resiliency Details
- Active periodic probes for fault detection
  - Exponentially weighted moving average (EWMA) link quality estimation: loss rate L_n = (1 - α) · L_{n-1} + α · p
  - Avoids route flapping due to short-term loss artifacts
- Simple approach taken; much ongoing research
  - Smart fault detection/propagation (Zhuang04)
  - Intelligent and cooperative path selection (Seshardri04)
- Maintaining backup paths
  - Create and store backup routes at node insertion
  - Query neighbors after failures to restore redundancy
  - Ask any neighbor at or above the routing level of the faulty node, e.g. if ABCD sees ABDE fail, it can ask any AB?? node for info
- Simple policies to choose among redundant paths
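The EWMA estimator above can be sketched directly from the formula; the class name and the choice of 1/0 for probe outcomes are illustrative.

```python
class LinkQualityEstimator:
    """EWMA loss-rate estimator: L_n = (1 - alpha) * L_{n-1} + alpha * p."""
    def __init__(self, alpha=0.2):
        self.alpha = alpha      # filter constant (the slides use 0.2 and 0.4)
        self.loss = 0.0         # L_0: start by assuming a healthy link

    def update(self, probe_lost):
        # p = 1 if the latest probe was lost, 0 if it was answered.
        # Smoothing avoids route flapping on short-term loss artifacts.
        p = 1.0 if probe_lost else 0.0
        self.loss = (1 - self.alpha) * self.loss + self.alpha * p
        return self.loss

est = LinkQualityEstimator(alpha=0.2)
for lost in [False, False, True, True, False]:
    est.update(lost)
print(round(est.loss, 3))  # -> 0.288
```

A larger α reacts to faults faster but is more easily fooled by a single dropped probe, which is the trade-off behind the two filter constants evaluated later in the talk.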

Page 11: Infrastructure-based Resilient Routing


First Reachable Link Selection (FRLS)
- Use link quality estimation to choose the shortest "usable" path
  - i.e. the shortest path whose minimal link quality > T
- Correlated failures
  - Reduce them with intelligent topology construction
- Goal: leverage the available redundancy
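The selection rule above can be sketched as follows. The data shapes are illustrative (quality here is 1 minus the estimated loss rate), and the function name is hypothetical.

```python
def frls_select(paths, threshold):
    """paths: list of (latency_ms, [link_quality, ...]) candidates.
    Return the shortest path whose worst link still clears the
    quality threshold T, or None if no path is usable."""
    for latency, link_qualities in sorted(paths, key=lambda p: p[0]):
        if min(link_qualities) > threshold:
            return latency, link_qualities
    return None  # no usable path; fall back (e.g. constrained multicast)

paths = [
    (20, [0.95, 0.40]),   # shortest, but one link is badly degraded
    (35, [0.92, 0.90]),   # slightly longer, all links healthy
    (60, [0.99, 0.99]),
]
print(frls_select(paths, threshold=0.7))  # picks the 35 ms path
```

Because the candidate paths are precomputed and their qualities continuously estimated, this choice is a constant-time scan rather than an on-the-fly route computation.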

Page 12: Infrastructure-based Resilient Routing


Evaluation
- Metrics for evaluation
  - How much routing resiliency can we exploit?
  - How fast can we adapt to faults (responsiveness)?
- Experimental platforms
  - Event-based simulations on transit-stub topologies
    - Data collected over multiple 5000-node topologies
  - PlanetLab measurements
    - Microbenchmarks on responsiveness

Page 13: Infrastructure-based Resilient Routing


Exploiting Route Redundancy (Sim)
- Simulation of Tapestry, 2 backup paths per routing entry
- Transit-stub topology shown; results from TIER and AS graphs are similar

[Figure: % of all pairs reachable vs. proportion of IP links broken (0 to 0.2), comparing instantaneous IP with Tapestry/FRLS]

Page 14: Infrastructure-based Resilient Routing


Responsiveness to Faults (PlanetLab)
- Two reasonable values for the filter constant α
- Response time scales linearly with the probe period

[Figure: time to switch routes (ms) vs. link probe period (ms, up to 1200), curves for α = 0.2 and α = 0.4; annotated switch times of 300 ms and 660 ms]

Page 15: Infrastructure-based Resilient Routing


Link Probing Bandwidth (PlanetLab)

[Figure: bandwidth per node (KB/s) vs. overlay size (1 to 1000, log scale), for probe periods PR = 300 ms and PR = 600 ms]

- Bandwidth increases logarithmically with overlay size
- Medium-sized routing overlays incur low probing bandwidth

Page 16: Infrastructure-based Resilient Routing


Conclusion
- Trading flexibility for scalability and responsiveness
  - Structured routing has low path maintenance costs
  - Allows "caching" of backup paths for quick failover
  - Can no longer construct arbitrary paths
  - But a simple policy exploits the available redundancy well
- Fast enough for most interactive applications
  - With a 300 ms beacon period: response time < 700 ms
  - At ~300 nodes, bandwidth cost = 7 KB/s

Page 17: Infrastructure-based Resilient Routing


Ongoing Questions
- Is this the right approach?
- Is there a lower bound on desired responsiveness?
  - Is this responsive enough for VoIP?
  - If not, is multipath routing the solution?
- What about deployment issues?
  - How does inter-domain deployment happen?
  - A third-party approach? ("Akamai for routing")

Page 18: Infrastructure-based Resilient Routing


Related Work
- Redirection overlays
  - Detour (IEEE Micro 99)
  - Resilient Overlay Networks (SOSP 01)
  - Internet Indirection Infrastructure (SIGCOMM 02)
  - Secure Overlay Services (SIGCOMM 02)
- Topology estimation techniques
  - Adaptive probing (IPTPS 03)
  - Internet tomography (IMC 03)
  - Routing underlay (SIGCOMM 03)
- Many, many other structured peer-to-peer overlays

Thanks to Dennis Geels / Sean Rhea for their work on BMark

Page 19: Infrastructure-based Resilient Routing


Backup Slides

Page 20: Infrastructure-based Resilient Routing


Another Perspective on Reachability

[Figure: proportion of all paths vs. proportion of IP links broken (0 to 0.2), stacked into four regions: pair-wise paths where no failure-free path remains; paths where IP and FRLS both route successfully; paths that exist but neither IP nor FRLS can locate; paths FRLS finds where short-term IP routing fails]

Page 21: Infrastructure-based Resilient Routing


Constrained Multicast
- Used only when all paths are below the quality threshold
- Send duplicate messages on multiple paths; leverage route convergence
- Assign unique message IDs
- Mark duplicates
  - Keep a moving window of IDs
  - Recognize and drop duplicates
- Limitations
  - Assumes loss is not from congestion
  - Ideal for local-area routing

[Figure: duplicate messages sent along multiple converging overlay paths, node IDs shown]
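The duplicate suppression described above can be sketched with a bounded moving window of message IDs; the window size and class name are illustrative choices, not values from the system.

```python
from collections import deque

class DuplicateFilter:
    """Drop duplicates of recently seen messages using a moving
    window of message IDs (bounded memory per node)."""
    def __init__(self, window_size=1024):
        self._order = deque(maxlen=window_size)  # IDs in arrival order
        self._seen = set()                       # fast membership test

    def accept(self, message_id):
        """Return True for the first copy of a message, False for
        duplicates that converged along another path."""
        if message_id in self._seen:
            return False
        if len(self._order) == self._order.maxlen:
            # Window is full: the deque will evict its oldest ID on
            # append, so forget that ID in the set as well
            self._seen.discard(self._order[0])
        self._order.append(message_id)
        self._seen.add(message_id)
        return True

f = DuplicateFilter(window_size=4)
results = [f.accept(m) for m in ["a", "b", "a", "c"]]
print(results)  # [True, True, False, True]
```

The window is what makes this safe under the "leverage route convergence" assumption: duplicates arrive close together in time, so a small bounded history is enough to catch them.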

Page 22: Infrastructure-based Resilient Routing


Latency Overhead of Misrouting

[Figure: proportional latency overhead vs. latency from source to destination (20 to 180 ms), curves for hops 0 through 4]

Page 23: Infrastructure-based Resilient Routing


Bandwidth Cost of Constrained Multicast

[Figure: proportional bandwidth overhead vs. latency from source to destination (20 to 160 ms), curves for hops 0 through 4]