active reliable multicast by li-wei h. lehman, stephan j. garland, and david l. tennenhouse

30
1 ADVANCED COMPUTER NETWORKS CS 6390.521 ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse MIT Laboratory for Computer Science. Presented By Ravikiran Tunuguntla Asutosh Regulagadda Prasanth Jain

Upload: ace

Post on 11-Jan-2016

26 views

Category:

Documents


0 download

DESCRIPTION

ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse MIT Laboratory for Computer Science. Presented By Ravikiran Tunuguntla Asutosh Regulagadda Prasanth Jain. Challenges for Reliable Multicast. NACK Implosion Problem. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

1ADVANCED COMPUTER NETWORKSCS 6390.521

ACTIVE RELIABLE MULTICAST

by

Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

MIT Laboratory for Computer Science.

Presented By

Ravikiran Tunuguntla

Asutosh Regulagadda

Prasanth Jain

Page 2: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

2ADVANCED COMPUTER NETWORKSCS 6390.521

Challenges for Reliable Multicast

•NACK Implosion Problem.

•Variable packet loss rates at receiver end.

•Entire group retransmission.

•Non static group memberships.

•Retransmission only by source.

Page 3: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

3ADVANCED COMPUTER NETWORKSCS 6390.521

Existing Schemes of Multicast

•SRM: Scalable Reliable Multicast

•Uses randomized back off to avoid NACK implosion.

•NACKs are mcast to a group with TTL.

•TTL limits NACK delivery scope.

•Any member with data can respond a NACK.

•LBRM:Log Based Receiver Reliable Multicast.

•RMTP:Reliable Multicast Transport Protocol

Page 4: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

4ADVANCED COMPUTER NETWORKSCS 6390.521

Difficulties with Existing Schemes

•Provides approximate solutions to scoped recovery.

•No shield to bottle neck links from repair traffic.

•Less fault tolerance.

•Less Robust to topology changes.

•Group topology must be known.

•Local recovery is still an open issue (SRM).

Page 5: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

5ADVANCED COMPUTER NETWORKSCS 6390.521

The Network Model

•Similar to traditional packet networks.

•Provides “best effort” service model.

•Some intermediate routers may be active.

•Active routers provide•Best effort soft state storage.

•Flush their caches periodically to remove items with expired life time

•Perform customized computation based on packet types.

•End points specify TTL of cached items.

•No Assumption about # of active nodes.

•Does not fully rely on Active nodes for error recovery.

Page 6: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

6ADVANCED COMPUTER NETWORKSCS 6390.521

The Network Model

•End points compensate for flushed caches & Router failures

•Timeouts.•Retransmissions.

•Operates correctly even in face of router failure.

•Provides IP-multicast style multicast routing.

•Tree rooted at sender formed for mcast packets.

•Paths for mcast routing path for ucast routing.

•ARM is Receiver reliable.

Page 7: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

7ADVANCED COMPUTER NETWORKSCS 6390.521

Active Reliable Multicast

•Receiver detect losses by sequence gaps or no data arrived after Tmax

•Single sender and multiple Receivers.

•Multiple NACKS are fused at active nodes.

•Sender responds only to first NACK by mcasting Repair to the group.

•NACK count is maintained to request for repair.

•No assumption about end point behavior such as•Sending Rate.•Time out schemes.

•Actions performed by active intermediate routers.•Data caching for local Retransmission.•NACK fusion/Suppression.•Partial mcast for scoped retransmission.

Page 8: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

8ADVANCED COMPUTER NETWORKSCS 6390.521

Active Reliable Multicast

A.Caching Data at Routers (Data Packet):Group Address Source Address

Sequence # Cache TTL (t)NACK count

User Data

Group address: Multicast group address

Cache TTL: Useful life of DP @ active node.

NACK count: Significant only for repair packets

•TTL depends on “farthest” receiver down stream

•TTL should be bound by sequence # cycle.

•Cache packets only at strategic locations.

•In active router simply forwards DP to receiver.

•NACK count indicates # of receiver requests for repair

Page 9: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

9ADVANCED COMPUTER NETWORKSCS 6390.521

Active Reliable Multicast

A.Caching Data at Routers:

•Significance of Caching are

•End-to-end latency over WAN can be reduced.

•Distribution of load of retransmission over multicast tree.

•Reduces recovery latency for distant receivers.

•Protects sender and bottle neck links from repair traffic.

•Strategic location of active nodes helps recover latency.

•Examples of strategic locations are

•Edge of a backbone network.

•Node before more lossy link(air in wireless network).

Page 10: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

10ADVANCED COMPUTER NETWORKSCS 6390.521

Active Reliable Multicast

B.NACK Suppression and Local Retransmission:

•3 records maintained at ARM active node are•A NACK Record for DP with

•“NACK count” received for packet.

•“Subscription bitmap” for outgoing links.

•A REPAIR Record for DP (p) contains

•Vector of outgoing links on repair for (p).

•REPAIR record used to suppress NACK sent by receivers before they receive a repair packet.

•A Datapacket itself.

•The above 3 uniquely identifies group addr, source addr, seq# of lost DP.

Page 11: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

11ADVANCED COMPUTER NETWORKSCS 6390.521

Active Reliable Multicast

B.NACK Suppression and Local Retransmission:NACK packet form receiver to ARM active node

Reciever Address Source AddressSequence # Mcast group Address

TTL for NACK NACK CountREPAIR Record

Pay Load

•Significance of processing NACK packet

•Suppresses duplicate NACKS.

•Prepares for necessary scoped retransmission.

Page 12: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

12ADVANCED COMPUTER NETWORKSCS 6390.521

Active Reliable Multicast

B.NACK Suppression and Local Retransmission:

Page 13: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

13ADVANCED COMPUTER NETWORKSCS 6390.521

Active Reliable Multicast

B.NACK Suppression and Local Retransmission:

•First checks if requested repair has been sent to to link where NACK is arrived

•If yes router drops NACK.

•If No and Repair is

•In cache, Router retransmits REPAIR.

•not in cache, subscribes for retransmission and forwards 1 NACK upstream

•A NACK with higher NACK count is sent after repair timeout.

•An in-active router just forwards towards sender.

•REPAIR record cache TTL RTT from sender to farthest receiver.

Page 14: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

14ADVANCED COMPUTER NETWORKSCS 6390.521

Active Reliable Multicast

C.Scoped Retransmissions:Algorithm for repair packet RP:Look up unexpired NACK record andREPAIR record RR for(RP.group, RP.Source, RP.seqnumber);

If (cache available at this node) Store RP in cache; Set RP’s cache TTL to RP.t; If (NR found) Forward RP down each subscribed link in NR; Remove NR; Else For each (outgoing link) If (downstream receivers are subscribed to RP.group) Forward RP down link;

If (RR not found) Create REPAIR record RR for (RP.group, RP.source, RP.seqnumber); Set RR’s cache TTL to RP.t; For (Each link I on which RP was forwarded) Set NACK count for I in RR to RP.nackCount;

Page 15: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

15ADVANCED COMPUTER NETWORKSCS 6390.521

Active Reliable Multicast

C.Scoped Retransmissions:

•ARM process REPAIR Packets same as Data Packets.

•Retransmissions is done to portions of mcast groups

•By seeing “subscription bitmap”.

•Which are suffering with losses.

•Find relevant subscription information

•If yes, router forwards REPAIR only to subscribed links.

•If no, Router caches REPAIR.

•If No cache, Forward REPAIR to all links.

•Advantageous to cache REPAIR packets than all Data Packets.

Page 16: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

16ADVANCED COMPUTER NETWORKSCS 6390.521

Implementation

•Implemented ARM prototype in JAVA on SPARC under Solaris 2.5

•Prototype built using Active Node Transport System (ANTS)

•ANTS provides a set of JAVA classes which include “capsule” class.

•Capsule attaches code fragment to Data packets.

•CCF can be demand loaded at Active Nodes Dynamically.

•Different processing for Different ARM capsule types.

•ANTS provide soft-state storage for transient data at ARMs.

Page 17: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

17ADVANCED COMPUTER NETWORKSCS 6390.521

Implementation

•Emulated mcast by creating many ANTS nodes.

•Work stations are connected to many ANTS nodes by 100MB Ethernet.

•ANTS nodes communicate using UDP.

•ARM increases only negligible processing.

•Efficient implementation of ARM adds 200s latency.

•Priority is to cache REPAIRs than Data Packets.

•Small portion of overall traffic requires Active processing

Page 18: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

18ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

•Simulated ARM in order to evaluate performance.

•Simulated varying Active Routers in network.

•Measured tradeoffs between storage/processing and bandwidth/latency.

•Results show ARM performs well w.r.t following metrics

•Recovery Latency.

•Implosion Control

•Bandwidth Consumption

•Compared ARM with SRM using simulation results.

•Did not considered loss of NACKs, REPAIRs in addition to Data.

Page 19: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

19ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

A.Recovery Latency:

•Compares loss recovery delay (LRD) for ARM to SRM.

•LRD: Time from detecting packet loss to receive repair.

•Worst Case Delay measured in units of RTT

•Group Size Range from 10 to 100.

•Adaptive in SRM adjusts for Delays.

•SRM worst Case Latency for

•Non Adaptive 2.5 RTT.

•Adaptive 1.4 RTT.

•ARM Worst case delay is 0.2 RTT.

•Local Retransmission in ARM (0.2RTT).

Page 20: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

20ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

A.Recovery Latency:

•All non leaf nodes are active and does

•NACK suppression.•Scoped Retransmission.

•# of routers cache fresh data range 0%-100%.•Recovery latency cut 1/2 by 20% of cached data•ARM improves recovery latency by

•Caching mcast data.•Caching at small # of strategic active routers

Page 21: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

21ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

B.Implosion Control:

•Measures effectiveness of ARM Vs SRM for NACK implosion.

•Max # NACKS handled by routers or end points to handle single loss.

•Shows what happens when packet is lost near source.

Page 22: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

22ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

B.Implosion Control:

•Total # of hops NACKS go before

recovery.

•ARM performance Nodes

performing NACK suppression.

•ARM achieves benefits with < 50%

of active nodes at strategic locations.

Page 23: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

23ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

B.Implosion Control:

• # of strategically placed nodes/

Total # of non-leaf nodes in mcast

tree.

•Group size taken is up to 100.

•< 50% of nodes are strategic in

mcast tree

Page 24: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

24ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

B.Implosion Control:

•Compares SARM-SRM-RARM

where

•SARM: Strategically active

ARM.

•RARM: Randomly active

ARM.

•SARM approaches All active nodes in

ARM.

Page 25: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

25ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

C.Bandwidth Consumption:

•Shows bandwidth consumed by REPAIR.

•Normal distrib of total # of hops in randomly generated mcast tree.

•Dose not cache fresh data.

•40% losses require REPAIR to go 10 hops.

•80% losses require REPAIR to go 12 hops.

•When only 25% of nodes are active

•40% losses require 1/3 mcast tree traversal.

•80% losses require 1/2 mcast tree traversal.

•Group size is fixed at 100.

Page 26: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

26ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

C.Bandwidth Consumption:

•Shows effectiveness of scoped

retransmission for

•Different group sizes.

•Different # of active nodes in

mcast tree.

•SRM recovery bandwidth group

size.

•ARM is effective in limiting repair

traffic.

Page 27: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

27ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

C.Bandwidth Consumption:

•All non leaf nodes are active, but # of

cache capable varies.

•Most benefit on bandwidth occurs

when only 40% of nodes cache repairs.

Page 28: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

28ADVANCED COMPUTER NETWORKSCS 6390.521

Simulations

C.Bandwidth Consumption:

•When all nodes are cache capable

•ARM achives perfect scoping of

REPAIRS.

•If 40% can cache it shields 90% of affected

nodes.

Page 29: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

29ADVANCED COMPUTER NETWORKSCS 6390.521

Related Work

•Existing work on Wide area Reliable Multicast is 2 types•SRM.•Hierarchical such as RMTP, TMTP, LBRM.

•Both approaches don’t give satisfactory solution.•Hierarchical approaches perform better for localized recovery.•General idea to send NACK upstream till “turning point”.•Difficult to decide location of “turning points”.•When ARM compares to SRM, has

•Low recovery latency.•Provides specific solution.

•ARM is flexible to hierarchical approaches.•ARM does not require all active nodes.

Page 30: ACTIVE RELIABLE MULTICAST by Li-wei H. Lehman, Stephan J. Garland, and David L. Tennenhouse

30ADVANCED COMPUTER NETWORKSCS 6390.521

Conclusions

•ARM uses intermediate nodes to reduce

•NACKs.

•Repair Traffic.

•Robust to dynamic changes in group membership.

•ARM performs well in terms of

•Recovery latency.

•Implosion Control.

•Repair Bandwidth.

•ARM performance degrades gracefully as # of active nodes decreases.