An Active Reliable Multicast Framework for the Grids
M. Maimour & C. Pham
ICCS 2002, Amsterdam
Network Support and Services for Computational Grids
Sunday, April 21st, 2002
http://www.ens-lyon.fr/LIP/RESAM
Action INRIA-RESO
Outline
Motivations behind (reliable) multicast
Use of active networks : the DyRAM protocol
DyRAM main services
Simulation results
Conclusion
From unicast…
Problem: Sending the same data to many receivers via unicast is inefficient.
[Figure: one Sender transmitting a separate copy of the data to each of three Receivers.]
…to multicast on the Internet.
[Figure: one Sender multicasting data to three Receivers.]
Problem: Sending the same data to many receivers via unicast is inefficient.
Solution: Using multicast is more efficient.
Reliable multicast
At the routing level, IP Multicast efficiently delivers packets to all the receivers subscribed to a multicast session but without any reliability guarantees.
Reliability (including flow and congestion control) is to be addressed at the transport level.
Reliable multicast: a big win for grids
Data replications
Database updates
Code & data transfers
Data communications for distributed applications (collective & gather operations, sync. barrier)
[Figure: three sites joined to multicast address group 224.2.0.1]
SDSC IBM SP: 1024 procs, 5x12x17 = 1020
NCSA Origin Array: 256+128+128, 5x12x(4+2+2) = 480
CPlant cluster: 256 nodes
Reliable multicast strategies
End-to-end solutions: only the end hosts (the source and/or the receivers) are involved.
Problem: the end hosts lack topology information.
In-network solutions: some intermediate nodes (routers/servers) are involved in the recovery process.
Active networking solutions
Active routers can perform customized computations on incoming packets: data caching, feedback aggregation, filtering, subcasting, …
The DyRAM framework for grids
(Dynamic Replier Active Reliable Multicast)
In order to enable distributed grid applications, the main design goals are:
low recovery latency using local recovery
low memory usage in routers : local recovery is performed from the receivers (no cache in routers)
low processing overheads in routers : light active services
DyRAM loss recovery strategy : main active services
DyRAM is NACK-based …
Global NACK suppression
Early packet loss detection
Subcast of repair packets
Dynamic replier election
Global NACK suppression
[Figure: several receivers send NACK4 for packet data4 through an active router.]
Only one NACK is forwarded to the source.
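The suppression behaviour can be sketched in a few lines of Python. This is a minimal illustration and not the DyRAM implementation: the class name `NackSuppressor`, its methods `on_nack` and `on_repair`, and the choice to remember pending sequence numbers in a set are all assumptions made for the example.

```python
class NackSuppressor:
    """Hypothetical active-router service: forwards one NACK per lost packet."""

    def __init__(self):
        # Sequence numbers for which a NACK has already been forwarded upstream.
        self.pending = set()

    def on_nack(self, seq):
        """Return True if this NACK should be forwarded towards the source."""
        if seq in self.pending:
            return False  # duplicate NACK for the same packet: suppressed
        self.pending.add(seq)
        return True

    def on_repair(self, seq):
        """Forget the sequence number once the repair has passed through."""
        self.pending.discard(seq)


router = NackSuppressor()
# Five receivers report the loss of packet 4.
forwarded = [router.on_nack(4) for _ in range(5)]
print(forwarded.count(True))  # prints 1
```

Clearing the entry in `on_repair` is one possible way to let a later loss of the same packet be reported again; a timer-based cleanup would be another.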
Early packet loss detection
[Figure: packets data3, data4 and data5 in transit; the router and the receivers issue NACK4.]
The repair latency can be reduced if the lost packet is requested as soon as possible.
A NACK is sent by the router.
The NACKs sent later by the receivers for the same packet are ignored!
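One way to picture early loss detection is a router that watches for gaps in the sequence numbers arriving from upstream. The sketch below is an assumption-laden illustration, not the protocol's actual code: `EarlyLossDetector`, `on_data`, and `on_downstream_nack` are hypothetical names, and detecting a loss purely from a sequence gap is a simplification.

```python
class EarlyLossDetector:
    """Hypothetical router service: NACKs a packet as soon as a gap is seen."""

    def __init__(self):
        self.next_expected = None  # next sequence number expected from upstream
        self.requested = set()     # packets the router has already NACKed itself

    def on_data(self, seq):
        """Process a data packet; return the sequence numbers the router NACKs."""
        if self.next_expected is None:
            missing = []
        else:
            # Every number skipped between the expected one and seq is a loss.
            missing = list(range(self.next_expected, seq))
        self.requested.update(missing)
        self.requested.discard(seq)  # a repair has arrived for this packet
        if self.next_expected is None or seq >= self.next_expected:
            self.next_expected = seq + 1
        return missing

    def on_downstream_nack(self, seq):
        """Return True if a receiver's NACK must still be forwarded upstream."""
        return seq not in self.requested


router = EarlyLossDetector()
router.on_data(3)
print(router.on_data(5))             # prints [4]
print(router.on_downstream_nack(4))  # prints False
```

In this sketch the router requests packet 4 the moment packet 5 arrives, so the receivers' later NACKs for packet 4 carry no new information and are dropped.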
Replier election
A receiver is elected to be a replier for each lost packet (one recovery tree per packet)
Load balancing can be taken into account for the replier election
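A per-packet election with simple load balancing might look like the following sketch. It rests on assumptions that the slide does not state: the router is taken to choose among its downstream links rather than individual receivers, a link qualifies only if no NACK for the packet arrived from it, and load is counted as the number of past elections. The function name `elect_replier_link` and its parameters are hypothetical.

```python
def elect_replier_link(downstream_links, nack_links, load):
    """Pick a downstream link whose receiver will act as replier.

    downstream_links: link identifiers known to the router
    nack_links: links from which a NACK for this packet was received
    load: dict mapping link -> number of times it was already elected
    Returns the chosen link, or None if no local replier is available.
    """
    # Assumption: a link that sent a NACK cannot supply the lost packet.
    candidates = [link for link in downstream_links if link not in nack_links]
    if not candidates:
        return None  # fall back to recovery from upstream
    # Load balancing: prefer the least-used link, break ties by link number.
    chosen = min(candidates, key=lambda link: (load.get(link, 0), link))
    load[chosen] = load.get(chosen, 0) + 1
    return chosen


load = {}
print(elect_replier_link([0, 1, 2], {1, 2}, load))  # prints 0
print(elect_replier_link([0, 1, 2], {2}, load))     # prints 1
```

Because the election is repeated for every lost packet, different packets can end up with different repliers, which matches the idea of one recovery tree per packet.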
Replier election and repair subcast
[Figure: receivers R1 to R7 and nodes D0 and D1, connected through regions labelled "IP multicast" and "DyRAM", with links numbered 0, 1 and 2. Message labels: "NAK 2,@", "NAK 2 from link 1", "NAK 2 from link 2", "NAK 2" and "Repair 2".]
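Repair subcast can be sketched as a table that remembers which links reported a loss, so that the repair is sent only where it is needed. This is an illustrative sketch under assumptions: `SubcastTable`, `record_nack`, and `subcast_links` are hypothetical names, and the rule that a repair is never sent back on the link it arrived from is a design choice made for the example.

```python
class SubcastTable:
    """Hypothetical router state: which links asked for which packet."""

    def __init__(self):
        self.nack_links = {}  # sequence number -> set of links that sent a NACK

    def record_nack(self, seq, link):
        self.nack_links.setdefault(seq, set()).add(link)

    def subcast_links(self, seq, arrival_link):
        """Return the links on which the repair for seq must be forwarded."""
        links = self.nack_links.pop(seq, set())
        # Never send the repair back where it came from.
        return sorted(links - {arrival_link})


table = SubcastTable()
table.record_nack(2, link=1)  # "NAK 2 from link 1"
table.record_nack(2, link=2)  # "NAK 2 from link 2"
print(table.subcast_links(2, arrival_link=0))  # prints [1, 2]
```

Links that never reported the loss receive nothing, which is what distinguishes a subcast from multicasting the repair to the whole group.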
The DyRAM framework for grids
[Figure: a source and several active routers around a core network (Gbits rate), with 1000 Base FX and 100 Base FX links.]
The backbone is very fast, so it performs nothing other than fast forwarding functions.
• NACK suppression
• Subcast
• Loss detection
A hierarchy of active routers can be used for processing specific functions at different layers of the hierarchy.
Any receiver can be elected as a replier for a lost packet.
• NACK suppression
• Subcast
• Replier election
Some simulation results
Network model and metrics used
Local recovery from the receivers
DyRAM vs. ARM (cache in routers)
DyRAM : early lost packet detection
Metrics
Load at the source : the number of retransmissions from the source.
Load at the network : the consumed bandwidth.
Completion time per packet (latency).
Local recovery from the receivers (1)
Local recovery reduces the end-to-end delay (especially for high loss rates and a large number of receivers).
#grp: 6…24
4 receivers/group
p=0.25
Local recovery from the receivers (2)
As the group size increases, doing the recoveries from the receivers greatly reduces the bandwidth consumption.
48 receivers distributed in g groups
#grp: 2…24
DyRAM vs ARM
ARM performs better than DyRAM only for very low loss rates and with considerable caching requirements
DyRAM: early lost packet detection
#grp: 6…24
4 receivers/group
The end-to-end latency is decreased when the early lost packet detection is enabled.
Conclusions
Reliability on large-scale multicast is difficult.
Active services can provide more efficient solutions for reliable multicast related problems.
The main DyRAM design goal is to reduce end-to-end latency using active services, which are kept as light as possible, making DyRAM more suitable for grid applications.