
An Active Reliable Multicast Framework for the Grids

M. Maimour & C. Pham

ICCS 2002, Amsterdam

Network Support and Services for Computational Grids, Sunday, April 21st, 2002

http://www.ens-lyon.fr/LIP/RESAM

Action INRIA-RESO


Outline

Motivations behind (reliable) multicast

Use of active networks: the DyRAM protocol

DyRAM main services

Simulation results

Conclusion


From unicast…

Problem: sending the same data to many receivers via unicast is inefficient.

[Figure: unicast delivery; the sender transmits a separate copy of the data to each of three receivers.]


…to multicast on the Internet.

[Figure: multicast delivery from the sender to three receivers.]

Solution: using multicast is more efficient.


Reliable multicast

At the routing level, IP Multicast efficiently delivers packets to all the receivers subscribed to a multicast session, but without any reliability guarantees.

Reliability (including flow and congestion control) therefore has to be addressed at the transport level.


Reliable multicast: a big win for grids

Data replications

Database updates

Code & data transfers

Data communications for distributed applications (collective & gather operations, sync. barrier)

[Figure: grid sites joined to the multicast address group 224.2.0.1: SDSC IBM SP (1024 procs, 5x12x17 = 1020), NCSA Origin Array (256+128+128, 5x12x(4+2+2) = 480) and CPlant cluster (256 nodes).]


Reliable multicast strategies

End-to-end solutions: only the end hosts (the source and/or the receivers) are involved. Problem: the end hosts lack topology information.

In-network solutions: some intermediate nodes (routers/servers) are involved in the recovery process.


Active networking solutions

Active routers are able to perform customized computations on incoming packets: data caching, feedback aggregation, filtering, subcasting, …
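To make the idea concrete, here is a minimal Python sketch of an active router that passes every incoming packet through a chain of light services; each service returns the packet (possibly modified) or drops it by returning None. The Packet fields, the ActiveRouter class and the make_filter service are hypothetical names chosen for illustration, not part of DyRAM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Packet:
    kind: str  # hypothetical packet types: "data", "nack", "repair", ...
    seq: int   # sequence number


# A service receives a packet and returns it (possibly modified), or None to drop it.
Service = Callable[[Packet], Optional[Packet]]


@dataclass
class ActiveRouter:
    services: List[Service] = field(default_factory=list)

    def process(self, pkt: Packet) -> Optional[Packet]:
        """Run the services in order; None means the packet is not forwarded."""
        current: Optional[Packet] = pkt
        for service in self.services:
            current = service(current)
            if current is None:
                return None
        return current


def make_filter(dropped_kinds: set) -> Service:
    """Example filtering service (hypothetical): drop packets of the given kinds."""
    def service(pkt: Packet) -> Optional[Packet]:
        return None if pkt.kind in dropped_kinds else pkt
    return service


router = ActiveRouter([make_filter({"probe"})])
print(router.process(Packet("data", 1)))   # Packet(kind='data', seq=1)
print(router.process(Packet("probe", 2)))  # None
```

Keeping each service a small function is one way to honour the "light active services" goal: the per-packet cost is the sum of a few cheap checks.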


The DyRAM framework for grids (Dynamic Replier Active Reliable Multicast)

To support distributed grid applications, the main design goals are:

low recovery latency, using local recovery

low memory usage in routers: local recovery is performed from the receivers (no caching in routers)

low processing overhead in routers: light active services


DyRAM loss recovery strategy: main active services

DyRAM is NACK-based …

Global NACK suppression

Early packet loss detection

Subcast of repair packets

Dynamic replier election
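Being NACK-based means that the receivers detect losses themselves, from gaps in the sequence numbers, and request the missing packets. A minimal Python sketch of that receiver-side step, assuming sequence numbers start at 0 and ignoring timers and NACK retransmission (the NackReceiver class is hypothetical):

```python
class NackReceiver:
    """Receiver-side loss detection for a NACK-based protocol (sketch)."""

    def __init__(self) -> None:
        self.next_expected = 0  # assumes sequence numbers start at 0
        self.missing = set()    # sequence numbers still awaited

    def on_packet(self, seq: int) -> list:
        """Handle a data or repair packet; return the sequence numbers to NACK."""
        nacks = []
        if seq >= self.next_expected:
            # every number skipped between next_expected and seq is a loss
            for lost in range(self.next_expected, seq):
                self.missing.add(lost)
                nacks.append(lost)
            self.next_expected = seq + 1
        else:
            # an older packet (typically a repair) fills a gap
            self.missing.discard(seq)
        return nacks


rx = NackReceiver()
for seq in (0, 1, 4):
    print(seq, rx.on_packet(seq))  # packet 4 triggers NACKs for 2 and 3
print(rx.on_packet(2), rx.missing)  # the repair for 2 arrives: [] {3}
```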


Global NACK suppression

[Figure: several receivers send NACK4 after losing data4; the active routers on the path aggregate these NACKs.]

Only one NACK is forwarded to the source.
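The suppression rule can be sketched in Python as follows, assuming the router simply keeps the set of sequence numbers for which a NACK has already gone upstream and clears an entry when the repair passes through. A complete implementation would normally also need timers (for example to re-send a NACK whose repair is itself lost), which are left out here; NackSuppressor is a hypothetical name.

```python
class NackSuppressor:
    """Global NACK suppression at an active router (sketch, no timers)."""

    def __init__(self) -> None:
        self.pending = set()  # seqs for which a NACK was already forwarded upstream

    def on_nack(self, seq: int) -> bool:
        """Return True if this NACK must be forwarded toward the source."""
        if seq in self.pending:
            return False  # duplicate NACK: suppressed
        self.pending.add(seq)
        return True

    def on_repair(self, seq: int) -> None:
        """The repair went through: a later NACK for seq reports a new loss."""
        self.pending.discard(seq)


router = NackSuppressor()
forwarded = [router.on_nack(4) for _ in range(5)]  # five receivers lost packet 4
print(forwarded.count(True))  # 1: only one NACK reaches the source
```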


Early packet loss detection

The repair latency can be reduced if the lost packet is requested as soon as possible.

[Figure: data4 is lost upstream of an active router; on receiving data5 after data3, the router sends NACK4 itself, and the NACK4s later sent by the receivers are ignored.]
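A minimal Python sketch of early loss detection, assuming the router sees the data packets in sequence-number order, so that a gap means the packets were lost upstream of it. The router NACKs them at once and ignores the receivers' later NACKs for the same packets. EarlyLossDetector and its methods are hypothetical names, and timers are omitted.

```python
class EarlyLossDetector:
    """Early packet loss detection at an active router (sketch, no timers)."""

    def __init__(self) -> None:
        self.next_expected = 0  # assumes sequence numbers start at 0
        self.requested = set()  # seqs already NACKed by this router

    def on_data(self, seq: int) -> list:
        """Handle a data or repair packet; return the NACKs the router sends itself."""
        self.requested.discard(seq)  # a repair for a packet we asked for
        # numbers skipped since the last packet were lost upstream of the router
        nacks = list(range(self.next_expected, seq))
        self.requested.update(nacks)
        self.next_expected = max(self.next_expected, seq + 1)
        return nacks

    def on_receiver_nack(self, seq: int) -> bool:
        """Return False if the router already NACKed seq, i.e. the NACK is ignored."""
        return seq not in self.requested


router = EarlyLossDetector()
for seq in (0, 1, 2, 3):
    router.on_data(seq)
print(router.on_data(5))           # [4]: data4 is missing, NACK4 sent by the router
print(router.on_receiver_nack(4))  # False: the receivers' NACK4s are ignored
```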


Replier election

A receiver is elected to be a replier for each lost packet (one recovery tree per packet)

Load balancing can be taken into account in the replier election.
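One possible election rule is sketched below in Python. It assumes that the router knows its downstream receivers, learns from the NACKs (which carry the sender's address) which of them lost the packet, and picks the least-loaded receiver among the others, counting the repairs each one has already served. The function name, its parameters and this load metric are assumptions made for illustration, not DyRAM's exact rule.

```python
from typing import Dict, Iterable, Optional, Set


def elect_replier(receivers: Iterable[str], nackers: Set[str],
                  load: Dict[str, int]) -> Optional[str]:
    """Elect a replier for one lost packet (sketch, hypothetical rule).

    receivers: downstream receivers known to the router
    nackers:   receivers that sent a NACK for this packet
    load:      repairs already served per receiver (updated in place)
    Returns None when every receiver lost the packet; the NACK must then
    go further upstream.
    """
    candidates = [r for r in receivers if r not in nackers]
    if not candidates:
        return None
    # least-loaded candidate; ties broken by name so the choice is deterministic
    replier = min(candidates, key=lambda r: (load.get(r, 0), r))
    load[replier] = load.get(replier, 0) + 1
    return replier


load = {"R5": 2, "R6": 0, "R7": 0}
print(elect_replier(["R5", "R6", "R7"], {"R6"}, load))  # R7 (R5 is busier)
print(elect_replier(["R5", "R6", "R7"], {"R5", "R6", "R7"}, load))  # None
```

Because the load counter is updated at each election, successive losses tend to be spread over different receivers rather than always falling on the same one.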

Replier election and repair subcast

[Figure: receivers R1 to R7 behind the DyRAM routers D0 and D1, within an IP multicast tree (links numbered 0, 1, 2); labels show NAK 2,@ messages from the receivers, NAK 2 from link 1 and NAK 2 from link 2, and Repair 2 subcast toward the receivers.]
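Repair subcast can be sketched as follows: the router records the downstream links on which a NACK arrived (as in "NAK 2 from link 1" and "NAK 2 from link 2") and forwards the repair on those links only, instead of multicasting it to its whole subtree. RepairSubcaster and its methods are hypothetical names, and link identifiers are plain integers here.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class RepairSubcaster:
    """Subcast of repair packets at an active router (sketch)."""

    def __init__(self, links: List[int]) -> None:
        self.links = links  # downstream links of this router
        # seq -> downstream links on which a NACK for seq was received
        self.requests: DefaultDict[int, Set[int]] = defaultdict(set)

    def on_nack(self, seq: int, link: int) -> None:
        self.requests[seq].add(link)

    def on_repair(self, seq: int) -> List[int]:
        """Return the links the repair is forwarded on; other links see nothing."""
        asked = self.requests.pop(seq, set())
        return [link for link in self.links if link in asked]


router = RepairSubcaster(links=[0, 1, 2])
router.on_nack(2, link=1)
router.on_nack(2, link=2)
print(router.on_repair(2))  # [1, 2]: link 0 does not receive Repair 2
```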

The DyRAM framework for grids

[Figure: sources attached to a core network running at Gbit/s rates; a hierarchy of active routers connects the sites over 1000 Base FX and 100 Base FX links.]

The backbone is very fast, so it performs nothing other than fast forwarding.

• NACK suppression
• Subcast
• Loss detection

A hierarchy of active routers can be used to process specific functions at different layers of the hierarchy.

Any receiver can be elected as a replier for a lost packet.

• NACK suppression
• Subcast
• Replier election


Some simulation results

Network model and metrics used

Local recovery from the receivers

DyRAM vs. ARM (cache in routers)

DyRAM: early packet loss detection


Network model

10 MByte file transfer

[Figure: simulated network topology; label: source router.]


Metrics

Load at the source: the number of retransmissions from the source.

Load on the network: the consumed bandwidth.

Completion time per packet (latency).
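The three metrics can be computed from a simulation trace along the following lines. This Python sketch is only illustrative: the Transmission record, the use of bytes times links crossed as the measure of consumed bandwidth, and the definition of a packet's completion time as the delay until the last receiver holds it are all assumptions, not the definitions used in the original simulator.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Transmission:
    from_source: bool        # emitted by the source (False: by a replier)
    is_retransmission: bool  # a repair rather than an original data packet
    size_bytes: int
    links_crossed: int       # links the packet travelled over


def source_load(trace: List[Transmission]) -> int:
    """Load at the source: number of retransmissions sent by the source."""
    return sum(1 for t in trace if t.from_source and t.is_retransmission)


def network_load(trace: List[Transmission]) -> int:
    """Load on the network: bytes carried, counted once per link crossed."""
    return sum(t.size_bytes * t.links_crossed for t in trace)


def mean_completion_time(sent_at: Dict[int, float],
                         completed_at: Dict[int, float]) -> float:
    """Average, over packets, of the delay until the last receiver holds it."""
    return mean(completed_at[seq] - sent_at[seq] for seq in sent_at)


trace = [
    Transmission(True, False, 1000, 6),  # original multicast of a packet
    Transmission(True, True, 1000, 6),   # repair sent again by the source
    Transmission(False, True, 1000, 2),  # local repair from a receiver
]
print(source_load(trace), network_load(trace))  # 1 14000
```

In this toy trace the local repair crosses fewer links than the source repair, which is the kind of saving the bandwidth metric is meant to capture.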


Local recovery from the receivers (1)

Local recovery reduces the end-to-end delay (especially for high loss rates and large numbers of receivers).

[Plot parameters: #grp: 6…24, 4 receivers/group, p=0.25.]


Local recovery from the receivers (2)

As the group size increases, performing the recoveries from the receivers greatly reduces the bandwidth consumption.

[Plot parameters: 48 receivers distributed in g groups, #grp: 2…24.]


DyRAM vs ARM

ARM performs better than DyRAM only for very low loss rates, and only at the cost of considerable caching requirements.


DyRAM: early packet loss detection

[Plot parameters: #grp: 6…24, 4 receivers/group.]

The end-to-end latency decreases when early packet loss detection is enabled.


Conclusions

Reliability for large-scale multicast is difficult.

Active services can provide more efficient solutions to reliable multicast problems.

The main DyRAM design goal is to reduce end-to-end latency using active services, which are kept as light as possible, making DyRAM more suitable for grid applications.
