Version 49 (modified by sswami, 18 years ago) ( diff )

Reliable Multicast Architecture Design

Saswati Swami (sswami@eden.rutgers.edu)

Introduction

The current NodeHandler code works satisfactorily on the small grid and the sandboxes, but it fails on the big grid. The cause is packet loss, which is a major problem on the current 400-node grid and escalates sharply as the number of nodes increases. Specifically, when more than 150 nodes are imaged in a single attempt, the high packet loss prevents successful completion. To alleviate this problem, it has been decided to explore the use of a reliable multicast protocol. The implementation being considered here is MCLv3, an open source implementation of the ALC and NORM reliable multicast protocols.

Major Design Requirements

R.1:

It has been decided that a feedback-free reliable multicast protocol will be used and that all 
feedback will be sent over TCP. This choice means that

- reliable delivery of feedback can be ensured,
- the content of feedback messages can be controlled explicitly,
- feedback messages will be easier to integrate with the existing message processing code in 
  the NodeHandler, e.g. sequence id correlation,
- existing messages sent from the NodeAgent to the NodeHandler can be modified to serve the 
  dual purpose of providing feedback as well.

MCLv3 is an open source implementation of the ALC and NORM reliable multicast protocols.
Of these two protocols, only ALC/LCT is being explored here, because it is feedback-free and 
offers unlimited scalability. NORM has neither attribute.

R.2:

All communication will be handled in the communication layer, which will be a separate process.
The ALC/LCT implementation is multi-threaded, so it is not yet clear what issues may arise if 
it is built as a loadable library instead of a separate process. The present focus is on 
exploring reliable multicast. Once that is resolved, the issues involved in converting this 
process into a loadable library will be addressed.

At this time, only changes to the communication layer in the NodeHandler are being considered. 
Similar changes to the communication layer in the NodeAgent will be considered later. For the 
moment, only minor changes will be made to the current NodeAgent communication layer, limited 
to what is needed to conform to the new NodeHandler communication layer, e.g. existing UDP 
socket calls and socket processing code will be changed to their TCP equivalents.

R.3:

The communication layer will use two separate approaches, one for sending messages and the 
other for receiving messages. Messages being sent from the NodeHandler to the NodeAgent will 
use ALC/LCT. A single message will be sent by the NodeHandler using ALC/LCT and this message 
will be received by all the NodeAgents. 

Messages being received from the NodeAgent will use TCP. The NodeAgent communication layer 
will be modified to send all messages to the NodeHandler using TCP. 

R.4:

The messages sent from the NodeHandler to the NodeAgent consist of commands to be executed on
the NodeAgent. These messages may be sent to all the nodes in the multicast group or to a 
subset of those nodes, selected by node alias. If a message has to be sent to a subset of the 
nodes, the NodeHandler will indicate this to the communication layer and identify the set of 
nodes that are to receive the message. Otherwise, by default, the communication layer will 
send the message to all the nodes. 
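
The addressing rule above can be sketched as follows. This is a minimal illustration only, not the NodeHandler's actual interface: the names OutboundCommand, targets and intended_recipients are hypothetical, and the sketch assumes the communication layer knows the aliases of all nodes in the multicast group.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class OutboundCommand:
    """A command handed from the NodeHandler to the communication layer (hypothetical shape)."""
    seq_id: int
    payload: str
    # None means "no subset given": the default of sending to every node applies.
    targets: Optional[FrozenSet[str]] = None


def intended_recipients(cmd: OutboundCommand, group_members: Iterable[str]) -> FrozenSet[str]:
    """Return the node aliases that are expected to receive (and later ACK) this command."""
    members = frozenset(group_members)
    if cmd.targets is None:
        return members
    unknown = cmd.targets - members
    if unknown:
        # Assumption: addressing a node outside the group is treated as a caller error.
        raise ValueError(f"unknown node aliases: {sorted(unknown)}")
    return cmd.targets
```

The returned set is what a later step would use to decide which nodes still owe an ACK for the command.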

After a message is sent, the communication server will wait for ACKs from the NodeAgents, which
will be received through the TCP socket. All message-ACK correlation for each node will be 
done by the communication server. If ACKs are still missing after a pre-defined interval, it 
will resend the command, and it will keep doing so until every intended node has ACKed 
receipt. Only after all the intended NodeAgents have confirmed successful receipt of the 
command will the communication server notify the NodeHandler to proceed with sending the 
next command. 
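
A minimal sketch of this send-and-wait loop is shown below. It is an illustration under stated assumptions rather than the planned implementation: multicast_send and wait_for_acks are hypothetical callables standing in for the ALC/LCT sender and the TCP ACK reader, and max_attempts is an added safety bound that the design itself does not specify.

```python
from typing import Callable, Iterable, Set


def send_until_acked(
    seq_id: int,
    payload: str,
    recipients: Iterable[str],
    multicast_send: Callable[[int, str], None],
    wait_for_acks: Callable[[int], Iterable[str]],
    max_attempts: int = 5,
) -> Set[str]:
    """Resend a command until every intended node has ACKed it.

    multicast_send(seq_id, payload) sends one copy to the multicast group.
    wait_for_acks(seq_id) blocks for up to the pre-defined interval and returns
    the aliases whose ACK for seq_id arrived in that time.
    Returns the aliases that never ACKed; an empty set means the NodeHandler
    may proceed with the next command.
    """
    pending = set(recipients)
    attempts = 0
    while pending and attempts < max_attempts:
        multicast_send(seq_id, payload)
        attempts += 1
        # Correlate ACKs with this command by sequence id and drop confirmed nodes.
        pending -= set(wait_for_acks(seq_id))
    return pending
```

Passing the two transport operations in as callables keeps the correlation logic independent of how the multicast and TCP sides are eventually implemented.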

This amounts to an error-correcting mechanism on top of reliable multicast, but it has been 
deemed necessary because the ALC/LCT implementation is not fully reliable, in the sense that 
it does not guarantee delivery.

R.5:

The communication layer will initially be a separate server that runs the reliable multicast 
protocol. It will also handle all TCP socket related functions. Keeping it in a separate 
process will help isolate, and then more easily resolve, communication-related issues. The IPC 
mechanism between this server and the NodeHandler will be implemented using pipes. When the 
NodeHandler wants to send a message to the NodeAgent, the message will be piped to the server, 
which will then send it using multicast. Likewise, when this server receives a message from a 
NodeAgent, it will pipe the message to the NodeHandler. 
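
Because a pipe is a plain byte stream, the two processes need some way to tell where one message ends and the next begins. The sketch below shows one possible approach, a 4-byte length prefix in front of each message. The framing format and the function names write_frame and read_frame are assumptions made for illustration; the design above only states that pipes will be used.

```python
import os
import struct

# Assumed framing: a 4-byte big-endian length, followed by the message bytes.
HEADER = struct.Struct("!I")


def write_frame(fd: int, message: bytes) -> None:
    """Write one length-prefixed message to a pipe file descriptor."""
    data = HEADER.pack(len(message)) + message
    sent = 0
    while sent < len(data):
        # os.write may write fewer bytes than requested, so loop until done.
        sent += os.write(fd, data[sent:])


def _read_exact(fd: int, count: int) -> bytes:
    """Read exactly count bytes, or raise EOFError if the pipe closes first."""
    chunks = []
    while count > 0:
        chunk = os.read(fd, count)
        if not chunk:
            raise EOFError("pipe closed in the middle of a frame")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def read_frame(fd: int) -> bytes:
    """Read one length-prefixed message from a pipe file descriptor."""
    (length,) = HEADER.unpack(_read_exact(fd, HEADER.size))
    return _read_exact(fd, length)
```

In the arrangement described above there would be one pipe per direction: the NodeHandler would call write_frame on its outgoing pipe and read_frame on its incoming one, and the communication server would do the reverse.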

Later this separate server can be combined with the NodeHandler as a loadable library if there 
are no significant performance issues found.

R.6:

The communication server will not pipe the heartbeats from the NodeAgents to the NodeHandler. 
Instead, it will keep track of these messages on a per-node basis and, on detecting a 
breakdown in communication, it will send a RETRY message to the NodeAgent. The NodeAgent will 
treat it as a message from the NodeHandler.
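
The per-node heartbeat bookkeeping could look roughly like the sketch below. It is illustrative only: the class name HeartbeatMonitor, the timeout value, and the rule that a node is overdue once no heartbeat has arrived within the timeout are all assumptions, since the design does not define what counts as a breakdown in communication.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the time of the last heartbeat seen from each node (hypothetical helper)."""

    def __init__(self, timeout_s: float) -> None:
        # Assumed policy: a node is overdue after timeout_s seconds of silence.
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def record(self, alias: str, now: float) -> None:
        """Note that a heartbeat from this node arrived at time now."""
        self.last_seen[alias] = now

    def overdue(self, now: float) -> List[str]:
        """Return the aliases of nodes that should be sent a RETRY message."""
        return sorted(
            alias
            for alias, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

The communication server would call record for each heartbeat it reads from a TCP socket and check overdue periodically, sending RETRY to every alias returned. Timestamps are passed in explicitly so the logic can be exercised without waiting on a real clock.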

R.7:

The impact of scaling on the decision to use TCP will be thoroughly investigated. For now, 
TCP is a quick way to avoid having to deal with reverse-path reliability. Once the forward 
path scales properly, we will switch to UDP if necessary. We might also implement a scheme to 
prioritize messages.

Overall Architecture

Software Design

See Also

http://www.ietf.org/internet-drafts/draft-ietf-rmt-fec-bb-revised-3.txt

http://www.ietf.org/internet-drafts/draft-ietf-rmt-bb-fec-ldpc-01.txt

http://www.inrialpes.fr/planete/people/roca/mcl/norm_infos.html
