[[TOC(Documentation/NodeHandler/Multicast)]]

= Reliable Multicast Architecture Design =

{{{
Saswati Swami (sswami@eden.rutgers.edu)
}}}

== Introduction ==

The current NodeHandler code works satisfactorily on the small grid and the sandboxes, but it fails to work correctly on the big grid. The cause is packet loss, which is a major problem on the current 400-node grid and escalates sharply as the number of nodes increases. Specifically, when trying to image more than 150 nodes in a single attempt, the high packet loss prevents successful completion.

To alleviate this problem, it has been decided to explore the use of a reliable multicast protocol. The implementation being considered here is MCLv3, an open-source implementation of the ALC and NORM reliable multicast protocols.

== Major Design Requirements ==

'''R.1:'''
{{{
A feedback-free reliable multicast protocol will be used, and all feedback will be sent over TCP. This is because:
 - reliable delivery of feedback can be ensured,
 - the content of feedback messages can be controlled explicitly,
 - integrating the feedback messages with the existing message processing code in the NodeHandler (e.g. sequence id correlation) will be easier,
 - existing messages sent from the NodeAgent to the NodeHandler can be modified to serve the dual purpose of providing feedback as well.

MCLv3 implements both the ALC and the NORM reliable multicast protocols. Of these two, only the ALC/LCT protocol is being explored here, because it is feedback-free and provides unlimited scalability. NORM lacks both these attributes.
}}}

'''R.2:'''
{{{
All communication will be handled in the communication layer, which will be a separate process.
The ALC/LCT implementation is multi-threaded, so it is not yet clear what issues may arise if it is built as a loadable library instead of a separate process. The present focus is on exploring reliable multicast; once that is resolved, the issues involved in converting this process into a loadable library will be addressed.

At this time, only changes to the communication layer in the NodeHandler are being considered. Similar changes to the communication layer in the NodeAgent will be considered later. For now, only minor changes will be made to the current NodeAgent communication layer, limited to conforming to the new NodeHandler communication layer, e.g. existing UDP socket calls and socket processing code will be changed to their TCP equivalents.
}}}

== Overall Architecture ==

== Software Design ==

== See Also ==

 - [http://www.ietf.org/internet-drafts/draft-ietf-rmt-fec-bb-revised-3.txt]
 - [http://www.ietf.org/internet-dratfs/draft-ietf-rmt-bb-fec-ldpc-01.txt]
 - [http://www.inrialpes.fr/planete/people/roca/mcl/norm_infos.html]
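
As a rough illustration of requirement R.1, the sketch below shows how feedback arriving over TCP could be correlated with sequence ids on the NodeHandler side. It is not the NodeHandler implementation: the message format (`node_id seq_id status`, one per line), the `OK` status value, and the names `parse_feedback`, `FeedbackTracker` and `handle_connection` are all assumptions made for this example.

```python
import socket
import threading


def parse_feedback(line):
    """Parse one feedback line in the assumed 'node_id seq_id status' format."""
    node_id, seq_id, status = line.strip().split()
    return node_id, int(seq_id), status


class FeedbackTracker:
    """Tracks which expected nodes have acknowledged each sequence id."""

    def __init__(self, expected_nodes):
        self.expected = set(expected_nodes)
        self.acked = {}  # seq_id -> set of node ids that reported OK
        self.lock = threading.Lock()  # connections may be served from several threads

    def record(self, node_id, seq_id, status):
        # Ignore failures and nodes that are not part of this experiment.
        if status != "OK" or node_id not in self.expected:
            return
        with self.lock:
            self.acked.setdefault(seq_id, set()).add(node_id)

    def missing(self, seq_id):
        """Return the sorted list of nodes that have not yet acknowledged seq_id."""
        with self.lock:
            return sorted(self.expected - self.acked.get(seq_id, set()))


def handle_connection(tracker, conn):
    """Read feedback lines from a connected stream socket until the peer closes it."""
    with conn, conn.makefile("r") as reader:
        for line in reader:
            if line.strip():
                tracker.record(*parse_feedback(line))
```

In such a design, the multicast data path stays feedback-free, while each NodeAgent opens a TCP connection to report completion. The NodeHandler can then call `missing(seq_id)` to decide which nodes need to be retried.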