[[TOC(Documentation/NodeHandler/Broadcast)]]
= Architecture Design =

{{{
Saswati Swami (sswami@eden.rutgers.edu)
}}}

== Introduction ==
The current NodeHandler code works satisfactorily on the small grid and the sandboxes, but it 
fails to work correctly on the big grid. The big grid currently consists of 400 nodes, and 
packet loss there is a major problem that escalates sharply as the number of nodes increases. 
Specifically, when trying to image more than 150 nodes in a single attempt, the high packet 
loss prevents successful completion. To alleviate this problem, it has been decided to explore 
the use of broadcast instead of multicast. 

== Major Design Requirements ==
'''R.1:'''
{{{
All communication from the NodeHandler to the NodeAgent will be through broadcast, and all 
feedback from the NodeAgent to the NodeHandler will be sent through TCP. The reasons are:

- reliable feedback can be ensured, 
- the feedback message content can be controlled explicitly,
- integrating the feedback messages with the existing message processing code in the 
  NodeHandler will be easier, e.g. sequence id correlation,
- existing messages sent from the NodeAgent to the NodeHandler can be modified to 
  double as feedback.

}}}

'''R.2:'''
{{{
All communication will be handled in the communication layer, which will be a separate process.
The present focus is on achieving reliable communication with minimum packet loss. Once this 
issue is resolved, the issues pertaining to converting this process into a loadable library 
will be addressed.

This will need changes to the communication layer in both the NodeHandler and the NodeAgent. 
}}}

'''R.3:'''
{{{
The communication layer in the NodeHandler will use two separate approaches, one for sending 
messages and the other for receiving messages. Messages sent from the NodeHandler to the 
NodeAgent will use broadcast. A single message will be broadcast by the NodeHandler and this 
message will be received by all the NodeAgents. 

Messages received from the NodeAgent will use TCP. The NodeAgent communication layer 
will be modified to send all messages to the NodeHandler using TCP. 
}}}
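The two approaches in R.3 could be set up roughly as follows. This is only a sketch under stated assumptions: the broadcast address, the port number and the function names are hypothetical and do not come from the NodeHandler code; the backlog of 400 mirrors the `listen(400)` call in the pseudo-code further down.

```python
import socket

# Hypothetical broadcast address and port, chosen for illustration only.
BROADCAST_ADDR = ("255.255.255.255", 9000)


def make_broadcast_sender() -> socket.socket:
    """UDP socket that is allowed to send to a broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def make_feedback_listener(host: str = "", port: int = 0,
                           backlog: int = 400) -> socket.socket:
    """TCP socket on which NodeAgent feedback connections are accepted."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))   # port 0 lets the OS pick a free port
    srv.listen(backlog)
    return srv
```

A sender would then call `sock.sendto(payload, BROADCAST_ADDR)` once per command, while the listener accepts one connection per feedback message.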

'''R.4:'''
{{{
The messages sent from the NodeHandler to the NodeAgent consist of commands to be executed on
the NodeAgent. Since the communication layer will broadcast each message to all the nodes, the 
NodeAgents will use filters to determine whether a message is to be accepted or rejected.
The current NodeAgent code has such filters, and these will be enhanced only if necessary.

After a message is sent, the communication server will wait for ACKs from the NodeAgents, which
will be received through the TCP socket. All message-ACK correlation for each node will be 
done by the communication server. It will also resend the command at a pre-defined interval 
until it has received an ACK confirming receipt of the message from all the intended nodes. 
Only after all the NodeAgents have confirmed successful receipt of the command will the 
communication server intimate the NodeHandler to proceed with sending the next command. 

}}}
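The resend-until-acknowledged behaviour in R.4 can be sketched as below. The function name, the `send` and `poll_acks` callbacks and the `max_rounds` cap are assumptions made for this illustration; the requirement itself does not state an upper bound on retries.

```python
import time
from typing import Callable, Iterable, Set


def send_until_acked(message: str,
                     targets: Iterable[str],
                     send: Callable[[str], None],
                     poll_acks: Callable[[], Iterable[str]],
                     interval_s: float = 0.05,
                     max_rounds: int = 10) -> Set[str]:
    """Resend `message` until every target has ACKed or max_rounds is hit.

    `send` broadcasts the message; `poll_acks` returns the node ids whose
    ACK arrived since the last call. Returns the nodes still unacknowledged.
    """
    pending = set(targets)
    rounds = 0
    while pending and rounds < max_rounds:
        send(message)               # send or resend the command
        rounds += 1
        time.sleep(interval_s)      # pre-defined resend interval
        pending -= set(poll_acks())
    return pending
```

An empty return value means that all intended nodes confirmed receipt, so the NodeHandler may proceed with the next command.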

'''R.5:'''
{{{
The communication layer will initially be a separate server that runs the reliable 
broadcast protocol. It will also handle all TCP socket related functions. Separating the 
processes will help in isolating, and subsequently resolving, all communication related 
issues. The IPC mechanism between this server and the NodeHandler will be implemented using 
pipes. When the NodeHandler wants to send a message to the NodeAgent, the message will be 
piped to the server, which will then send it using broadcast. Likewise, when this server 
receives a message from the NodeAgent, it will pipe the message to the NodeHandler. 

Later, this separate server can be combined with the NodeHandler as a loadable library if no 
significant performance issues are found.
}}}
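Because a pipe is a plain byte stream, the two processes in R.5 need some way to tell where one message ends and the next begins. The sketch below assumes a 4-byte length prefix; this framing and the function names are illustrative choices, not part of the existing NodeHandler code.

```python
import os
import struct


def write_msg(fd: int, payload: bytes) -> None:
    """Write one length-prefixed message to a pipe file descriptor."""
    data = struct.pack("!I", len(payload)) + payload
    while data:                       # os.write may write only part of data
        written = os.write(fd, data)
        data = data[written:]


def _read_exact(fd: int, n: int) -> bytes:
    """Read exactly n bytes, or raise EOFError if the pipe is closed."""
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError("pipe closed before message was complete")
        buf += chunk
    return buf


def read_msg(fd: int) -> bytes:
    """Read one length-prefixed message from a pipe file descriptor."""
    (length,) = struct.unpack("!I", _read_exact(fd, 4))
    return _read_exact(fd, length)
```

With `r, w = os.pipe()`, the NodeHandler side would call `write_msg(w, b"...")` and the server side `read_msg(r)`; a second pipe carries messages in the opposite direction.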

'''R.6:'''
{{{
The communication server will not pipe the heartbeats from the NodeAgents to the NodeHandler. 
Instead, it will keep track of these messages on a per-node basis and, on detecting a breakdown
in communication, it will send a RETRY message to the NodeAgent. The NodeAgent will treat 
this as a message from the NodeHandler.
}}}
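The per-node heartbeat bookkeeping in R.6 might look like the following sketch. The class name, the timeout parameter and the use of explicit timestamps are assumptions for illustration; the requirement does not specify how a breakdown in communication is detected.

```python
from typing import Dict, Iterable, List


class HeartbeatMonitor:
    """Tracks the last heartbeat time of each node (hypothetical helper)."""

    def __init__(self, nodes: Iterable[str], timeout_s: float, now: float):
        self.timeout_s = timeout_s
        # Treat start-up time as the first "heartbeat" of every node.
        self.last_seen: Dict[str, float] = {n: now for n in nodes}

    def record(self, node: str, now: float) -> None:
        """Note that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def stale(self, now: float) -> List[str]:
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)
```

The communication server would call `stale(time.monotonic())` periodically and send a RETRY message to each node it returns.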

'''R.7:'''
{{{
All scaling issues that bear on the decision to use TCP will be thoroughly investigated. 
TCP is a quick way to avoid having to deal with reverse-path reliability for now. Once the 
forward path scales properly, we will switch to UDP, if necessary. We might also 
implement some scheme to prioritize the messages.
}}}


== Overall Architecture ==

== Software Design ==

== Algorithm ==

On startup, the NodeHandler and the NodeAgent processes will each open the required pipes and 
then fork their Communication Server process. The existing code will be modified to send all 
messages through a pipe instead of a socket, and access to the pipe will be serialized. The 
Communication Server is a two-threaded process.

'''Note: '''
This is a first test implementation. The final implementation will have multiple TCP 
connections - one for each NodeAgent. Also, all issues relating to message size will be 
investigated.

'''NodeHandler Communication Server'''

The following pseudo-code describes this process:

{{{
1.  MAIN THREAD
    ===========
    create the new receiving thread       /* this thread (2.) will receive TCP message */
    WHILE (true)
       message = recv(pipe)               /* receive message from NodeHandler */
       IF (message != "SHUTDOWN") 
           SendReliableBroadcast(message)
       ELSE
           initiate graceful shutdown
           BREAK                          /* leave the receive loop */
       END IF
    END WHILE

    FUNCTION SendReliableBroadcast(message)
       IF (first invocation)              /* so that the socket is created only once */
          open a new Broadcast Server Socket
       END IF
       Setup ACK list
       set complete = 0
       WHILE (!complete)
         sendto(BROADCAST_ADDR, message)  /* send/resend the message */
         sleep(50ms)

         lock(ACK_list)
         read ACK_list
         unlock(ACK_list)
         
         IF (ACK_LIST_COMPLETE)
            complete = 1
         END IF
       END WHILE
    END FUNCTION

      
2. RECEIVING THREAD
   ================
   create a new TCP socket
   bind 
   listen(400)
   WHILE (true)
     accept(a connection)
     read(message)
     IF (message == ACK)
        lock(ACK_list)
        update ACK list
        IF (ACK_LIST_COMPLETE)
           unlock(ACK_list)
           send "WAKEUP" to main thread
        ELSE IF (ACK_LIST_INCOMPLETE && TIMEOUT)
           unlock(ACK_list)
           lock(pipe_access)
           send(pipe) "TIMEOUT" message to NodeHandler
           unlock(pipe_access)
        ELSE
           unlock(ACK_list)               /* still waiting for more ACKs */
        END IF
        
     ELSE IF (message == EVENT)
        lock(pipe_access)
        send(pipe) message to NodeHandler
        unlock(pipe_access)
     END IF
     close(connection)
   END WHILE
}}}
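The shared ACK list used by the two threads above can be sketched with a condition variable, so that the "WAKEUP" from the receiving thread ends the main thread's wait without polling. The class and method names are hypothetical, and using `threading.Condition` in place of the `sleep(50ms)` loop is a substitution made for this example rather than the design described in the pseudo-code.

```python
import threading
from typing import Iterable, Set


class AckList:
    """Thread-safe set of nodes that have not yet ACKed a command."""

    def __init__(self, expected: Iterable[str]):
        self._pending: Set[str] = set(expected)
        self._cond = threading.Condition()

    def ack(self, node: str) -> None:
        """Called by the receiving thread for every ACK read from TCP."""
        with self._cond:
            self._pending.discard(node)
            if not self._pending:
                self._cond.notify_all()   # the "WAKEUP" for the main thread

    def wait_complete(self, timeout: float) -> bool:
        """Called by the main thread; True once every node has ACKed."""
        with self._cond:
            return self._cond.wait_for(lambda: not self._pending,
                                       timeout=timeout)

    def pending(self) -> Set[str]:
        """Snapshot of the nodes that still owe an ACK."""
        with self._cond:
            return set(self._pending)
```

If `wait_complete` returns False, the main thread can rebroadcast the command and wait again, which corresponds to one pass of the `WHILE (!complete)` loop.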

'''NodeAgent Communication Server'''

The following pseudo-code describes this process:
{{{
1. MAIN THREAD
   ===========
   create a TCP connection to the NodeHandler Communication Server
   create the new Broadcast Client socket
   create the receiving thread and pass TCP sockfd  /* this thread (2.) sends TCP messages */
   WHILE (true)
      message = recvfrom(socket)          /* receive message from NodeHandler */
      IF (message for this NodeAgent)
         construct ACK message
         send ACK message to the NodeHandler Communication Server
         send(pipe) message to the NodeAgent
      END IF
   END WHILE

2. RECEIVING THREAD
   ================
   
   WHILE (true)
      recv(pipe) message from the NodeAgent
      send message to the NodeHandler Communication Server   /* over TCP */
   END WHILE
   
}}}
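The "message for this NodeAgent" check and the ACK construction above could be sketched as follows. The wire format used here (`seq|targets|command`, with `*` meaning all nodes) and the function names are purely hypothetical; the actual NodeAgent filters and message layout are not described in this document.

```python
from typing import Tuple


def parse_command(raw: str) -> Tuple[int, str, str]:
    """Split a hypothetical 'seq|targets|command' message into its parts."""
    seq, targets, command = raw.split("|", 2)
    return int(seq), targets, command


def is_for_node(targets: str, node_id: str) -> bool:
    """Accept the message if it addresses all nodes or names this node."""
    return targets == "*" or node_id in targets.split(",")


def make_ack(seq: int, node_id: str) -> str:
    """Build an ACK carrying the sequence id so the server can correlate it."""
    return f"ACK|{seq}|{node_id}"
```

For example, a node `n3` receiving `7|n1,n3|reboot now` would accept the command and reply with `make_ack(7, "n3")`, while node `n2` would silently drop it.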

== See Also ==

[http://www.orbit-lab.org/wiki/Internal/DesignNotes]