Investigating parallel imaging using gexec
Current approach for imaging a node
The task of parallelizing imaging basically involves running multiple instances of nodeHandler, each loading a different image onto different nodes. The current sequence for imaging nodes is as follows:
- The user runs the imageNodes script with the nodes and the image to be loaded as inputs.
- The parameters are passed on to nodeHandler, which resets the specified nodes using cmc.
- nodeHandler also launches the frisbee server from tftpboot, which sends out the particular image on a multicast address, and creates tftp links to provide the nodes with the memory-based image.
- On the node side, the nodes boot into the memory image when started and then launch the frisbee client.
- The frisbee client fetches disk chunks from the frisbee server and writes them to the hard disk.
- The tftp links for the nodes are then removed.
- From then on, the nodes boot from the image stored on the hard disk.
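The sequence above can be sketched as a small shell function. This is only an illustration: the tftpboot link path, the frisbeed invocation, and the "reset via cmc" step are assumptions (the real work happens inside nodeHandler), so the function prints the plan rather than executing it.

```shell
#!/bin/sh
# Illustrative sketch of the single-image sequence; paths and the
# multicast details are assumptions, and the function only prints the
# planned actions instead of running them.
plan_imaging() {
    image="$1"; shift
    for node in "$@"; do
        echo "link /tftpboot/pxelinux.cfg/$node -> memory-image"
    done
    echo "start frisbeed serving $image on a multicast address"
    for node in "$@"; do
        echo "reset $node via cmc"
    done
}

plan_imaging baseline.ndz node1-1 node1-2
```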
Changes needed for parallel imaging processes
Imaging nodes in parallel involves launching multiple instances of nodeHandler, which in turn means running more than one frisbee server, each serving a different image on a different multicast address. At present the frisbee server settings allow only one multicast address per domain. For example, if a user is working on a sandbox (sb), only one multicast address is allowed for that sandbox; if the user runs a second imageNodes in parallel, the frisbee server fetches the same multicast address, which results in an error. The basic steps toward parallelization are therefore:
- Tweaking the way the frisbee server is launched, so that two frisbee servers launched in the same domain serve their images on different multicast addresses.
- Making the corresponding changes in nodeHandler.rb, since the script currently allows only one instance of nodeHandler to run at a time.
An alternative approach could be launching the frisbee server and client manually for different images. On a high level the basic steps involved would be:
- Create the tftp links to the nodes manually so that the nodes can boot into the memory based images.
- Launch the frisbee servers from tftpboot in repository2, using different multicast addresses.
- Launch the frisbee client on the nodes manually, with multicast addresses corresponding to the images to be loaded.
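A minimal sketch of the server side of this manual approach would assign one multicast address per image. The frisbeed flags and the image path under /repository2/images are assumptions about the local installation (check them against the local frisbee build), so the command is echoed rather than executed:

```shell
#!/bin/sh
# Print the frisbeed command for one image; the -m/-p flags and the
# image path are assumptions about the local installation.
start_server() {
    image="$1" mcast="$2" port="$3"
    echo "frisbeed -m $mcast -p $port /repository2/images/$image"
}

# Two servers in the same domain, on different multicast addresses:
start_server imageA.ndz 224.0.0.1 7000
start_server imageB.ndz 224.0.0.2 7000
```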
As a first step, the imageNodes shell script has been modified to detect any concurrently running instance of nodeHandler. If nodeHandler is already running, the script prints a warning to the user and exits, while the running nodeHandler continues imaging its nodes. If nodeHandler is not running, the script proceeds to image the nodes as specified. The modified imageNodes (imageNodes_new) is attached at the end of the page.
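The concurrency check could look something like the following; matching on the literal name nodeHandler in the process table is an assumption about how the real imageNodes_new detects it:

```shell
#!/bin/sh
# Sketch of the concurrency check; the process name matched and the
# use of ps/grep are assumptions about the real script.
is_running() {
    # succeed (exit 0) if a process whose command line contains $1 exists
    ps ax | grep "$1" | grep -v grep >/dev/null
}

if is_running nodeHandler; then
    echo "nodeHandler is already imaging nodes; try again later" >&2
    exit 1
fi
echo "no concurrent nodeHandler; proceeding with imaging"
```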
Date : 9/19/06
To image the nodes in parallel, instead of running multiple instances of nodeHandler, we are trying to bypass nodeHandler itself. Its three basic functions:
- creating soft links to the nodes to serve the basic memory-based image,
- launching the frisbee server,
- and resetting the respective nodes
will now be performed by the imageNodes shell script itself.
All three operations can be executed using XML queries at the command line. The queries for the three operations are:
For creating soft pxe links with a particular node:
http://pxe:5012/pxe/setBootImage?img=<image name>&node=<node ID>&ip=<ip address of node>
The pxe service also provides a means to set the boot image for all nodes at once via allSetBootImage?, but this is not working right now; possibly it is just a dummy query.
For launching the frisbee server:
http://frisbee:5012/frisbee/getAddress?img=<image name to be launched>
If the image is already being served, this returns the multicast address and port number for that image. If not, it launches the image on a new port number and a default multicast address. Since each new image is launched on a new port, different multicast addresses for different images are not required.
Similarly the cmc service can be used to reset the nodes as follows:
http://cmc:5012/cmc/reset?x=<x coordinate of node>&y=<y coordinate of node>
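The three queries can be assembled inside imageNodes itself. The URL formats below come straight from the text; wrapping them in small shell functions, and issuing them with, say, wget, is an assumption about how the script would use them, and the node and image names are examples:

```shell
#!/bin/sh
# Build the three service queries described above. Hostnames and query
# strings are from the text; node/image names are example values.
pxe_url()     { echo "http://pxe:5012/pxe/setBootImage?img=$1&node=$2&ip=$3"; }
frisbee_url() { echo "http://frisbee:5012/frisbee/getAddress?img=$1"; }
cmc_url()     { echo "http://cmc:5012/cmc/reset?x=$1&y=$2"; }

# e.g. issue one with: wget -q -O - "$(frisbee_url baseline.ndz)"
pxe_url baseline.ndz node1-1 10.10.1.1
frisbee_url baseline.ndz
cmc_url 1 1
```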
To implement these changes, imageNodes has been modified as follows:
- On startup, imageNodes first determines the domain it is running on and the nodes it has to serve the soft pxe links to. It then creates a soft link through the pxe service, using the subnet number of the domain itself.
- It then launches the frisbee server from tftpboot, which starts serving the image specified on the command line.
- Finally, it resets the specified nodes using cmc.
Thus the revised version of imageNodes completely bypasses nodeHandler.
- One problem in this process is deciding when to remove the pxe links and shut down the frisbee server. One simple but crude solution would be to wait a comfortable period of time (say 5 minutes) and then break the pxe links. This is a matter for further consideration.
- So far the work covers only the frisbee server side. The respective nodes still need to launch the frisbee client and receive the image multicast by the frisbee server.
The changed copy of imageNodes is attached at the end of the page.
Kindly refer to imageNodes2 as the most recent copy of imageNodes.
After imageNodes is run at the console, the frisbee client needs to be launched on the individual nodes. The client can be launched once the nodes have booted into the basic image served by the pxe link. To launch it, telnet into the node and run the application as follows:
./frisbee -p <port number> -m <multicast address> -i <interface ip address> <the memory location for the image>
frisbee is located in the /usr/sbin directory.
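Since the port and multicast address come back from the frisbee getAddress query, the client invocation can be assembled per node. Wrapping it in a helper like this is an assumption, and the disk target and addresses are example values:

```shell
#!/bin/sh
# Build the /usr/sbin/frisbee client command shown above; arguments
# are example values, and the command is printed, not run.
client_cmd() {
    port="$1" mcast="$2" iface_ip="$3" target="$4"
    echo "/usr/sbin/frisbee -p $port -m $mcast -i $iface_ip $target"
}

# Run the printed command on the node itself (e.g. over telnet):
client_cmd 7000 224.0.0.2 10.10.1.1 /dev/hda
```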
Experiments to observe the performance of ORBIT while nodes are being imaged in parallel.
In order to evaluate any performance issues that arise while nodes are being imaged in parallel, a basic ping experiment was performed: the RTT from the console to the individual nodes was measured during parallel imaging. Following are the results:
Experiment 1: On sb1, both nodes were imaged at the same time using two different images. The RTT obtained was:
round-trip min/avg/max = 0.12/0.2/1.8 ms
while the RTT values obtained while a single node was being imaged were:
round-trip min/avg/max = 0.1/0.2/1.2 ms
Experiment 2: On grid
- RTT values when 5 nodes were being imaged with same image:
rtt min/avg/max/mdev = 0.142/0.287/6.648/0.691 ms
- RTT values when 10 nodes were being imaged with 2 different images:
rtt min/avg/max/mdev = 0.126/0.288/7.413/0.734 ms
- RTT values when 15 nodes were being imaged with 3 different images:
RTT min/avg/max/mdev = 0.138/0.315/7.215/0.792 ms
- RTT values once all the images are loaded and nodes have been reset:
RTT min/avg/max/mdev = 0.087/0.148/0.221/0.039 ms
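The min/avg/max/mdev figures above are the summary line printed by ping. A small awk helper, an assumption about how such numbers could be pulled out for tabulation, extracts the average:

```shell
#!/bin/sh
# Extract the avg RTT (second of the four slash-separated values) from
# a ping summary line like the ones quoted above.
avg_rtt() {
    echo "$1" | awk -F'= ' '{split($2, a, "/"); print a[2]}'
}

avg_rtt "rtt min/avg/max/mdev = 0.142/0.287/6.648/0.691 ms"   # -> 0.287
```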
Imaging Nodes in parallel using imageNodes5:
The final script for imaging nodes in parallel is named imageNodes5. The script runs as follows:
imageNodes5 <text file containing coordinates of nodes> <name of the image to be loaded>
Thus a major change from the previous imageNodes is that the nodes are now specified as coordinates in a text file. To image another set of nodes with a different image, one runs imageNodes5 from another console window with a new text file containing those node coordinates.
The script imageNodes5 may be found attached to the page.
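As an example of invoking imageNodes5 in parallel (the one-x-y-pair-per-line file format is an assumption, chosen to match the x/y coordinates the cmc reset query needs):

```shell
#!/bin/sh
# Create a hypothetical coordinates file: one "x y" pair per line
# (the format is an assumption about what imageNodes5 expects).
cat > nodes-groupA.txt <<'EOF'
1 1
1 2
2 1
EOF

# Then, from one console window:
#   ./imageNodes5 nodes-groupA.txt imageA.ndz
# and, in parallel from a second window with a different file:
#   ./imageNodes5 nodes-groupB.txt imageB.ndz
cat nodes-groupA.txt
```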
Ping Experiments on Grid
Ping experiments have been performed to determine the performance of the grid when a large number of nodes are being imaged in parallel with different images. The results are tabulated below:
|Experiment||No. of packets sent||% lost||min RTT (ms)||avg RTT (ms)||max RTT (ms)||mdev (ms)|
|5 grps of 20 nodes imaged with the same image||178||11%||0.119||10.434||100.857||23.052|
|5 grps of 20 nodes imaged with 2 different images||180||15%||0.119||5.172||169.088||22.689|
|10 grps of 20 nodes with 3 different images||174||4%||0.084||2.184||99.143||10.905|
|1 grp of 100 nodes with the same image||176||27%||0.140||11.937||371.617||41.183|