Using GEXEC as a means to communicate with nodes from the Console
As the next step toward imaging nodes in parallel, we are trying to communicate with the nodes using GEXEC instead of the nodehandler currently in use.
Steps in installing GEXEC
OpenSSL must first be installed on the nodes and on the console, because GEXEC relies on authd for authentication, and authd in turn uses private and public keys generated by OpenSSL. The installation steps are as follows:
- Install openssl using apt-get install openssl.
- Download the gexec package and its dependencies: authd and libe.
- Generate the public and private keys as follows:
console.sb1# openssl genrsa -out auth_priv.pem
console.sb1# chmod 600 auth_priv.pem
console.sb1# openssl rsa -in auth_priv.pem -pubout -out auth_pub.pem
- Distribute the keys to all the nodes.
console.sb1# scp auth_priv.pem node1-1:/etc/auth_priv.pem
console.sb1# scp auth_pub.pem node1-1:/etc/auth_pub.pem
- Now install the three packages in the following order:
authd, libe, gexec
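The key-distribution step above shows a single node. A minimal sketch of repeating it for several nodes follows. The function name distribute_keys and the KEY_DIR and DRY_RUN variables are hypothetical helpers introduced here for illustration. By default the sketch only prints the scp commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: copy both authd keys to every node named as an argument.
# KEY_DIR and DRY_RUN are hypothetical knobs, not part of GEXEC or authd.
KEY_DIR="${KEY_DIR:-.}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the scp commands

distribute_keys() {
    for node in "$@"; do
        for key in auth_priv.pem auth_pub.pem; do
            if [ "$DRY_RUN" = "1" ]; then
                echo "scp $KEY_DIR/$key $node:/etc/$key"
            else
                # Stop at the first failed copy so a bad node is noticed.
                scp "$KEY_DIR/$key" "$node:/etc/$key" || return 1
            fi
        done
    done
}
```

For example, `distribute_keys node1-1 node1-2` prints four scp commands. Setting DRY_RUN=0 would perform the copies, assuming root access to the nodes over ssh.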
On newer Linux kernels (e.g., the 2.4.x series), you'll need to set the LD_ASSUME_KERNEL environment variable to "2.4.2" to avoid LinuxThreads bugs (e.g., incomplete implementation of POSIX cancellation points).
In addition, /etc/services needs to be updated with the following entry:
gexec 2875/tcp #GEXEC
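Because this entry has to be added on every machine, it may help to make the update repeatable. The sketch below appends the line only when no gexec entry exists yet. The function name add_gexec_service is hypothetical, and the function takes the file path as an argument so that it can be tried on a copy before touching /etc/services.

```shell
#!/bin/sh
# Sketch: append the gexec entry to a services file only if it is missing.
# add_gexec_service is a hypothetical helper; run as root for /etc/services.
add_gexec_service() {
    services_file="${1:-/etc/services}"
    if ! grep -q '^gexec[[:space:]]' "$services_file"; then
        printf 'gexec\t\t2875/tcp\t#GEXEC\n' >> "$services_file"
    fi
}
```

Calling `add_gexec_service /etc/services` twice leaves a single gexec line in the file.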
To run the client program gexec, the gexec daemon (gexecd) and authd, both installed in /usr/local/sbin/, need to be running on all the clients. A shell script (named start, attached to this page) has been written for this purpose and added to /etc/init.d. The startup links to the script can be created using the command:
update-rc.d start defaults
The image running gexec is stored in repository2 in /export/orbit/image/tmp/node-1-1-2006-10-03-13-16-05.ndz.
Installing GEXEC in the PXE-Image
To install the gexec service in the PXE image, the following changes have to be made to the PXE makefile:
- GEXEC has problems running on kernel version 2.6.14 (the current PXE version), so change the version to 2.6.12 (the same as the baseline kernel version).
- Add all the library dependencies of gexec: /usr/lib/libssl.so.0.9.8, /usr/local/lib/libe.a, /usr/lib/libcrypto.so.0.9.8, /usr/lib/libz.so.1, /lib/libcrypt.so.1, /lib/libpthread.so.0
- Add the keys auth_priv.pem and auth_pub.pem to /etc/<file_name>.
- Add the required binary files (gexec, gexecd and authd) to /usr/sbin.
- Add a shell script named start to the init.d/rcS script so that it is executed when the image boots. The script performs three operations:
- Runs authd
- Runs gexecd
- Loads the environment variable LD_ASSUME_KERNEL="2.4.2" for the reasons stated above.
- The resulting image has been named orbit-parallel.
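The three operations of the start script can be sketched as follows. This is not the attached script but an illustration of its description. SBIN_DIR and DRY_RUN are hypothetical variables, the sketch assumes that authd and gexecd detach into the background on their own, and it exports LD_ASSUME_KERNEL first so that both daemons inherit it, which is an ordering choice of this sketch.

```shell
#!/bin/sh
# Sketch of a boot-time start script, not the original attached to the page.
# SBIN_DIR and DRY_RUN are hypothetical. The text installs the binaries in
# /usr/local/sbin on the nodes and in /usr/sbin in the PXE image.
SBIN_DIR="${SBIN_DIR:-/usr/sbin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would be started

start_gexec_services() {
    # Workaround for the LinuxThreads issues mentioned in the text.
    LD_ASSUME_KERNEL="2.4.2"
    export LD_ASSUME_KERNEL
    # Same order as in the text: authd first, then gexecd.
    for daemon in authd gexecd; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "$SBIN_DIR/$daemon"
        else
            # Assumption: the daemon backgrounds itself; if not, append "&".
            "$SBIN_DIR/$daemon" || return 1
        fi
    done
}
```

With DRY_RUN=0, calling start_gexec_services at the end of the script would start both daemons at boot.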
Once a node has booted into this orbit-parallel PXE image, gexec can be run from the console with GEXEC_SVRS set to that node.
Imaging Nodes using GEXEC
To image the nodes using gexec, I have made some changes to the imageNodes script. The modified script, imageNodes4, can load an image on only one node at a time. Multiple images can be loaded on different nodes by running the script in separate windows simultaneously. imageNodes4 performs the following steps:
- Takes as input the node and the image to be loaded (same as imageNodes).
- Sets up TFTP soft links so that the node boots the orbit-parallel PXE image.
- Boots the node using the cmc service.
- The node boots into the orbit-parallel image.
- The image provided by the user is launched using the frisbee service.
- The environment variables for gexec, GEXEC_SVRS and LD_ASSUME_KERNEL, are set accordingly.
- The port number on which the frisbee server is launched is determined and the frisbee client is set up on the node using gexec. The command looks as follows:
gexec -n 0 frisbee -p <port no> -m <multicast address> -i <node_ip> /dev/hda
- The boot image is cleared.
- Once the image is copied to the node using the frisbee service, the node is reset using cmc service to boot into the loaded image.
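The environment setup and the frisbee client step above can be combined into a single command line. The sketch below only builds and prints that line. The function name frisbee_client_cmd and its argument order are hypothetical, while the gexec and frisbee options are the ones shown in the command above.

```shell
#!/bin/sh
# Sketch: build the environment and command line for the frisbee client step.
# frisbee_client_cmd and its argument order are hypothetical helpers.
frisbee_client_cmd() {
    node="$1"; port="$2"; mcast="$3"; node_ip="$4"
    # GEXEC_SVRS selects the target node; LD_ASSUME_KERNEL as in the text.
    printf 'GEXEC_SVRS="%s" LD_ASSUME_KERNEL="2.4.2" ' "$node"
    printf 'gexec -n 0 frisbee -p %s -m %s -i %s /dev/hda\n' \
        "$port" "$mcast" "$node_ip"
}
```

For example, `frisbee_client_cmd node1-1 7000 224.0.0.1 10.10.1.1` prints one command line. The port, multicast address and IP address here are placeholders, not values from the actual setup.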
The imageNodes4 script is attached to this page.