Options (as of 9/2006)
http://www.linuxbios.org/index.php/Welcome_to_LinuxBIOS close to Linux.
A custom BIOS looks like a good bet for ORBIT, if we can port it to the ORBIT node hardware. In theory, we have enough documentation of the ORBIT node hardware to do this. It will improve our node boot/imaging process in the following ways:
- Servicing 400 simultaneous DHCP requests is relatively difficult with our network infrastructure. There are COTS solutions, but these are overfeatured and therefore unreasonably expensive. Every node gets the same answer from the DHCP server for every request it sends (based on its position in the grid), so if we ran our own BIOS we could pre-program each node with its basic network identity, eliminate the DHCP step entirely, and go straight to image download.
- It is also difficult to tftp a PXE image down to 400 nodes simultaneously. We want to use a multicast tftp (mtftp) server (orthogonal to Frisbee), but there is no mtftp client in our present BIOS.
- We may be able to provide other useful features in BIOS. For example, we could inventory the devices on nodes without booting even as much as a PXE image.
- We almost certainly have not yet encountered the full extent of the grid/cluster computing problems that an installation such as ORBIT presents. An Open Source BIOS affords us a great deal of flexibility. Because LinuxBIOS is used primarily on similar installations, it may already contain solutions to problems we have not encountered yet.
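The static-identity idea in the first point can be sketched as a pure function from grid position to address. This is only an illustration: the 20 x 20 layout, the 10.10.0.0/16 network, the 1-based coordinates, and the name node_address are all assumptions made for the example, not the actual ORBIT addressing plan.

```python
import ipaddress

# Assumptions for illustration only: a 20 x 20 grid (400 nodes) and a
# hypothetical base network. Substitute the real addressing plan.
GRID_COLS = 20
GRID_ROWS = 20
BASE_NET = ipaddress.ip_network("10.10.0.0/16")


def node_address(x: int, y: int) -> ipaddress.IPv4Address:
    """Return the fixed address for the node at grid position (x, y).

    Coordinates are 1-based. The address depends only on the position,
    so it could be programmed into firmware once and never requested
    from a DHCP server.
    """
    if not (1 <= x <= GRID_COLS and 1 <= y <= GRID_ROWS):
        raise ValueError(f"position ({x}, {y}) is outside the grid")
    # Third octet carries x, fourth octet carries y.
    return BASE_NET.network_address + (x << 8) + y


if __name__ == "__main__":
    print(node_address(3, 7))
```

Because the mapping is deterministic, the same function could generate the per-node values at firmware build time and let the imaging server check which node it is talking to.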
Upgrading the firmware on every ORBIT node will take a significant amount of time. It will also mean calibrating the nodes. However, updating the firmware and calibrating the radios can be done with a documented procedure and (relatively) unskilled labor. We estimate the ORBIT community can tolerate a day or two in which the grid is not available, if it is planned well in advance.
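Whether the work fits in a one- or two-day outage is simple arithmetic once the per-node time is known. The helper below is a rough sketch; the 15 minutes per node and the worker counts are placeholders chosen for the example, not measurements, and outage_hours is a hypothetical name.

```python
import math


def outage_hours(nodes: int, minutes_per_node: float, workers: int) -> float:
    """Estimate wall-clock hours to reflash and calibrate every node.

    Assumes nodes are split evenly among workers and each worker
    handles one node at a time.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    nodes_per_worker = math.ceil(nodes / workers)
    return nodes_per_worker * minutes_per_node / 60


if __name__ == "__main__":
    # Placeholder inputs: 400 nodes, 15 minutes each, 4 or 10 workers.
    for workers in (4, 10):
        print(workers, "workers:", outage_hours(400, 15, workers), "hours")
```

With these placeholder figures, four workers would need about 25 working hours, so the real per-node time should be measured on a few nodes before an outage is scheduled.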
LinuxBIOS may be worse than what we have now. There is a chance we won't discover how much worse until the whole grid is reprogrammed.
- Can we get it to run on our hardware? Last time this was looked at (when?) there were chips for which no support existed yet. Although we have complete documentation for these chips, it seems like a lot of code to write, debug, and maintain.
- Can we add the features we need (local static IP assignment, mtftp, etc.)?
- Could we update the 400-node grid in a reasonable amount of time?
- Can we eliminate the rabbit?