wiki:Internal/Repair

Version 2 (modified by Joseph F. Miklojcik III, 17 years ago)

The CMC need not send an inventory command; the inventory will run automatically.

1) Clear some space in the node repair area. Obtain a Phillips-head screwdriver and a bin for garbage. On a network-connected computer, open a web browser and ssh sessions to dhcp1.orbit-lab.org and repository2.orbit-lab.org (probably through gw.orbit-lab.org).

2) Make a page in the orbit-lab.org wiki with a name matching the template Internal/RepairYYYYMMDD (for example, Internal/Repair20070520). Record the current time and the names of everyone helping with the repairs on this wiki page.

3) Determine the set of nodes you are going to repair. These will be any nodes marked red on orbit-lab.org/wiki/Status, or nodes that the CM cannot reliably power up. Do not repair more than ten at a time. Write the coordinates of these nodes on the wiki page for the repair. Note which of those node positions are supposed to have Atheros 802.11 hardware and which are supposed to have Intel. It simplifies things if you can do all Atheros or all Intel nodes in a particular round of repairs.

4) Comment out the lines for these nodes in dhcp1:/etc/dhcp3/dhcpd.conf. Restart dhcpd on dhcp1.

5) Remove each node to be repaired from its mounting in the grid, leaving the node id box attached. As you remove nodes, take them and their node id boxes back to the node repair area. One or two other people can work on nodes in the node repair area while one person moves nodes back and forth from the grid. Note any exceptional hardware or incorrectly installed connections on the wiki page.

6) Once in the node repair area, remove the node id box and then the yellow node enclosure. Verify that the node id boxes match the list of nodes to be repaired on the wiki page, and that the 802.11 hardware vendor matches what is expected. Note exceptions on the wiki page.

7) Replace the power supply. Take care to put old power supplies in the garbage bin. If the 802.11 hardware vendor did not match what is expected, correct the hardware. Put the enclosure back on and reattach the node id box.

8) Calibrate the node (NYI).

9) Return the node to its position in the grid. Verify the node id box against two adjacent nodes.

10) Once all nodes have been repaired and returned to the grid, verify that they are not red on the orbit-lab.org/wiki/Status page, that is, that the CM reports back to the CMC correctly.

11) Turn the repaired nodes on. Because they obtain pool addresses from dhcp, they will load an 'inventory' image (NYI). Wait five minutes for the inventory image to finish loading and for the inventory script to run.

12) Run the gendhcpconf script on repository2. Compare its output with the entries you commented out in step 4. Correct dhcp1:/etc/dhcp3/dhcpd.conf if needed.

13) During the following maintenance slot, verify that you can image all nodes that have been repaired since the last maintenance slot by running the CM stress experiment (NYI).
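
The selection rules in step 3 (no more than ten nodes per round, and preferably a single 802.11 vendor per round) can be illustrated with a short planning sketch. This is only an illustration: the function name plan_rounds and the (x, y, vendor) tuple layout are assumptions made here, not part of any ORBIT tool, and the list of nodes still has to come from the Status page and the CM by hand.

```python
from collections import defaultdict

MAX_PER_ROUND = 10  # step 3: do not repair more than ten nodes at a time


def plan_rounds(nodes, max_per_round=MAX_PER_ROUND):
    """Group (x, y, vendor) tuples into single-vendor rounds of limited size.

    The tuple layout is a hypothetical convention chosen for this sketch.
    Returns a list of (vendor, [(x, y), ...]) pairs.
    """
    by_vendor = defaultdict(list)
    for x, y, vendor in nodes:
        by_vendor[vendor].append((x, y))

    rounds = []
    for vendor in sorted(by_vendor):
        coords = sorted(by_vendor[vendor])
        # Split each vendor's nodes into chunks of at most max_per_round.
        for i in range(0, len(coords), max_per_round):
            rounds.append((vendor, coords[i:i + max_per_round]))
    return rounds
```

Each returned round could then be written up on its own Internal/RepairYYYYMMDD page.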
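
The edit to dhcpd.conf in step 4 could be scripted rather than done by hand. The sketch below assumes that each node has an ISC dhcpd style "host NAME { ... }" declaration with no nested braces; the host names used when calling it (for example node1-1) are hypothetical, since the actual naming in dhcp1:/etc/dhcp3/dhcpd.conf is not shown on this page. It only transforms text; it does not write the file or restart dhcpd.

```python
import re


def comment_out_hosts(conf_text, host_names):
    """Return conf_text with the host blocks for host_names commented out.

    Assumes 'host NAME { ... }' declarations without nested braces.
    Lines that are already commented out are left untouched.
    """
    targets = set(host_names)
    out = []
    in_block = False  # True while inside a block being commented out
    for line in conf_text.splitlines():
        if not in_block:
            m = re.match(r"\s*host\s+(\S+)\s*\{", line)
            if m and m.group(1) in targets:
                in_block = True
        if in_block:
            out.append("#" + line)
            if "}" in line:
                in_block = False  # closing brace ends the host block
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Keeping a copy of the original file before applying the result makes the later comparison against the gendhcpconf output easier.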
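
The comparison between the gendhcpconf output and the entries commented out in step 4 can be done with a plain text diff. The sketch below uses Python's standard difflib module and assumes, without confirmation from this page, that gendhcpconf prints entries in the same syntax as dhcpd.conf. The function names are illustrative only.

```python
import difflib


def uncomment(lines):
    """Strip one leading '#' from each line, undoing the step 4 edit."""
    return [l[1:] if l.startswith("#") else l for l in lines]


def normalize(lines):
    """Collapse whitespace and drop blank lines so layout differences are ignored."""
    return [" ".join(l.split()) for l in lines if l.strip()]


def compare_entries(commented_text, generated_text):
    """Return unified diff lines; an empty list means the entries agree."""
    old = normalize(uncomment(commented_text.splitlines()))
    new = normalize(generated_text.splitlines())
    return list(difflib.unified_diff(
        old, new, fromfile="commented-out", tofile="gendhcpconf", lineterm=""))
```

An empty result suggests the commented-out entries can simply be restored; any diff lines point at the entries in dhcp1:/etc/dhcp3/dhcpd.conf that may need correcting.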
