wiki:Internal/Repair

1) Clear some space in the node repair area. Obtain a Phillips-head screwdriver and a bin for garbage. On a network-connected computer, open a web browser and ssh sessions to dhcp1.orbit-lab.org and repository2.orbit-lab.org (probably through gw.orbit-lab.org).

2) Make a page in the orbit-lab.org wiki with a name matching the template Internal/RepairYYYYMMDD (Internal/Repair20070520 for example). On this wiki page, record the current time and the names of everyone helping with the repairs.

3) Determine the set of nodes you are going to repair. These will be any nodes marked as red on orbit-lab.org/wiki/Status, or nodes that the CM cannot reliably power up. Do not repair more than ten at a time. Write the coordinates of these nodes on the wiki page for the repair. Note which of those node positions are supposed to have Atheros and which are supposed to have Intel. It simplifies things if you can do all Atheros or all Intel nodes in a particular round of repairs.

4) Remove each node to be repaired from its mounting in the grid, leaving the node id box attached. As you remove nodes, take them and their node id boxes back to the node repair area. One or two other people can work on nodes in the node repair area while one person moves nodes back and forth from the grid. Note any exceptional hardware or incorrectly installed connections on the wiki page.

5) Once in the node repair area, remove the node id box and then the yellow node enclosure. Verify that the node id boxes match the list of nodes to be repaired on the wiki page, and that the 802.11 hardware vendor matches what is expected. Note exceptions on the wiki page.

6) Replace the power supply. When attaching the new power supply to the chassis, put in all four screws loosely first, then tighten once all four are in place. Take care to put old power supplies in the garbage bin. If the 802.11 hardware vendor did not match what is expected, correct the hardware. Reinstall the enclosure, then the node id box. Do not reinstall the side enclosure screw; it is no longer used.

7) Bring the node to the calibration shelf. Attach the CM cable, CONTROL cable, power cable, and two antennas. Run the calibration experiment.

8) Return the node to its position in the grid. Re-attach all peripherals, such as USB devices. Verify the node id box visually against two adjacent nodes.

9) Once all nodes have been repaired and returned to the grid, verify that they are not red on the orbit-lab.org/wiki/Status page, that is, that each CM reports back to the CMC correctly. Note exceptions on the wiki page.

10) Comment out the lines for these nodes in dhcp1:/etc/dhcp3/dhcpd.conf. Restart dhcpd on dhcp1.

11) Turn the repaired nodes on. Because their entries in dhcpd.conf are commented out, they obtain pool addresses from DHCP and load an 'inventory' image. Wait five minutes for the inventory image to finish loading, and for the inventory script to run.

12) Run the gendhcpconf script on repository2. Compare its output with the entries you commented out in step 10. Correct dhcp1:/etc/dhcp3/dhcpd.conf if needed.

13) If there is time, verify that imaging works on the repaired nodes.
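
The dhcpd.conf edits in steps 10 and 12 can be error-prone when done by hand for up to ten nodes. The following is a minimal Python sketch of how they might be scripted, not the procedure actually used on dhcp1. It assumes each node has a standard ISC dhcpd "host NAME { ... }" declaration containing a "hardware ethernet" line, and that gendhcpconf prints entries in the same form; neither assumption is confirmed by this page. The function names are hypothetical, and the host names passed in must be whatever names the real file uses.

```python
import re

# Assumption: entries look like ISC dhcpd host declarations, e.g.
#   host SOMENAME {
#     hardware ethernet aa:bb:cc:dd:ee:ff;
#     ...
#   }
# The real file on dhcp1 may be laid out differently.
HOST_RE = re.compile(r"^\s*#?\s*host\s+(\S+)\s*\{")
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]{17})\s*;")


def comment_out_hosts(conf_text, hostnames):
    """Return conf_text with the host blocks named in hostnames commented out.

    Blocks that are already commented out are left unchanged.
    """
    out = []
    inside = False  # True while inside a block that is being commented out
    for line in conf_text.splitlines():
        m = HOST_RE.match(line)
        if m and m.group(1) in hostnames and not line.lstrip().startswith("#"):
            inside = True
        if inside:
            out.append("#" + line)
            if "}" in line:  # closing brace ends the host block
                inside = False
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def host_macs(text):
    """Map host name -> MAC address for every host block, commented or not."""
    macs = {}
    current = None
    for line in text.splitlines():
        m = HOST_RE.match(line)
        if m:
            current = m.group(1)
        mac = MAC_RE.search(line)
        if mac and current is not None:
            macs[current] = mac.group(1).lower()
        if "}" in line:
            current = None
    return macs


def changed_hosts(old_text, new_text, hostnames):
    """Return the hosts whose MAC address differs between the two texts."""
    old, new = host_macs(old_text), host_macs(new_text)
    return sorted(h for h in hostnames if old.get(h) != new.get(h))
```

For step 10, comment_out_hosts would be run on a copy of dhcpd.conf with the names of the repaired nodes, and the result reviewed before it replaces the live file and dhcpd is restarted. For step 12, changed_hosts takes the edited file and the gendhcpconf output and lists the nodes whose entries need attention. It compares only MAC addresses, so any other field would still have to be checked by eye.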

Last modified on May 30, 2007, 4:17:05 PM