Changes between Initial Version and Version 1 of Internal/Repair


Timestamp:
May 22, 2007, 4:12:51 PM
Author:
Joseph F. Miklojcik III
Comment:

  • Internal/Repair

1) Clear some space in the node repair area.  Obtain a Phillips-head
screwdriver and a bin for garbage.  On a network-connected computer,
open a web browser and ssh sessions to dhcp1.orbit-lab.org and
repository2.orbit-lab.org (probably through gw.orbit-lab.org).

2) Make a page in the orbit-lab.org wiki with a name matching the
template Internal/RepairYYYYMMDD (Internal/Repair20070520 for
example).  Write the current time and the names of whoever is helping
with the repairs on this wiki page.
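
The page name in step 2 follows a fixed date pattern, so it can be
generated rather than typed by hand.  A minimal sketch, assuming only
the Internal/RepairYYYYMMDD template given above; the function name
repair_page_name is hypothetical:

```python
from datetime import date


def repair_page_name(day: date) -> str:
    """Return the wiki page name for a repair session held on `day`.

    Hypothetical helper: it only fills in the Internal/RepairYYYYMMDD
    template from the procedure.
    """
    return f"Internal/Repair{day:%Y%m%d}"


# The example date from the procedure.
print(repair_page_name(date(2007, 5, 20)))  # Internal/Repair20070520
```

Passing date.today() gives the page name for the current day.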

3) Determine the set of nodes you are going to repair.  These will be
any nodes marked as red on orbit-lab.org/wiki/Status, or nodes that
the CM cannot reliably power up.  Do not repair more than ten at a
time.  Write the coordinates of these nodes down in the wiki page for
the repair.  Note which of those node positions are supposed to have
Atheros 802.11 hardware and which are supposed to have Intel.  It
simplifies things if you can do all Atheros or all Intel nodes in a
particular round of repairs.

4) Comment out the lines for these nodes in
dhcp1:/etc/dhcp3/dhcpd.conf.  Restart dhcpd on dhcp1.
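
Commenting out the entries in step 4 by hand is error-prone when
several nodes are involved.  The sketch below shows one way to do it
on the text of the file.  It assumes each node has its own
"host NAME { ... }" block, as in the standard ISC dhcpd.conf syntax.
The host names (node1-1, node1-2) and addresses are made up for
illustration, and the actual layout of the file on dhcp1 may differ.

```python
import re

# Matches the opening line of a host block, e.g. "host node1-1 {".
HOST_RE = re.compile(r"^\s*host\s+(\S+)\s*\{")


def comment_out_hosts(conf_text: str, hostnames: set[str]) -> str:
    """Prefix '#' to every line of the host blocks named in `hostnames`."""
    out = []
    depth = 0  # brace depth inside the block currently being commented out
    for line in conf_text.splitlines():
        if depth == 0:
            m = HOST_RE.match(line)
            if m and m.group(1) in hostnames:
                depth = line.count("{") - line.count("}")
                out.append("#" + line)
            else:
                out.append(line)
        else:
            depth += line.count("{") - line.count("}")
            out.append("#" + line)
    return "\n".join(out) + "\n"


# Hypothetical entries; real host names and addresses will differ.
sample = """\
host node1-1 {
  hardware ethernet 00:00:00:00:00:01;
  fixed-address 10.0.0.1;
}
host node1-2 {
  hardware ethernet 00:00:00:00:00:02;
  fixed-address 10.0.0.2;
}
"""
print(comment_out_hosts(sample, {"node1-1"}))
```

The function works on a string and does not touch the file, so the
result can be reviewed before it is written back and dhcpd is
restarted.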
     22
     235) For each node to be repaired, remove each node from its mounting in
     24the grid, leaving the node id box attached.  As you remove nodes, take
     25them and their node id box back to the node repair area.  One or two
     26other people can work on nodes in the node repair area while one
     27person moves nodes back and forth from the grid.  Note any exceptional
     28hardware or incorrectly installed connections on the wiki page.
     29
     306) Once in the node repair area, remove the node id box and then the
     31yellow node enclosure.  Verify that the node id boxes match the list
     32of nodes to be repaired on the wiki page, and that the 802.11 hardware
     33vendor matches what is expected.  Note exceptions on the wiki page.
     34
     357) Replace the power supply.  Take care to put old power supplies in
     36the garbage bin.  If the 802.11 hardware vendor did not match what is
     37expected, correct the hardware.  Replace the enclosure.  Replace the
     38node id box.

8) Calibrate the node (NYI).

9) Put the node back in the grid.  Verify the node id box against two
adjacent nodes.

10) Once all nodes have been repaired and returned to the grid, verify
that the nodes are not red on the orbit-lab.org/wiki/Status page; that
is, that the CM reports back to the CMC correctly.

11) Turn the repaired nodes on.  Because they obtain pool addresses
from dhcp, they will load an 'inventory' image (NYI).  Wait five
minutes for the inventory image to finish loading.  Then command the
CMC to run the inventory command on each node.

12) Run the gendhcpconf script on repository2.  Compare its output
with the entries you commented out in step 4.  Correct
dhcp1:/etc/dhcp3/dhcpd.conf if needed.

13) During the following maintenance slot, verify that you can image
all nodes that have been repaired since the last maintenance slot by
running the CM stress experiment (NYI).
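
The comparison between the gendhcpconf output and the entries
commented out in step 4 can be done with a line diff.  The sketch
below is one possible approach, not the way the script itself works.
The output format of gendhcpconf is not described here, so the code
assumes it prints host blocks in the same form as dhcpd.conf, and the
sample entries are made up.

```python
import difflib


def uncomment(lines: list[str]) -> list[str]:
    """Strip one leading '#' from each commented-out line."""
    return [l[1:] if l.startswith("#") else l for l in lines]


def diff_entries(commented_text: str, generated_text: str) -> list[str]:
    """Return a unified diff between old entries and newly generated ones.

    Lines are compared with surrounding whitespace removed and blank
    lines dropped, so indentation changes do not show up as differences.
    """
    old = [l.strip() for l in uncomment(commented_text.splitlines()) if l.strip()]
    new = [l.strip() for l in generated_text.splitlines() if l.strip()]
    return list(
        difflib.unified_diff(old, new, "commented-out", "gendhcpconf", lineterm="")
    )


# Hypothetical data: the commented-out entry and a regenerated one.
old_entry = "#host node1-1 {\n#  hardware ethernet 00:00:00:00:00:01;\n#}\n"
new_entry = "host node1-1 {\n  hardware ethernet 00:00:00:00:00:aa;\n}\n"

for line in diff_entries(old_entry, new_entry):
    print(line)
```

An empty result means the regenerated entries match what was
commented out.  Any line starting with - or + marks an entry to check
before correcting dhcpd.conf.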