[orbit-user] OMF5.2 load image failure in outdoor orbit nodes
Christoph Dwertmann
lists.cd at gmail.com
Tue Apr 27 22:20:31 EDT 2010
Hi Tong!
I just tried to reproduce the error message you saw. I SSH'd into
outdoor.orbit-lab.org and ran:
cdw@console:~$ omf-5.2 load [1,102] tongjin_all.ndz
Imaging nodes: '[1,102]' with image 'tongjin_all.ndz'
(Domain: default from hostname)
(Timeout: 800 sec.)
INFO NodeHandler: init OMF Experiment Controller 5.2.408
INFO NodeHandler: init Experiment ID: outdoor.orbit-lab.org_2010_04_27_22_11_14
INFO NodeHandler: Web interface available at: http://10.40.0.10:4000
INFO Experiment: load system:exp:stdlib
INFO property.resetDelay: value = 210 (Fixnum)
INFO property.resetTries: value = 1 (Fixnum)
INFO Experiment: load system:exp:imageNode
INFO property.nodes: value = [1, 102] (Array)
INFO property.image: value = "tongjin_all.ndz" (String)
INFO property.domain: value = nil (NilClass)
INFO property.outpath: value = "/tmp" (String)
INFO property.timeout: value = 800 (Fixnum)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: n_1_102)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: n_1_102)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: n_1_102)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: n_1_102)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: n_1_102)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: n_1_102)
INFO exp: Progress(0/0/1): 0/0/0 min(n_1_102)/avg/max (110) - Timeout: 690 sec.
INFO whenAll: *: 'status[@value='UP']' fires
INFO exp: Progress(0/0/1): 0/0/0 min(n_1_102)/avg/max (110) - Timeout: 680 sec.
INFO exp: Progress(0/0/1): 0/0/0 min(n_1_102)/avg/max (110) - Timeout: 670 sec.
INFO exp: Progress(0/0/1): 0/0/0 min(n_1_102)/avg/max (110) - Timeout: 660 sec.
INFO exp: Progress(0/0/1): 0/0/0 min(n_1_102)/avg/max (110) - Timeout: 650 sec.
INFO exp: Progress(0/0/1): 0/0/0 min(n_1_102)/avg/max (110) - Timeout: 640 sec.
INFO exp: Progress(0/0/1): 10/10/10 min(n_1_102)/avg/max (110) - Timeout: 630 sec.
INFO exp: Progress(0/0/1): 20/20/20 min(n_1_102)/avg/max (110) - Timeout: 620 sec.
INFO exp: Progress(0/0/1): 30/30/30 min(n_1_102)/avg/max (110) - Timeout: 610 sec.
INFO exp: Progress(0/0/1): 30/30/30 min(n_1_102)/avg/max (110) - Timeout: 600 sec.
INFO exp: Progress(0/0/1): 40/40/40 min(n_1_102)/avg/max (110) - Timeout: 590 sec.
INFO exp: Progress(0/0/1): 50/50/50 min(n_1_102)/avg/max (110) - Timeout: 580 sec.
INFO exp: Progress(0/0/1): 60/60/60 min(n_1_102)/avg/max (110) - Timeout: 570 sec.
INFO exp: Progress(0/0/1): 60/60/60 min(n_1_102)/avg/max (110) - Timeout: 560 sec.
INFO exp: Progress(0/0/1): 80/80/80 min(n_1_102)/avg/max (110) - Timeout: 550 sec.
INFO exp: Progress(1/0/1): 100/100/100 min()/avg/max (110) - Timeout: 540 sec.
INFO exp: -----------------------------
INFO exp: Imaging Process Done
INFO exp: - 1 node(s) successfully imaged - See the topology file: '/tmp/outdoor.orbit-lab.org_2010_04_27_22_11_14_topo_active.rb'
INFO exp: -----------------------------
INFO Experiment: DONE!
INFO NodeHandler: Shutting down experiment, please wait...
INFO NodeHandler: Shutdown flag is set - Turning Off the resources
INFO run: Experiment outdoor.orbit-lab.org_2010_04_27_22_11_14 finished after 4:22
Is this the command you ran? Could you give more details about the
circumstances under which you encountered the error? Could you also
open a second SSH session to repository2, run "tail -f
/var/log/omf-aggmgr-5.2.log" there, and post the messages that appear
while you encounter the ServiceException?
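
If it helps when collecting the details, here is a minimal sketch of a shell
helper that pulls the failing service URLs out of a saved copy of the omf-5.2
console output. The function name failed_service_calls is hypothetical and not
part of OMF; it only assumes the format seen in your output, where the URL
appears in parentheses on the FATAL line or on the line right after it.

```shell
# Hypothetical helper (not part of OMF): list the distinct service URLs that
# failed with a ServiceException in a saved copy of the omf-5.2 console output.
# Assumption: each URL sits in parentheses on the "FATAL service_call" line
# itself or on the line immediately following it.
failed_service_calls() {
  # $1: path to the saved console output
  grep -A1 'FATAL service_call' "$1" | grep -o 'http://[^)]*' | sort -u
}
```

For example, you could save the console output with something like
"omf-5.2 load [1,102] tongjin_all.ndz 2>&1 | tee omf-load.log" (omf-load.log
is just an example file name) and then run "failed_service_calls omf-load.log"
to see which aggregate manager calls failed.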
Thank you!
Kind regards,
Christoph Dwertmann
On Fri, Apr 23, 2010 at 11:40 AM, Tong Jin <tjin at eden.rutgers.edu> wrote:
> Hi,
> I tried to load images on outdoor orbit nodes using the command "omf-5.2
> load", but it doesn't work all the time. Could anyone check that, please?
> I put the failure information here and hope it helps.
> Thanks.
>
> Imaging nodes: '[1,102]' with image 'tongjin_all.ndz'
> (Domain: default from hostname)
> (Timeout: 800 sec.)
> INFO NodeHandler: init OMF Experiment Controller 5.2.388
> INFO NodeHandler: init Experiment ID: outdoor.orbit-lab.org_2010_04_20_16_46_34
> INFO NodeHandler: Web interface available at: http://10.40.0.10:4000
> INFO Experiment: load system:exp:stdlib
> INFO property.resetDelay: value = 210 (Fixnum)
> INFO property.resetTries: value = 1 (Fixnum)
> INFO Experiment: load system:exp:imageNode
> INFO property.nodes: value = [1, 102] (Array)
> INFO property.image: value = "tongjin_all.ndz" (String)
> INFO property.domain: value = nil (NilClass)
> INFO property.outpath: value = "/tmp" (String)
> INFO property.timeout: value = 800 (Fixnum)
> FATAL service_call: Exception: ServiceException (http://repository2:5052/pxe/setBootImageNS?domain=outdoor.orbit-lab.org&ns=[[1,102]])
> INFO NodeHandler: Shutdown flag is set - Turning Off the resources
> FATAL service_call: Exception: ServiceException (http://repository2:5052/pxe/clearBootImageNS?domain=outdoor.orbit-lab.org&ns=[[1,102]])
> /usr/share/omf-expctl-5.2/omf-expctl/nodeHandler.rb:278:in `service_call': ServiceException (ServiceException)
>     from /usr/share/omf-expctl-5.2/omf-expctl/node/nodeSet.rb:510:in `setPxeEnvMulti'
>     from /usr/share/omf-expctl-5.2/omf-expctl/node/nodeSet.rb:475:in `pxeImage'
>     from /usr/share/omf-expctl-5.2/omf-expctl/node/rootNodeSetPath.rb:85:in `pxeImage'
>     from /usr/share/omf-expctl-5.2/omf-expctl/nodeHandler.rb:748:in `shutdown'
>     from /usr/share/omf-expctl-5.2/omf-expctl.rb:71
>
> Tong