[orbit-user] xmax and ymax
Thierry Rakotoarivelo
Thierry.Rakotoarivelo at nicta.com.au
Tue Apr 20 00:28:31 EDT 2010
Hi Giovanni,
Thank you for reporting this issue to us. I had a closer look and can confirm that it is a shortcoming of the current stable release, 5.2. Let me first explain why it happens, then propose a permanent fix and a temporary workaround.
1) Why is this happening?
- When a new experiment starts with a group of nodes, the Experiment Controller (= EC, the software launched with the 'omf exec' command) checks whether the nodes defined in the experiment are "active" (= they exist in the testbed and are not 'broken').
- To do that, for each node the current EC v5.2 asks the CMC service of the Aggregate Manager (= AM) for the full list of "active" nodes and tests whether the defined node is in that list.
- To build that list, the CMC service v5.2 has 2 options:
-- if you are at Winlab, the CMC service queries the CM of the defined node to see if it is working,
-- if you have OMF installed without a CM card on the nodes, the CMC service will build a mock list with all the nodes set as "active", and to do that it will take X_MAX and Y_MAX and generate a list with X_MAX * Y_MAX entries.
- In your case, I guess it is the 2nd option, so the list for you would have 770 entries (7*110, right?)
- Therefore your EC needs to test if your defined node is within the 770 entries.
- Then the whole process loops again to test the next defined node in your experiment... So depending on the machines that run the EC and the AM service, their load, and the number of nodes defined in your experiment, that might take some time.
- This is clearly a non-optimized way of doing things in the current 5.2 release.
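To make the cost described above concrete, here is a minimal Ruby sketch of the check. It is not the actual OMF code: the method names and the example node coordinates are hypothetical, and it ignores the cost of the request between the EC and the AM. It only shows that each defined node triggers a full X_MAX * Y_MAX list build followed by a scan of that list.

```ruby
# Hypothetical stand-ins; none of these method names come from the OMF code base.
X_MAX = 7
Y_MAX = 110

# Mimics the mock CMC reply: every coordinate in the grid is reported as active.
def build_mock_active_list(x_max, y_max)
  list = []
  (1..x_max).each do |x|
    (1..y_max).each do |y|
      list << [x, y]
    end
  end
  list
end

# Mimics the EC side: request the full list again for each node, then scan it.
def node_active_via_full_list?(x, y, x_max, y_max)
  build_mock_active_list(x_max, y_max).include?([x, y])
end

# Two example nodes (assumed coordinates, for illustration only).
experiment_nodes = [[3, 42], [7, 110]]
entries_built = experiment_nodes.length * X_MAX * Y_MAX
results = experiment_nodes.map { |x, y| node_active_via_full_list?(x, y, X_MAX, Y_MAX) }
puts "entries built: #{entries_built}, results: #{results.inspect}"
```

With these assumed values, two defined nodes already cause 2 * 770 = 1540 list entries to be generated, although only two answers are needed.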
2) One possible long-term solution for the next 5.3 release (mid-year)
- have the EC issue a specific query to the CMC service to check whether a single node is active (thus the EC will not have to test the inclusion of X in a list of N entries)
- have the CMC service (in the 2nd case, when no CM card is installed on the node) return not a list, but the specific reply to the EC's query, i.e. whether the node is active or not (this could be built from the Inventory database information). Thus the AM will not have to build an N-entry list for each query.
- I have created a new Ticket in our tracking system for this issue: http://mytestbed.net/issues/show/271
Incidentally, we have already replaced the [X,Y] way of addressing a node with a flat naming scheme in the 5.3 code currently under development (see http://mytestbed.net/wiki/omf/Changelog_53). The coordinate information of the nodes will still be available to the user, so she/he knows where the nodes are physically. But addressing them in the experiment will be done through human-readable names.
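The per-node query proposed in section 2 could look roughly like the Ruby sketch below. This is only an illustration under stated assumptions, not the planned 5.3 implementation: the INSTALLED_NODES set stands in for the Inventory database, and the method name and node coordinates are hypothetical.

```ruby
require 'set'

# Hypothetical stand-in for the Inventory database: only the nodes that
# are really installed, regardless of how large X_MAX and Y_MAX are.
INSTALLED_NODES = Set.new([[3, 42], [7, 110]])

# Hypothetical per-node query: answers for one node only and never
# builds a list sized by the grid.
def node_active?(x, y, inventory = INSTALLED_NODES)
  inventory.include?([x, y])
end

puts node_active?(3, 42)
puts node_active?(1, 1)
```

Here the cost of one check no longer depends on X_MAX and Y_MAX, only on a single lookup in the set of known nodes.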
3) One possible short-term solution for your specific case
So in the meantime (while waiting for 5.3), you could patch the EC code if you really need your experiments to start quickly. To do that:
- assuming that you have deployed OMF in a setup where the nodes do not have a CM card,
- you could patch the EC code to bypass that test for active nodes
- edit "/usr/share/omf-expctl-5.2/omf-expctl/cmc.rb"
- change the definition of the "CMC.nodeActive?(x, y)" method (line 176), to always return "true"
- the new method should look like:
def CMC.nodeActive?(x, y)
  return true
  # Check if EC is running in 'Just Print' or 'Slave mode'
  #if ( NodeHandler.JUST_PRINT || NodeHandler.SLAVE_MODE )
  ## Yes - Then always say that a node is active!
  #  return true
  #end
  #if (@@activeNodes == nil)
  #  CMC.getAllActiveNodes
  #end
  #@@activeNodes.has_key?([x,y])
end
Regards,
Thierry.
--
On 20/04/2010, at 3:43 AM, Giovanni Di Stasi wrote:
> Ivan Seskar wrote:
>> Hi Giovanni,
>>
>> I thought you were using the group but had to ask just in case. I am not sure if we ever tried it with a large non-contiguous range of nodes (we have a small non-contiguous range in outdoor.orbit-lab.org but nobody complained, probably because it is too small - it covers a single 1..255 range). What is the range of addresses that you are dealing with?
>>
>> Also, just so that we don't lose track, can you also please add a ticket at http://mytestbed.net/projects/omf/issues .
>>
>> Thanks,
>>
>> Ivan.
>>
>>
> In my case xmax is 7 and ymax is 110. I get the long delay when
> executing the simple HelloWorld experiment, where I define two groups
> with a node each and request a single Udp flow between them.
>
> The experiment gets stuck for quite a while (some minutes), while the
> cpu is at about 100%. Are you able to reproduce it? It should suffice to
> change xmax and ymax to high values (like mine) and see what happens.
>
> Regards,
>
>
>> -----Original Message-----
>> From: Giovanni Di Stasi [mailto:gdistasi at gmail.com]
>> Sent: Monday, April 19, 2010 7:51 AM
>> To: Ivan Seskar
>> Cc: Roberto Bifulco; Max Ott; Thierry Rakotoarivelo
>> Subject: Re: [orbit-user] xmax and ymax
>>
>> Hi Ivan,
>>
>> we are just using defGroup(s) in the experiment, so we must be referring
>> just to the nodes we are using (no ranges or topology used).
>>
>> Regarding the version, we are using the version 5.2 (installing from the
>> deb packages of the stable version).
>>
>> Giovanni
>>
>> Ivan Seskar wrote:
>>
>>> Hi Giovanni,
>>>
>>> Yes you are right - both inventory and cmc services at the moment do assume it is an array [1..xmax, 1..ymax] and thus their performance does depend on the range. Having said that, if your experiment group (or topology) only includes a list of nodes that are available, it shouldn't be that much dependent on the range (and this applies to imaging as well, e.g. if you give it an argument that is the list of nodes rather than the range). BTW, what version of omf are we talking about?
>>>
>>> Ivan.
>>>
>>> PS: The inventory database does enable us to flatten the architecture but I guess we do need to carefully go through all the services and remove the dependence on xmax and ymax ...
>>>
>>>
>>> -----Original Message-----
>>> From: orbit-user-bounces at orbit-lab.org [mailto:orbit-user-bounces at orbit-lab.org] On Behalf Of Giovanni Di Stasi
>>> Sent: Monday, April 19, 2010 5:57 AM
>>> To: orbit-user at orbit-lab.org
>>> Cc: Roberto Bifulco
>>> Subject: [orbit-user] xmax and ymax
>>>
>>> Hi everybody,
>>>
>>> I've noticed that when xmax and ymax (in the testbeds table) are set to high values the execution of the experiment gets a lot longer. It seems (I'm not 100% sure, because I checked this a long time ago), that the status of all the nodes (i.e. nodes from [1,1] to [xmax,ymax]) is checked, even if the nodes actually installed are just a few.
>>>
>>> Unfortunately, we are forced to use those high values for xmax and ymax because the ip addresses of nodes are an external input (therefore the coordinates of the nodes cannot start from [1,1]).
>>>
>>> Do you have a quick fix for this or some suggestions?
>>>
>>> Thanks,
>>> Giovanni