
Version 10 (modified by faiyaz, 18 years ago)

Operations

Hardware

System Overview

The ORBIT Testbed consists of 416 nodes, 26 servers, and 45 Ethernet switches. Nodes, servers, and switches are grouped into ORBIT resources, which are referred to as "grid" and "sb1" through "sb8". The grid consists of 400 nodes, a server that acts as a console, and 30 switches that are separated into control, data, and CM networks. Each of the eight sandboxes consists of 2 nodes, a console server, and a switch that aggregates all three networks.
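As a quick consistency check on the totals above, the per-resource counts can be added up. This is only an arithmetic sketch: the constant and function names are illustrative, and the figure of 17 back-end servers is taken from the Networking section of this page. The page does not break down the switches outside the grid and sandboxes, so they are not checked here.

```python
# Inventory figures as stated on this page; names are illustrative only.
GRID_NODES = 400
SANDBOXES = 8
NODES_PER_SANDBOX = 2
BACKEND_SERVERS = 17  # stated in the Networking section


def total_nodes():
    """Grid nodes plus two nodes in each sandbox."""
    return GRID_NODES + SANDBOXES * NODES_PER_SANDBOX


def total_servers():
    """One grid console, one console per sandbox, plus the back-end servers."""
    return 1 + SANDBOXES + BACKEND_SERVERS


print(total_nodes())    # 416
print(total_servers())  # 26
```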

Networking

Each resource is connected to the ORBIT back-end via the control, data, and CM networks. Each network of each resource is a separate subnet using private address space as defined in RFC 1918, and all of them route back to a Cisco PIX 515E firewall appliance. Each subnet is connected to its own DMZ interface on the firewall and therefore has its own set of security rules governing all traffic to and from that network. The firewall is configured to allow traffic from the external login machines to the ORBIT resources. Traffic generated on one resource is blocked at the firewall if its destination is in another resource. The purpose is to keep the control planes of the resources logically separate, so that one user's experiment cannot interfere with another's. Similarly, because all resources share the same back-end, the firewall does not allow an experimenter direct access to any of the back-end servers or their services.
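The isolation policy described above can be sketched as a small decision function. This is a simplified model, not the actual PIX configuration: the subnets below are hypothetical (this page does not give the real addressing plan), each resource is collapsed to a single subnet instead of one per network, and the names `ZONES`, `zone_of`, and `firewall_allows` are invented for illustration.

```python
import ipaddress

# Hypothetical private subnets; the real ORBIT addressing plan is not shown here.
ZONES = {
    "grid": ipaddress.ip_network("10.10.0.0/16"),
    "sb1": ipaddress.ip_network("10.11.0.0/16"),
    "sb2": ipaddress.ip_network("10.12.0.0/16"),
    "backend": ipaddress.ip_network("10.1.0.0/16"),
    "login": ipaddress.ip_network("192.168.100.0/24"),
}
RESOURCES = {"grid", "sb1", "sb2"}


def zone_of(address):
    """Return the name of the zone containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in ZONES.items():
        if ip in network:
            return name
    return None


def firewall_allows(src, dst):
    """Apply the policy from the text; anything not listed is denied."""
    src_zone, dst_zone = zone_of(src), zone_of(dst)
    if src_zone is None or dst_zone is None:
        return False
    if src_zone == "login" and dst_zone in RESOURCES:
        return True  # login machines may reach the resources
    if src_zone in RESOURCES and dst_zone == src_zone:
        return True  # traffic that stays inside one resource
    return False     # cross-resource and resource-to-back-end traffic


print(firewall_allows("192.168.100.5", "10.10.1.1"))  # login -> grid
print(firewall_allows("10.11.0.5", "10.12.0.5"))      # sb1 -> sb2
print(firewall_allows("10.10.0.5", "10.1.0.5"))       # grid -> back-end
```

The default-deny return at the end mirrors the intent of the text: only the explicitly permitted paths get through.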

The control network comprises 10 discrete switches on the grid and shared switches on the sandboxes. Its purpose is to allow remote access to the nodes via ssh and to provide a back channel for nodehandler communication and measurement collection.

All resources share the same ORBIT back-end, which consists of 17 servers connected via a series of Gigabit Ethernet switches. The back-end servers run a variety of services, ranging from industry-standard services such as DNS and DHCP to ORBIT-specific services.

Software

LDAP

Access control for each resource is handled via OpenLDAP. Each user is represented by an entry in the LDAP database with a set of attributes corresponding to the user's experiment group name, resource reservations, and email address. ORBIT services use the information in this database to notify the user of scheduling conflicts, grant access to a resource for a requested time slot, and allow other users in the same experiment group access to the same resources.

When a user requests a timeslot on a resource, the user opens the ORBIT schedule web page and selects slots. By default, each slot remains in the pending state until an administrator approves the request. To reduce the reliance on manual approval, an auto-approver approves pending slots 3 minutes before their start time. Upon automatic or manual approval, the schedule page generates and sends an email to the address specified in the user's LDAP entry, informing the user of the state change. At the start of the slot, the auto-approval service modifies the user's entry in LDAP to allow access to the approved resource. Once access is granted in LDAP, the console of the resource detects the new entry in the user's LDAP profile and allows the user access to that resource.
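The slot lifecycle described above (pending, approved, access granted at the start time) can be sketched as follows. This is a minimal model under stated assumptions: the `UserEntry` and `Slot` classes, their field names, and the in-memory `allowed_resources` set are hypothetical stand-ins for the real LDAP attributes and scheduler, which are not documented on this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

AUTO_APPROVE_LEAD = timedelta(minutes=3)  # lead time stated in the text


@dataclass
class UserEntry:
    """Hypothetical stand-in for a user's LDAP entry."""
    uid: str
    email: str
    allowed_resources: set = field(default_factory=set)


@dataclass
class Slot:
    user: UserEntry
    resource: str
    start: datetime
    state: str = "pending"


def auto_approve(slot, now):
    """Approve a pending slot once `now` is within 3 minutes of its start.

    Returns the notification text that would be mailed, or None.
    """
    if slot.state == "pending" and now >= slot.start - AUTO_APPROVE_LEAD:
        slot.state = "approved"
        return f"To {slot.user.email}: slot on {slot.resource} is now approved"
    return None


def grant_access(slot, now):
    """At or after the slot start, record the resource in the user's entry."""
    if slot.state == "approved" and now >= slot.start:
        slot.user.allowed_resources.add(slot.resource)
        return True
    return False


start = datetime(2024, 1, 1, 12, 0)
alice = UserEntry("alice", "alice@example.org")
slot = Slot(alice, "sb1", start)
print(auto_approve(slot, start - timedelta(minutes=3)))
print(grant_access(slot, start), alice.allowed_resources)
```

In this sketch the console's check reduces to testing whether the resource name is in `allowed_resources`; the real console presumably queries LDAP instead.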

During a user's slot, experiment data can be collected via the ORBIT Measurement Library (OML). Overall 802.11 packet traces are collected via the ARUBA Monitoring service. Both of these mechanisms aggregate data from multiple experiment nodes via multicast back channels. Once captured, the data are archived in a series of MySQL databases. During experiment runtime, the collection services insert relevant data into a high-performance Internal Database server (IDB). Once an experiment has completed, the ORBIT Collection Service moves the OML database off IDB and onto an External Database server (EDB) for persistent storage. Users can then export their data from EDB or manipulate it directly without requesting a timeslot on a resource.
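The move from IDB to EDB after an experiment can be illustrated with a small copy-then-drop routine. This is only a sketch: it uses Python's built-in `sqlite3` with in-memory databases as a stand-in for the MySQL servers, and the table name `exp_demo`, its columns, and the function `archive_experiment` are hypothetical. The actual ORBIT Collection Service is not described in enough detail here to reproduce.

```python
import sqlite3


def archive_experiment(idb, edb, table):
    """Copy one experiment table from idb to edb, then drop it from idb.

    Returns the number of rows moved. The table name is trusted input here.
    """
    rows = idb.execute(f'SELECT node, metric, value FROM "{table}"').fetchall()
    edb.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (node TEXT, metric TEXT, value REAL)'
    )
    edb.executemany(f'INSERT INTO "{table}" VALUES (?, ?, ?)', rows)
    edb.commit()
    # Remove the data from the internal server only after the copy is committed.
    idb.execute(f'DROP TABLE "{table}"')
    idb.commit()
    return len(rows)


# In-memory stand-ins for the internal and external database servers.
idb = sqlite3.connect(":memory:")
edb = sqlite3.connect(":memory:")
idb.execute('CREATE TABLE "exp_demo" (node TEXT, metric TEXT, value REAL)')
idb.executemany(
    'INSERT INTO "exp_demo" VALUES (?, ?, ?)',
    [("node1-1", "rssi", -52.0), ("node1-2", "rssi", -61.5)],
)
idb.commit()

moved = archive_experiment(idb, edb, "exp_demo")
print(moved)
```

Dropping the source table only after the destination commit means a failure mid-copy leaves the data on IDB instead of losing it.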

DHCP/PXE
