Version 4 (modified by faiyaz, 18 years ago)

Operations

Hardware

System Overview

The ORBIT Testbed consists of 416 nodes, 26 servers, and 45 Ethernet switches. Nodes, servers, and switches are grouped into ORBIT resources, which are referred to as "grid" and "sb1" through "sb8". The grid consists of 400 nodes, a server that acts as a console, and 30 switches that are separated into control, data, and CM networks. Each of the eight sandboxes consists of 2 nodes, a console server, and a switch that aggregates all three networks.

Each resource is connected to the ORBIT back-end via the control, data, and CM networks. Each network of each resource is a separate private subnet following RFC 1918, and all of them route back to a Cisco PIX 515E firewall appliance. Each subnet is connected to its own DMZ interface on the firewall and therefore has its own set of security rules governing all traffic to and from that network. The firewall will allow traffic
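As an illustration of the one-subnet-per-network layout described above, the following minimal sketch assigns a distinct private (RFC 1918) subnet to every resource and network pair. The 10.0.0.0/8 base range, the /16 prefix length, and the function name build_subnet_plan are assumptions made for the example; the actual ORBIT addressing is not given on this page.

```python
import ipaddress

# Resource and network names come from the text; the addresses do not.
RESOURCES = ["grid"] + [f"sb{i}" for i in range(1, 9)]
NETWORKS = ["control", "data", "cm"]


def build_subnet_plan(base="10.0.0.0/8", prefix=16):
    """Give every (resource, network) pair its own subnet.

    `base` and `prefix` are hypothetical values chosen for illustration.
    """
    pool = ipaddress.ip_network(base).subnets(new_prefix=prefix)
    plan = {}
    for resource in RESOURCES:
        for network in NETWORKS:
            plan[(resource, network)] = next(pool)
    return plan


plan = build_subnet_plan()
for (resource, network), subnet in plan.items():
    # Each subnet would map to one DMZ interface on the firewall.
    print(f"{resource:5} {network:8} {subnet} private={subnet.is_private}")
```

With 9 resources and 3 networks each, the plan contains 27 subnets, which matches the idea of one firewall interface and one rule set per subnet.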

The Control network consists of 10 discrete switches on the grid and of shared switches on the sandboxes. Its purpose is to allow remote access to the nodes via ssh and to provide a back channel for nodehandler communication and measurement collection.

Every resource shares the same ORBIT back-end, which consists of 17 servers connected via a series of Gigabit Ethernet switches. The back-end servers run a variety of services, ranging from industry-standard services such as DNS and DHCP to ORBIT-specific services.
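The hardware counts in this section can be cross-checked with a little arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative, and the number of back-end switches is not stated anywhere on this page, so the value computed for it is an inference from the totals rather than a documented figure.

```python
# Totals stated in the System Overview.
TOTAL_NODES, TOTAL_SERVERS, TOTAL_SWITCHES = 416, 26, 45

# Per-resource counts stated in the text.
GRID = {"nodes": 400, "servers": 1, "switches": 30}
SANDBOX = {"nodes": 2, "servers": 1, "switches": 1}
SANDBOX_COUNT = 8
BACKEND_SERVERS = 17

nodes = GRID["nodes"] + SANDBOX_COUNT * SANDBOX["nodes"]
servers = GRID["servers"] + SANDBOX_COUNT * SANDBOX["servers"] + BACKEND_SERVERS
resource_switches = GRID["switches"] + SANDBOX_COUNT * SANDBOX["switches"]

# Inferred, not documented: switches left over for the back-end.
backend_switches = TOTAL_SWITCHES - resource_switches

print(nodes == TOTAL_NODES, servers == TOTAL_SERVERS, backend_switches)
```

The node and server totals add up exactly. The grid and sandbox switches account for 38 of the 45 switches, which would leave 7 for the back-end if no other switches are in use.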

Software

Access control to each resource is handled via OpenLDAP. Each user is represented by an entry in the LDAP database with a set of attributes corresponding to the user's experiment group name, resource reservations, and email address. ORBIT services can use the information in this database to notify the user of scheduling conflicts, grant access to a resource for a requested time slot, and allow other users in the same experiment group to access the same resources.
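The access rule described above can be modeled in a few lines. This is a sketch in plain Python, not the OpenLDAP schema or any ORBIT service: the class names, field names, and the function users_with_access are hypothetical, and the directory is simply a list of in-memory objects standing in for LDAP entries.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Reservation:
    resource: str  # e.g. "grid" or "sb1"
    start: datetime
    end: datetime


@dataclass
class UserEntry:
    # Hypothetical stand-in for a user's LDAP entry.
    uid: str
    group: str
    email: str
    reservations: list = field(default_factory=list)


def has_own_access(user, resource, now):
    """True if the user holds a reservation on `resource` covering `now`."""
    return any(
        r.resource == resource and r.start <= now < r.end
        for r in user.reservations
    )


def users_with_access(directory, resource, now):
    """Reservation holders plus everyone in their experiment groups."""
    groups = {u.group for u in directory if has_own_access(u, resource, now)}
    return sorted(u.uid for u in directory if u.group in groups)


slot = Reservation("sb1", datetime(2024, 1, 1, 10), datetime(2024, 1, 1, 12))
directory = [
    UserEntry("alice", "wlan", "alice@example.org", [slot]),
    UserEntry("bob", "wlan", "bob@example.org"),
    UserEntry("carol", "mesh", "carol@example.org"),
]
print(users_with_access(directory, "sb1", datetime(2024, 1, 1, 11)))
```

In this example, bob gains access to sb1 through alice's reservation because both belong to the same experiment group, while carol does not.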

To request a time slot on a resource, the user opens the ORBIT schedule webpage and selects slots. By default, each slot remains in the pending state until an administrator approves the request. To reduce the need for manual approval, an auto approver approves pending slots 3 minutes before their start time. Upon auto or manual approval, the schedule page generates an email informing the user of the state change and sends it to the address specified in the user's LDAP entry. At the start of the slot, the auto approval service modifies the user's entry in LDAP to allow access to the approved resource. Once access is granted in LDAP, the console of the resource detects the new entry in the user's LDAP profile and allows the user to access that resource.
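The slot lifecycle described above can be sketched as a small state model. This is an illustration only, not the ORBIT scheduler: the names Slot, SlotState, auto_approve, and access_granted are hypothetical, email and LDAP updates are left as comments, and the assumption that access ends when the slot ends is not stated in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum

# The 3-minute lead time comes from the text.
AUTO_APPROVE_LEAD = timedelta(minutes=3)


class SlotState(Enum):
    PENDING = "pending"
    APPROVED = "approved"


@dataclass
class Slot:
    user: str
    resource: str
    start: datetime
    end: datetime
    state: SlotState = SlotState.PENDING


def auto_approve(slots, now):
    """Approve pending slots that start within 3 minutes; return them."""
    newly_approved = []
    for slot in slots:
        due = slot.start - AUTO_APPROVE_LEAD <= now < slot.end
        if slot.state is SlotState.PENDING and due:
            slot.state = SlotState.APPROVED
            # A real service would send the notification email here.
            newly_approved.append(slot)
    return newly_approved


def access_granted(slot, now):
    """Access applies only to approved slots that are currently running.

    Ending access at `slot.end` is an assumption of this sketch.
    """
    return slot.state is SlotState.APPROVED and slot.start <= now < slot.end


slot = Slot("alice", "sb1", datetime(2024, 1, 1, 10), datetime(2024, 1, 1, 11))
auto_approve([slot], datetime(2024, 1, 1, 9, 57))
print(slot.state, access_granted(slot, datetime(2024, 1, 1, 10)))
```

Separating approval from access mirrors the text: a slot may be approved a few minutes early, but the resource console only admits the user once the slot has actually started.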
