How to Submit Jobs to the N1 Grid Engine

This scheduler was installed to make courtney more efficient and to allow fair use of all the computers. ANY program you run on the cluster (not just MPI programs) should be submitted to this scheduler. The scheduler does not keep track of jobs started outside it, so other jobs will most likely be scheduled concurrently with your own, causing both jobs to run more slowly.

The documentation for this software can be found in /opt/n1ge6/doc/ on any host. It will be extremely helpful to at least skim through the User's Manual. This page gives the specific configuration for our cluster, a few simple examples of job submissions, and some basic instructions on how to monitor your jobs.

Overview
Janelle is the master host, but every computer in the cluster has been configured as a submit host, so you can submit a job from any node on the cluster. To submit a job, use the command qsub. Interactive logins are not available, but short jobs may be run directly on the master node itself. There are also other ways around this (see Interactive logins below). The basic syntax for submitting a job via the command line is

qsub submit_script.sh

After a job is submitted, the grid engine decides when to run it, given the status of the cluster and the run time you specify. (Hint: the more accurately you estimate your program's run time, the more efficiently your job will be scheduled.) If no queue is explicitly provided, it then decides which queue to place the job in. There are currently three queues configured on courtney: singlq, amdq, and pedgeq. The singlq queue contains all the single processor machines, while the amdq queue contains the dual processor AMD machines. The single processor machines have 1GB of memory each and the dual processor machines have 2GB of memory each. The pedgeq queue contains the other Dell PowerEdge server, which is identical to the master node, with 2 processors and 2GB of memory.

You do not have to specify a queue explicitly for your job to run. The time limit for any job running in this queue is 30 days. If you need to run a job longer than 30 days, then you will need to rethink the structure of your code. Download this submit script to see the syntax of a submit script. If you wish to run an MPI program, download this MPI submit script.
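
For illustration, a BASH submit script that names a queue and a run time might look like the sketch below. The job name, output file, and one-hour run time are placeholders rather than values from the cluster configuration; -N, -q, -l h_rt, -cwd, -j, and -o are standard qsub options, and amdq is one of the queues listed above.

```shell
#!/bin/bash
# Sketch of a submit script. The job name, output file, and run time are
# placeholders, not values taken from the cluster configuration.
#
# Job name shown by qstat:
#$ -N example_job
# Queue to run in (omit this line to let the scheduler choose):
#$ -q amdq
# Estimated run time as hours:minutes:seconds:
#$ -l h_rt=1:00:00
# Start in the directory the job was submitted from:
#$ -cwd
# Merge stderr into stdout and write both to one file:
#$ -j y
#$ -o example_job.out

# BASH jobs start without a $PATH here, so source the global definitions.
if [ -r /etc/bashrc ]; then
    . /etc/bashrc
fi

# JOB_ID and JOB_NAME are set by Grid Engine; the defaults apply when the
# script is run by hand for testing.
JOB_LABEL="${JOB_NAME:-example_job}.${JOB_ID:-manual}"
echo "Job $JOB_LABEL running on $(uname -n)"
```

The same requests can be given on the command line instead of in the script, for example qsub -q amdq -l h_rt=1:00:00 submit_script.sh.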

A word on shells
There are two shells you should be aware of: the Bourne-Again SHell (/bin/bash or /bin/sh) and the C shell (/bin/tcsh or /bin/csh). The serial submit script provided above uses CSH and the MPI submit script uses BASH. You may use either shell for any job. There is a small problem with CSH right now, which produces the output

Warning: no access to tty; thus no job control in this shell....

There is a rogue stty command in a login file somewhere causing this warning, and until it is found this annoyance will remain. It is nothing major; in most cases it merely results in an unwanted string showing up in every output file of your jobs. Look at the serial script above to see the syntax for using CSH.

BASH, on the other hand, will not give you this warning. Using BASH, however, results in a shell without a $PATH defined. This means any command called from a BASH script must be given with the entire path to the executable (e.g. /usr/local/mpich2-1.0.3/bin/mpiexec). The scripts above circumvent this problem by manually sourcing the global definitions file /etc/bashrc. This works just as well, and you don't have to worry about the warning popping up in your output file.
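
The two BASH workarounds described above can be sketched as follows. This is an illustration rather than the contents of the MPI submit script, and MPIEXEC is a variable name chosen for this example.

```shell
#!/bin/bash
# Workaround 1: source the global definitions so $PATH is defined.
if [ -r /etc/bashrc ]; then
    . /etc/bashrc
fi

# Workaround 2: fall back to the full path when a command is not on $PATH.
# MPIEXEC is a variable name chosen for this sketch.
MPIEXEC="$(command -v mpiexec || echo /usr/local/mpich2-1.0.3/bin/mpiexec)"
echo "Using mpiexec at: $MPIEXEC"
```

Either workaround alone is enough; using both only makes the script tolerant of a missing /etc/bashrc.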

Interactive logins
If you need to run FORTRAN programs that require interactive data entry, you can make a separate file containing the data you want entered into your program and direct it into your executable. For instance, if you need to read the value nt (READ*,nt) place your value for nt into a data file (input.dat); then your submit script might look something like

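#!/bin/csh
# Request a run time limit of 31 minutes (hours:minutes:seconds)
#$ -l h_rt=0:31:00
# Change to the directory containing the executable and input.dat
cd ~/run-dir/
# Feed the contents of input.dat to the program's standard input
./exec < input.dat

FORTRAN will then read the values from input.dat in the order your READ* statements execute.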
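
A variant of the script above, written for BASH, can create the data file itself so that the input values travel with the job. The values 100 and 0.25 and the second READ statement they imply are made up for illustration, and -cwd is used here in place of the cd line. Each list-directed READ statement starts on a new line of input, so one line per READ is the safe layout.

```shell
#!/bin/bash
#$ -l h_rt=0:31:00
#$ -cwd

# One line per READ statement, in the order the program executes them.
# 100 stands in for nt; 0.25 is a made-up second value.
cat > input.dat <<'EOF'
100
0.25
EOF

# ./exec is the executable name used in the example above.
if [ -x ./exec ]; then
    ./exec < input.dat
else
    echo "./exec not found; input.dat written only" >&2
fi
```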

If this is still not a viable solution, you may request an interactive login via qssh. This schedules a job like any other, except that once it is running you are logged in automatically to another node on the cluster that has free resources. Since courtney is configured to use ssh instead of rsh, accounting is incomplete: the online usage of the jobs is not collected, only the wallclock time. As a result, unfortunately, an interactive login job is dropped from the master scheduler after 1 minute, and any other jobs submitted could be assigned to the node you are logged into. There is also a potential loss of control: in some unusual cases, killing the job might leave behind idle or busy processes even though Grid Engine sees the job as finished. This is why interactive logins are not encouraged, but they are available if absolutely necessary.

qssh takes the same options as qlogin. Type man qssh for more details.

Monitoring your jobs
qstat - Show the status of Grid Engine jobs and queues. This will show which jobs are currently running and which jobs are in the queue waiting area. To see the status of every node explicitly use qstat -f and to see the details of all jobs type qstat -r. Type man qstat to see the multitude of other options available for this command.
qhost - Show the status of Grid Engine hosts, queues, jobs. This will give you a nice status report of all the nodes and the load they are carrying, even if a job was not submitted via N1GE6.
email - Unfortunately, the emailing system for notifying you of errors or job completions has not been configured yet.
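
Since email notification is not available, a script can instead check the qstat output in a loop to wait for a job. The sketch below assumes the default qstat listing, with two header lines followed by one line per job whose first column is the job ID. The function names, the 30-second interval, and the job ID 1234 are choices made for this example.

```shell
#!/bin/bash
# Succeeds if job ID $1 appears in a qstat listing read from stdin.
# Assumes two header lines, then one line per job with the ID in column 1.
job_listed() {
    awk -v id="$1" 'NR > 2 && $1 == id { found = 1 } END { exit !found }'
}

# Polls qstat every 30 seconds until job ID $1 is no longer listed.
wait_for_job() {
    while qstat -u "$USER" | job_listed "$1"; do
        sleep 30
    done
    echo "Job $1 is no longer in the queue"
}

# Usage on the cluster (hypothetical job ID): wait_for_job 1234
```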

Job Deletion
Type qstat to find your Job ID number, then qdel jobid.
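
When a script submits a job and may later need to delete it, the job ID can be taken from the message qsub prints. This sketch assumes the usual message form, Your job 1234 ("submit_script.sh") has been submitted; the helper name parse_job_id and the variable JOB_ID_SUBMITTED are made up for the example.

```shell
#!/bin/bash
# Prints the job ID from a qsub confirmation message read from stdin.
# Assumes the form: Your job 1234 ("submit_script.sh") has been submitted
parse_job_id() {
    awk '$1 == "Your" && $2 == "job" { print $3; exit }'
}

# Only talk to the scheduler when its commands are available.
if command -v qsub >/dev/null 2>&1; then
    JOB_ID_SUBMITTED="$(qsub submit_script.sh | parse_job_id)"
    echo "Submitted job $JOB_ID_SUBMITTED"
    # Later, to remove the job from the queue or stop it:
    # qdel "$JOB_ID_SUBMITTED"
fi
```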

Graphical Interface
Using submit scripts and running them from the command line is probably the quickest way to submit any job. If you find yourself frustrated, however, you may use the graphical interface provided by N1GE6. You will need an X server: Cygwin has an excellent one; just make sure you select xorg-x11-base when you install. Then simply run qmon & from the command line to start this graphical front end. You may submit jobs and do just about anything else with this tool. See the documentation for more details.