How to Submit Jobs to the N1 Grid Engine
The documentation for this software can be found in /opt/n1ge6/doc/ on any host. It will be extremely helpful if you at least skim the User's Manual. The specific configuration of our cluster is given here, along with a few simple examples of job submission and some basic instructions on how to monitor your jobs.
Overview

Jobs are submitted with qsub:

    qsub submit_script.sh

After a job is submitted, the grid engine decides the best order in which to run it, given the status of the cluster and the runtime you specify. (Hint: the more accurately you estimate your program's run time, the more efficiently your job will be scheduled.) If you do not explicitly name a queue, the grid engine chooses one for you. There are currently three queues configured on courtney: singlq, amdq, and pedgeq. The singlq queue contains all the single-processor machines, each with 1GB of memory; the amdq queue contains the dual-processor AMD machines, each with 2GB of memory; and the pedgeq queue contains the other Dell PowerEdge server, identical to the master node, with 2 processors and 2GB of memory. You do not have to specify a queue explicitly for your job to run. The time limit for a job in any queue is 30 days; if you need to run a job longer than 30 days, you will need to rethink the structure of your code. Download this submit script to see the syntax of a submit script. If you wish to run an MPI program, download this MPI submit script.
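The general shape of a submit script is sketched below. The job name, runtime, and command are illustrative examples only, not site requirements; the downloadable scripts mentioned above show the exact syntax used on this cluster.

```shell
#!/bin/bash
# Sketch of a minimal serial submit script. Lines beginning with #$ are
# Grid Engine directives; everything else runs on the execution host.
#$ -S /bin/bash        # interpret the job script with bash
#$ -N example_job      # job name (shown by qstat); hypothetical
#$ -l h_rt=0:30:00     # estimated hard runtime limit (30 minutes)
#$ -cwd                # start the job in the directory it was submitted from

msg="Running on $(hostname)"
echo "$msg"
```

Submit it with qsub as shown above; the runtime you request through h_rt is what the scheduler uses when deciding where and when to run the job.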
A word on shells

If you use CSH, you will see the following in your job's output:

    Warning: no access to tty; thus no job control in this shell.

There is a rogue stty command in a login file somewhere causing this error, and until it is found this annoyance will remain. It is nothing major: in most cases it merely results in an unwanted string showing up in every output file of your jobs. Look at the serial script above to see the syntax for using CSH. BASH, on the other hand, will not give you this error. Using BASH, however, results in a shell without a $PATH defined, so any command called from a BASH script needs the entire path to its executable (e.g. /usr/local/mpich2-1.0.3/bin/mpiexec). The scripts above circumvent this problem by manually sourcing the global definitions file /etc/bashrc. This works just as well, and you don't have to worry about any annoying errors popping up in your output file.

Interactive logins

If you need to run FORTRAN programs that require interactive data entry, you can put the data you want entered into a separate file and redirect it into your executable. For instance, if you need to read the value nt (READ*,nt), place your value for nt into a data file (input.dat); your submit script might then look something like:

    #!/bin/csh
    #$ -l h_rt=0:31:00
    cd ~/run-dir/
    ./exec < input.dat

FORTRAN will then read everything from input.dat in the order of your READ*, statements. If this is still not a viable solution, you may request an interactive login via qssh. This schedules a job just like any other, except that once it is running you are automatically logged in to another node on the cluster determined to have free resources. Since courtney is configured to use ssh instead of rsh, accounting is incomplete: only the wallclock time of a job is collected, not its online usage.
Thus, unfortunately, an interactive login job will be dropped from the master scheduler after 1 minute, and any other jobs submitted could be assigned to the node you are logged into. There is also a potential loss of control: in some unusual cases, killing the job can leave idle or busy processes behind even though Grid Engine considers the job finished. This is why interactive logins are not encouraged, but they are available if absolutely necessary. qssh takes the same options as qlogin; see man qssh for more details.
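The BASH workaround described above (sourcing /etc/bashrc to obtain a $PATH) can be sketched as follows. The runtime limit is taken from the example above; the guard on the source line is an added safety measure, not part of the site's sample scripts.

```shell
#!/bin/bash
#$ -l h_rt=0:31:00     # runtime limit, as in the CSH example above

# A BASH job shell on the cluster starts with no $PATH defined, so either
# call every executable by its full path
# (e.g. /usr/local/mpich2-1.0.3/bin/mpiexec), or source the global
# definitions first, as the sample scripts do:
[ -f /etc/bashrc ] && . /etc/bashrc

echo "PATH is now: $PATH"
```

After the source line, commands can be called by name rather than by full path, and no warning strings appear in the job's output file.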
Monitoring your jobs
qstat - Show the status of Grid Engine jobs and queues. This shows which jobs are currently running and which jobs are waiting in the queue. To see the status of every node explicitly, use qstat -f; to see the details of all jobs, type qstat -r. Type man qstat to see the multitude of other options available for this command.
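As an example of putting qstat to work, the sketch below polls the queue until a given job disappears from qstat's listing. The job ID and polling interval are hypothetical; the script only assumes that job IDs appear in the first column of qstat's output, below its two header lines.

```shell
#!/bin/bash
# Wait until a job no longer appears in qstat output, then report.
# 12345 is a hypothetical job ID; pass a real one as the first argument.
job_id=${1:-12345}

# Skip qstat's two header lines and compare the job-ID column.
while qstat 2>/dev/null | awk 'NR > 2 {print $1}' | grep -qx "$job_id"; do
    sleep 60
done
status_msg="job $job_id is no longer in the queue"
echo "$status_msg"
```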
Job Deletion
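Jobs are removed with qdel, the standard Grid Engine deletion command. A couple of common invocations (the job ID below is hypothetical; use the ID reported by qstat):

```shell
qdel 12345        # delete the job with ID 12345
qdel -u $USER     # delete all of your own jobs
```

Type man qdel for the other options available.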
Graphical Interface