Softpanorama
May the source be with you, but remember the KISS principle ;-)

qsub -- Submitting Job To Queue Instance

Introduction

A cluster has a "batch scheduler", which is typically installed on the head node. If the batch scheduler is SGE, the submit command is qsub.

Once a job has been received by the batch server, the scheduler decides the placement and notifies the batch server which in turn notifies qsub whether the job can be run or not. The current status (whether the job was successfully scheduled or not) is then returned to the user. You may use a command file or STDIN as input for qsub.

A job in SGE represents a task to be performed on a node (or multiple nodes) in the cluster and contains the command line used to start the task. A job may have specific resource requirements but in general should be agnostic to which node in the cluster it runs on as long as its resource requirements are met.

Note:

All jobs require at least one available slot on a node in the cluster to run. SGE does not deal with fractional slots.

Here is a simple example of the qsub command that launches a job running the hostname command on a given cluster node. You can't submit jobs unless your UID is more than 100, which excludes submitting jobs as root. Note that the example below uses the sgeadmin account.

sgeadmin@master:~$ qsub -o /Apps/myputput.txt -b y -cwd -q all.q@b1 hostname
Your job 1 ("hostname") has been submitted

Notice that the qsub command, when successful, will print the job number to stdout. You can use the job number to monitor the job's status and progress within the queue as we'll see in the next section.
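Because the job number is embedded in a human-readable message, scripts that submit jobs usually need to pull it out. The following is a minimal sketch, assuming the message format shown above; extract_job_id is a hypothetical helper, not part of SGE. Many Grid Engine versions also accept a -terse option to qsub that prints only the job ID, if your installation supports it.

```shell
#!/bin/sh
# Hypothetical helper: extract the numeric job ID from qsub's confirmation
# message. Assumes the format: Your job <id> ("<name>") has been submitted
extract_job_id() {
    # Print the digits that follow "Your job " / "your job "; print nothing
    # if the message does not match that format.
    printf '%s\n' "$1" | sed -n 's/^[Yy]our job \([0-9][0-9]*\).*/\1/p'
}

# Sample message copied from the example above (no qsub call is made here)
msg='Your job 1 ("hostname") has been submitted'
job_id=$(extract_job_id "$msg")
echo "submitted job id: $job_id"
```

In a real submit script the message would come from something like msg=$(qsub job.sh), and the extracted ID could then be passed to qstat or qdel.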

Default job options and  $HOME/.sge_request file

If you always include the same options with your job (e.g. email notifications), you can have them applied automatically. Typically at least two options are specified this way.

To accomplish this, create the file '$HOME/.sge_request' and put in it the options you would otherwise give on the command line, one option per line. For example:

# mail me when the job starts & ends
 -M dummy@example.com
 -m be

# pass through some environment variables
 -v PYTHONPATH

# pass through ALL environment variables
# -V
# use multiple cores
# NOTE: Divide h_vmem by the number of cores you request (e.g. 2)
 -pe smp 2
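The note about dividing h_vmem by the number of cores is simple arithmetic. The sketch below assumes that memory limits are accounted per slot on your cluster (this is site-specific) and that you know the total memory the job needs; per_slot_mem_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: split a job's total memory requirement across slots.
# $1 = total memory for the whole job in MB, $2 = number of slots requested
per_slot_mem_mb() {
    echo $(( $1 / $2 ))
}

# A job needing 8192 MB in total, run with "-pe smp 2"
per_slot_mem_mb 8192 2   # prints 4096
```

With these numbers the request would be written as h_vmem=4096M alongside -pe smp 2.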

Output specification

Three options control the output streams: -o (path for stdout), -e (path for stderr), and -j y (merge stderr into stdout).

Sun Grid Engine creates stdout and stderr files in the user's home directory, unless the -e and -o options (or -cwd) are specified. If any additional files are created during a job's execution, they will be put in the job's working directory unless explicitly saved elsewhere.

The job's stdout and stderr files are named after the job, with an extension ending in the job's number.

For the simple job submitted above, the output files look like this:

sgeadmin@master:~$ ls hostname.*
hostname.e1 hostname.o1
sgeadmin@master:~$ cat hostname.o1
b1
sgeadmin@master:~$ cat hostname.e1
sgeadmin@master:~$

Notice that Sun Grid Engine automatically named the job hostname and created two output files: hostname.e1 and hostname.o1. The e stands for stderr and the o for stdout. The 1 at the end of the files' extension is the job number. So if the job had been named my_new_job and was job #23 submitted, the output files would look like:

my_new_job.e23 my_new_job.o23
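The naming rule can be written down directly. This is a small sketch of the default pattern described above (job name, then .o or .e, then the job number); the two function names are hypothetical, and the rule does not apply when -o, -e, or -j change the output destination.

```shell
#!/bin/sh
# Hypothetical helpers: build the default output file names for a job.
# $1 = job name, $2 = job number
sge_stdout_file() { printf '%s.o%s\n' "$1" "$2"; }
sge_stderr_file() { printf '%s.e%s\n' "$1" "$2"; }

sge_stdout_file my_new_job 23   # prints my_new_job.o23
sge_stderr_file my_new_job 23   # prints my_new_job.e23
```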

Host Specification

If a queue is not specified, the job is submitted to the default queue (typically all.q). A job can be submitted to a particular queue without any host specification, to a selected host (queue instance), or to a selected host group (queue domain). Note that the order is the opposite of a DNS name: here the queue (domain) goes before the host.

  qsub -q queue_name  job
  qsub -q queue_name@host_name  job
  qsub -q queue_name@@hostgroup_name  job

One can use wildcard patterns (single-quoted, so that the shell does not expand them) to specify the queues and hosts. For example, to request any host in the given group regardless of the queue name:

  qsub -q '*@@hostgroup_name'  job
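The three forms of the -q argument differ only in how many @ signs they contain. The sketch below tells them apart; classify_queue_spec is a hypothetical helper written for illustration and does not query SGE.

```shell
#!/bin/sh
# Hypothetical helper: classify a -q argument by its shape.
classify_queue_spec() {
    case "$1" in
        *@@*) echo "queue domain" ;;    # queue_name@@hostgroup_name
        *@*)  echo "queue instance" ;;  # queue_name@host_name
        *)    echo "queue" ;;           # queue_name only
    esac
}

classify_queue_spec 'all.q'                # prints queue
classify_queue_spec 'all.q@b1'             # prints queue instance
classify_queue_spec '*@@hostgroup_name'    # prints queue domain
```

The @@ pattern is tested first because a queue domain specification also matches the single-@ pattern.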


Specifying resource with -l option

The -l option accepts resource=value pairs separated by commas.

It launches the job in a Sun Grid Engine queue that meets the given resource request list.

This option is available for qsub, qsh, qrsh, qlogin and qalter only. There may be multiple -l switches in a single command, and they may be requested as soft or hard within the same command line. In the case of a serial job, multiple -l switches refine the definition of the sought queue.

Among other things, you can specify the hostname using this option. For example:

qsub -l hostname='p6.hpc.firma.com' ...
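Since -l takes a comma-separated list, a wrapper script that assembles requests from several sources has to join them. A minimal sketch follows; join_resources is a hypothetical helper, and the resource names h_rt and mem_free are simply the ones that appear elsewhere on this page.

```shell
#!/bin/sh
# Hypothetical helper: join resource=value pairs into one -l argument.
join_resources() {
    out=""
    for pair in "$@"; do
        # Prepend a comma only when something is already in the list
        out="${out:+$out,}$pair"
    done
    printf '%s\n' "$out"
}

join_resources h_rt=1:00:00 mem_free=500M   # prints h_rt=1:00:00,mem_free=500M
```

The result could then be used as qsub -l "$(join_resources h_rt=1:00:00 mem_free=500M)" job.sh. Passing the pairs as separate -l switches is equivalent, as noted above.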

complex(5) describes how the list of available resources and their associated valid value specifiers can be obtained.

DESCRIPTION

Complex reflects the format of the Sun Grid Engine complex configuration. The definition of complex attributes provides all pertinent information concerning the resource attributes a user may request for a Sun Grid Engine job via the qsub(1) -l option and for the interpretation of these parameters within the Sun Grid Engine system.

The Sun Grid Engine complex object defines all entries which are used for configuring the global, the host, and queue object. The system has a set of predefined entries, which are assigned to a host or queue by default. In addition, the user can define new entries and assign them to one or multiple objects. Each load value has to have its corresponding complex entry object, which defines the type and the relational operator for it.

defining resource attributes

The complex configuration should not be accessed directly. In order to add or modify complex entries, the qconf(1) options -Mc and -mc should be used instead. While the -Mc option takes a complex configuration file as an argument and overrides the current configuration, the -mc option brings up an editor filled in with the current complex configuration.

The provided list contains all definitions of resource attributes in the system. Adding a new entry means providing: name, shortcut, type, relop, requestable, consumable, default, and urgency. The fields are described below. Changing an entry is done by updating the relevant field, and removing one by deleting its definition. An attribute can only be removed when it is no longer referenced in a host or queue object. The system also has a set of default resource attributes which are always attached to a host or queue. They cannot be deleted, nor can the type of such an attribute be changed.

working with resource attributes

Before a user can request a resource attribute it has to be attached to the global, host, or cqueue object. The resource attribute exists only for the objects it is attached to: if it is attached to the global object (qconf -me global), it exists system wide; if attached to a host object, only on that host (qconf -me NAME); if attached to a cqueue object, only on that cqueue (qconf -mq NAME).

When the user attaches a resource attribute to an object, a value also has to be assigned to it: the resource limit. Another way to get a resource attribute value is to configure a load sensor for that attribute.

Default queue resource attributes

In its default form the complex contains a selection of parameters from the queue configuration as defined in queue_conf(5). The queue configuration parameters that are in principle requestable for a job by the user are:

          qname
          hostname
          notify
          calendar
          min_cpu_interval
          tmpdir
          seq_no
          s_rt
          h_rt
          s_cpu
          h_cpu
          s_data
          h_data
          s_stack
          h_stack
          s_core
          h_core
          s_rss
          h_rss

Default host resource attributes

The standard set of host related attributes consists of two categories. The first category is built by several queue configuration attributes which are particularly suitable to be managed on a host basis. These attributes are:
          slots
          s_vmem
          h_vmem
          s_fsize
          h_fsize
(please refer to queue_conf(5) for details).

Note: Defining these attributes in the host complex is no contradiction to having them also in the queue configuration. It allows maintaining the corresponding resources on a host level and at the same time on a queue level. Total virtual free memory (h_vmem) can be managed for a host, for example, and a subset of the total amount can be associated with a queue on that host.

The second attribute category in the standard host complex are the default load values. Every sge_execd(8) periodically reports load to sge_qmaster(8). The reported load values are either the standard Sun Grid Engine load values such as the CPU load average (see uptime(1)) or load values defined by the Sun Grid Engine administration (see the load_sensor parameter in the cluster configuration sge_conf(5) and the Sun Grid Engine Installation and Administration for details). The characteristics definition for the standard load values is part of the default host complex, while administrator defined load values require extension of the host complex. Please refer to the file /doc/load_parameters.asc for detailed information on the standard set of load values.

Overriding attributes

One attribute can be assigned to the global object, host object, and queue object at the same time. On the host level it might get its value from the user defined resource limit and a load sensor. In case the attribute is a consumable, we have in addition to the resource limit and its load report on host level also the internal usage, which the system keeps track of. The merge is done as follows:

In general an attribute can be overridden on a lower level

We have one limitation for overriding attributes based on its relational operator:

In the case of a consumable on host level, which has also a load sensor, the system checks for the current usage, and if the internal accounting is more restrictive than the load sensor report, the internal value is kept; if the load sensor report is more restrictive, that one is kept.

Note: Sun Grid Engine allows backslashes (\) to be used to escape newline (\newline) characters. The backslash and the newline are replaced with a space (" ") character before any interpretation.

SGE pseudo comments

Scripts that are submitted via SGE can contain pseudo comments that are processed by SGE and set options, much like options given to the SGE command itself. Here is an example of such comments, which start with "#$", in a simple job script:

$ cat sleep.sh
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
date
sleep 10
date
You can put several lines which start with #$ in a script. SGE treats them as pseudo comments: everything specified in them is interpreted as SGE options.

To submit such a wrapper job script, you can use the qsub command without any additional parameters, which is very convenient:

$ qsub sleep.sh
your job 16 ("sleep.sh") has been submitted

For a parallel MPI job script, take a look at this script, linpack.sh. Note that you need to put in two SGE variables, $NSLOTS and $TMP/machines within the job script.

$ cat linpack.sh
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
MPI_DIR=/opt/mpich/gnu/
HPL_DIR=/opt/hpl/mpich-hpl/
# OpenMPI part. Uncomment the following code and comment the above code
# to use OpenMPI rather than MPICH
# MPI_DIR=/opt/openmpi/
# HPL_DIR=/opt/hpl/openmpi-hpl/

$MPI_DIR/bin/mpirun -np $NSLOTS -machinefile $TMP/machines \
        $HPL_DIR/bin/xhpl

The command to submit an MPI parallel job script is similar to submitting a serial job script, but you need to add the -pe mpich N option, where N is the number of processes that you want to allocate to the MPI program. Here's an example of submitting a 2-process linpack program using an HPL.dat file:

$ qsub -pe mpich 2 linpack.sh
your job 17 ("linpack.sh") has been submitted

If you need to delete an already submitted job, you can use qdel with its job ID. Here's an example of deleting a fluent job under SGE:

$ qsub fluent.sh
your job 31 ("fluent.sh") has been submitted
$ qstat
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
---------------------------------------------------------------------------------------------
     31     0 fluent.sh  sysadm1      t     12/24/2003 01:10:28 comp-pvfs- MASTER
$ qdel 31
sysadm1 has registered the job 31 for deletion
$ qstat
$

Although the example job scripts are bash scripts, SGE can also accept other types of shell scripts. It is trivial to wrap serial programs into an SGE job script. Similarly, for MPI parallel jobs, you just need to use the correct mpirun launcher and add the two SGE variables, $NSLOTS and $TMP/machines, within the job script. For parallel jobs other than MPI, a Parallel Environment (PE) needs to be defined. This is covered in the SGE documentation.

"Unknown option" error message

You can specify qsub command line options within the script on lines beginning with #$. For example:
#$ -S /bin/bash

If #$ is not followed by a valid qsub option, you get the unhelpful message:

qsub: Unknown option

If you get this message, search the script you are submitting for #$ and make sure it is followed by a valid qsub command line option.

It is easy to inadvertently introduce #$ when you comment out a line that begins with $:

#$SOME_COMMAND

There is no qsub command line option SOME_COMMAND, so this is an error.

If you really must keep lines beginning with #$, you can specify a different prefix string using the qsub -C command line option.
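The search for stray #$ lines can be automated. The sketch below is a heuristic, not SGE's own parser: it assumes that a valid directive is "#$" followed by whitespace and an option starting with a dash, and reports every other line that begins with #$. The function name find_suspect_directives is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list lines that start with "#$" but do not look like
# "#$ -option". Output format is "<line number>:<line>".
find_suspect_directives() {
    grep -n '^#\$' "$1" | grep -v -E '^[0-9]+:#\$[[:space:]]+-'
}

# Build a small sample script containing one accidental "#$" line
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#$ -cwd
#$SOME_COMMAND
date
EOF

find_suspect_directives "$script"   # prints 3:#$SOME_COMMAND
rm -f "$script"
```

The function prints nothing and returns a non-zero status when no suspicious lines are found, so it can be used in a pre-submit check.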

 

SGE job environment

Inherited Job Environment

Dot files

Job submission default settings files hierarchy:

$SGE_ROOT/$SGE_CELL/common/sge_request
$HOME/.sge_request
$PWD/.sge_request
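To see which of these files are actually present for a given user and directory, a short check like the one below can help. It is a sketch only: the fallbacks /opt/sge for SGE_ROOT and default for SGE_CELL are assumptions used when the variables are unset, and list_request_files is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: report which default request files exist, in the
# order listed above. /opt/sge and "default" are assumed fallbacks only.
list_request_files() {
    for f in "${SGE_ROOT:-/opt/sge}/${SGE_CELL:-default}/common/sge_request" \
             "$HOME/.sge_request" \
             "$PWD/.sge_request"; do
        if [ -f "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
        fi
    done
}

list_request_files
```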

Checking if submitted Job started executing

The two commands that will be most useful to you are qhost and qstat.

Host/Node Status: qhost

Node or host status can be obtained by using the qhost command. An example listing is shown below.

HOSTNAME             ARCH       NPROC  LOAD   MEMTOT   MEMUSE   SWAPTO   SWAPUS
-------------------------------------------------------------------------------
global               -              -     -        -        -        -        -
node000              lx24-amd64     2  0.00     3.8G    35.8M      0.0      0.0
node001              lx24-amd64     2  0.00     3.8G    35.2M      0.0      0.0
node002              lx24-amd64     2  0.00     3.8G    35.7M      0.0      0.0
node003              lx24-amd64     2  0.00     3.8G    35.6M      0.0      0.0
node004              lx24-amd64     2  0.00     3.8G    35.7M      0.0      0.0

Once a computational job has started on a node, the load reported for that node typically jumps from zero to about the number of CPUs in use.

See qhost for more details.

Queue Status: qstat

Queue status for your jobs can be found by issuing a qstat command. An example qstat issued by user deadline is shown below.

job-ID  prior   name       user   state submit/start at   queue  slots ja-task-ID
---------------------------------------------------------------------------------
 304 0.60500 Sleeper4   deadline    r     01/18/2008 17:42:36 cluster@norbert  4 
 307 0.60500 Sleeper4   deadline    r     01/18/2008 17:42:37 cluster@norbert  4 
 310 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:29                  4 
 313 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:29                  4 
 316 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:29                  4 
 321 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:30                  4 
 325 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:30                  4 
 308 0.53833 Sleeper2   deadline    qw    01/18/2008 17:42:29                  2 
More detail can be found by using the -f option. An example qstat -f issued by user deadline is shown below.
[deadline@norbert sge-tests]$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
cluster@node0                  BIP   2/2       0.00     lx26-amd64
    310 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:51     1
    313 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:52     1
----------------------------------------------------------------------------
cluster@node1                  BIP   2/2       0.00     lx26-amd64
    310 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:51     1
    313 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:52     1
----------------------------------------------------------------------------
cluster@node2                  BIP   2/2       0.00     lx26-amd64
    310 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:51     1
    313 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:52     1
----------------------------------------------------------------------------
cluster@norbert                BIP   2/2       0.02     lx26-amd64
    310 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:51     1
    313 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:52     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    316 0.60500 Sleeper4   deadline     qw    01/18/2008 17:42:29     4
    321 0.60500 Sleeper4   deadline     qw    01/18/2008 17:42:30     4
    325 0.60500 Sleeper4   deadline     qw    01/18/2008 17:42:30     4
    308 0.53833 Sleeper2   deadline     qw    01/18/2008 17:42:29     2
To look at jobs for all users, you must issue the following:
qstat -u "*"
For queue details, you may add the -f option as shown above. If you prefer to always see all user jobs, you can use the alias command to make this the default behavior. For bash users add the following to your .bashrc file.
alias qstat='qstat -u "*"'

Even more information can be obtained by using the -F option (see the qstat man page for details). For parallel jobs, the output is not very easy to understand. In the above listing, the state is one of qw (queue waiting), t (transferring), or r (running).
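A quick way to summarize a plain qstat listing is to count jobs per state. The sketch below assumes the column layout shown in the first qstat example above (two header lines, state in the fifth column) and works on a saved sample rather than calling qstat; count_states is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count jobs per state in plain qstat output read from
# stdin. Assumes two header lines and the state in column 5.
count_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample rows taken from the listing above
count_states <<'EOF'
job-ID  prior   name       user   state submit/start at   queue  slots ja-task-ID
---------------------------------------------------------------------------------
 304 0.60500 Sleeper4   deadline    r     01/18/2008 17:42:36 cluster@norbert  4
 310 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:29                  4
 313 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:29                  4
EOF
```

On a live system the same function could be fed with qstat | count_states.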

A more convenient queue status package called userstat combines qstat, qhost, and qdel into a simple, easy-to-use "top"-like interface. Additional information on these commands is available by using man command-name.

Examples

Example 1

Let's assume we have a script sge-date  that looks like:

#!/bin/bash 
/bin/date 
We could run it using the command:
qsub sge-date
SGE will then run the program, and place two files in your current directory:
sge-date.e# 
sge-date.o#
where # is the job number assigned by SGE.

After the job has finished, the sge-date.e# file contains the output from standard error and the sge-date.o# file contains the output from standard out. Note: if the home directory of the user submitting the job is not on NFS, the output will be written on the node on which the job ran and will not be visible on the head node. See Viewing SGE Job Output.

 

The following basic options may be used to submit the job.

-A [account name] -- Specify the account under which to run the job 
-N [name] -- The name of the job 
-l h_rt=hr:min:sec -- Maximum walltime for this job 
-r [y,n] -- Should this job be re-runnable (default y) 
-pe [type] [num] -- Request [num] slots in the [type] parallel environment. 
-cwd -- Place the output files (.e,.o) in the current working directory. 
     The default is to place them in the users home directory. 
-S [shell path] -- Specify the shell to use when running the job script
Although it is possible to use command line options and script wrappers to submit jobs, it is usually more convenient to use just a single script to include all options for the job. The next section describes how this is done.

Example 2

The most convenient method to submit a job to SGE is to use a "job script" which contains SGE options as pseudo comments.

SGE pseudo comments allow all options and the program to be placed in a single batch file.

The following script will report the node on which it is running, sleep for 60 seconds, then exit. It also reports the start/end date and time, and sends an email to the user when the job starts and when the job finishes. Other SGE options are set as well.

#!/bin/bash
#
# Usage: sleeper.sh [time]
#        default for time is 60 seconds

# -- our name ---
#$ -N Sleeper1
#$ -S /bin/sh
# Make sure that the .e and .o file arrive in the
# working directory
#$ -cwd
#Merge the standard out and standard error to one file
#$ -j y
/bin/echo Here I am: `hostname`. Sleeping now at: `date`
/bin/echo Running on host: `hostname`.
/bin/echo In directory: `pwd`
/bin/echo Starting on: `date`
# Send mail at the beginning and end of the job
#$ -m be
#$ -M deadline@kronos
time=60
if [ $# -ge 1 ]; then
   time=$1
fi
sleep $time

echo Now it is: `date` 
The "#$" is used in the script to indicate an SGE option. If we name the script sleeper1.sh we can submit it to SGE as follows:
qsub sleeper1.sh
The output will be in the file Sleeper1.o#, where # is the job number assigned by SGE. When submitting MPI or PVM jobs, we will need additional information in the job script. See below.

Inheriting Your Environment

If you want to make sure your current environment variables are used in your SGE jobs, include the following in your submit script:

#$ -V

Note that -V may have stopped working as expected after the fix for the Shellshock Bash bug.

Parallel Submit Scripts

Submitting parallel jobs is very similar to submitting single node jobs (as shown above). A parallel job needs a parallel environment (pe) assigned to the script. The following is an annotated script for submitting an MPICH job to SGE.
#!/bin/sh
#
# EXAMPLE MPICH SCRIPT FOR SGE
# Modified by Basement Supercomputing 1/2/2006 DJE
# To use, change "MPICH_JOB", "NUMBER_OF_CPUS"
# and "MPICH_PROGRAM_NAME" to real values.
#
# Your job name
#$ -N MPICH_JOB
#
# Use current working directory
#$ -cwd
#
# Join stdout and stderr
#$ -j y
#
# pe request for MPICH. Set your number of processors here.
# Make sure you use the "mpich" parallel environment.
#$ -pe mpich NUMBER_OF_CPUS
#
# Run job through bash shell
#$ -S /bin/bash
#
# The following is for reporting only. It is not really needed
# to run the job. It will show up in your output file.
echo "Got $NSLOTS processors."
echo "Machines:"
cat $TMPDIR/machines
# Adjust MPICH procgroup to ensure smooth shutdown
export MPICH_PROCESS_GROUP=no
#
# Use full pathname to make sure we are using the right mpirun
/opt/mpi/tcp/mpich-gnu3/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines MPICH_PROGRAM_NAME

The important option is the -pe line in the submit script. It must name the parallel environment that matches the MPI implementation for which you compiled your program. Example submit scripts are available in the examples directory.

To use SGE with MPI, simply copy the appropriate script to your working directory, edit it to fill in the appropriate variables, rename it to reflect your program, and use qsub to submit it to SGE.


Old News ;-)

[May 07, 2017] Monitoring and Controlling Jobs

biowiki.org

After submitting your job to Grid Engine you may track its status by using either the qstat command, the GUI interface QMON, or by email.

Monitoring with qstat

The qstat command provides the status of all jobs and queues in the cluster. The most useful options are:

You can refer to the man pages for a complete description of all the options of the qstat command.

Monitoring Jobs by Electronic Mail

Another way to monitor your jobs is to make Grid Engine notify you by email about the status of the job.

In your batch script or from the command line, use the -m option to request that an email be sent and the -M option to specify the email address where it should be sent. This will look like:

#$ -M myaddress@work
#$ -m beas

Where the (-m) option can select after which events you want to receive your email. In particular you can select to be notified at the beginning/end of the job, or when the job is aborted/suspended (see the sample script lines above).

And from the command line you can use the same options (for example):

qsub -M myaddress@work -m be job.sh
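The letters given to -m are easy to mix up. The sketch below spells out the four events named in the text above (b, e, a, s); describe_mail_events is a hypothetical helper for illustration, and any other letter is reported as unknown rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: explain each letter of a -m argument such as "beas".
describe_mail_events() {
    # fold -w 1 puts every character on its own line
    printf '%s\n' "$1" | fold -w 1 | while read -r c; do
        case "$c" in
            b) echo "beginning of job" ;;
            e) echo "end of job" ;;
            a) echo "job aborted" ;;
            s) echo "job suspended" ;;
            *) echo "unknown flag: $c" ;;
        esac
    done
}

describe_mail_events beas
```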

How do I control my jobs

Based on the status of the job displayed, you can control the job by the following actions:

Monitoring and controlling with QMON

You can also use the GUI QMON, which gives a convenient window dialog specifically designed for monitoring and controlling jobs, and the buttons are self explanatory.


For further information, see the SGE User's Guide (PDF, HTML).


[Mar 15, 2017] SGE Queueing System - Scalable Computing Support Center - DukeWiki

Adding SGE Options

If you always need certain SGE options to be specified for a given job, you can embed those options into the SGE job script using lines that start with "#$":

#!/bin/tcsh
#
#$ -S /bin/tcsh -cwd
#$ -o simple.out -j y
#$ -l mem_free=500M
cd /home/username/seq/simple
myprog

The '-cwd' option tells SGE to run the job in the same directory that the qsub command was issued - i.e. SGE will 'cd' into the current working directory before it runs the job script. Again, the '-o' option is used to direct screen output to a file. The '-S' option is another way to indicate the shell type (tcsh, bash, or sh) to SGE. The '-j y' is a way to tell SGE to "join" the standard-error output with the standard-output (screen output) - in this case all of it will go to the file 'simple.out'. The '-l mem_free=500M' tells SGE to only run the job on a node with at least 500 megabytes of free RAM available. The '500M' here should be changed to match the actual amount of RAM you expect your job to use, e.g. '-l mem_free=750K' (750 kilobytes of RAM) or '-l mem_free=4G' (4 gigabytes of RAM). Help with estimating your program's memory use can be found here: Monitoring Memory Usage.

In any shell script, lines starting with a "#" are ignored as comments by the shell. Only SGE interprets the lines that start with "#$"; other systems will consider them to be comments. This makes it possible for your SGE script to still run on other Linux machines, for example.

** NOTE ** Only the first contiguous block of comments is searched by SGE for "#$" lines. SGE stops processing the "#$" lines when it sees the first command or blank line. It is usually easiest to just follow the above example and put all SGE information at the very top of the file.

** NOTE ** SGE is very particular about the formatting of the queue-submission scripts. In particular, you should make sure that you have a blank line at the end of your script, and that the script is saved in the standard Unix text format (and NOT in the standard Windows-text format). Generally, this is only an issue if you copy job scripts from your Windows laptop to the cluster. If you do so, you may want to run the 'dos2unix' program on your script before submitting. If you are a 'vi' user and you see a '[ dos ]' tag on the status line at the bottom of the screen, you can do ':set fileformat=unix' and save the file; 'vi' will also show '[ noeol ]' on the status line if you don't have an end-of-line marker at the end of the file.

** NOTE ** The more memory your job requires, the more important it is to include a '-l mem_free' request in your script. See Monitoring Memory Usage for more info on determining how much the memory request should be.

Common SGE Options
-cwd -- use the current directory (where the job was submitted from) to store all output files, including the -o specified file
-M user@hostname -m b,e -- sends email to the specified account at the beginning (-m b) and end (-m e) of the job. This way, you know when the job has finally started if it got stuck in the queue. You can also use this to send yourself a text message.
-o file -e file -- directs the standard output (-o) and standard error-output (-e) to the specified files. Note that this is output or error-output that is not otherwise directed to files; i.e. csh redirection (myprog > file.out) takes precedence
-j y -- "joins" the error-output with the standard output, thereby sending both to the same file (given by -o)
-S /bin/tcsh -- what shell to use, tcsh or sh or whatever you prefer
-N name -- use this name when displaying in the qstat output; defaults to the name of the script file
-hold_jid job_id_or_job_name -- if you have one job that must wait for another to complete (perhaps the first one creates an output file which is needed by the second program), then you can request that the job be held until that first job completes, see [SGE Job Dependencies]
-pe high 10-20 -- requests a high-priority "parallel environment" that spans several machines, in this case any number of CPUs between 10 and 20 (inclusive); a single number will request exactly that many CPUs. Note that if you request more CPUs than you actually have high-priority access to, your job will hang. See Submitting OpenMP Jobs or Submitting MPI Jobs
-q *@machineName-n* -- request a specific machine, or machine group
-l slots=2 -- requests that the job be given 2 slots (or 2 cpus) instead of 1; you MUST use this if your program is multi-threaded, you should NOT use it otherwise
-l mem_free=1.5G -- requests that only machines with 1.5GB (=1536MB) or more be used for this job; i.e. the job requires a lot of memory and thus is not suitable for all hosts. Note that 1G is equal to 1024M (How do I determine how much memory my program needs? See the FAQ)
-l h_cpu=Xh -- requests that X hours be allocated for this job to run
-l scr_free=XG -- requests that only machines with X GB or more free disk space in the /scratch partition be used for this job; i.e. the job requires a lot of temporary file space and thus is not suitable for all hosts. Note that 1G is equal to 1024M
-l highprio -- requests that the job be placed in a high-priority queue or parallel environment
-soft -- putting -soft before a -l requirement indicates that it is a "soft" request; SGE will make a best-effort attempt to find a machine with the requested attribute, but the job may be queued on machines that do NOT have the attribute. Note that -soft applies to ALL -l options that come after it
-t start-stop:step -- submit an SGE "array" job, that is, run the same job multiple times, but set SGE_TASK_ID to the value start, then start+step, etc., up through stop (step may be omitted); it is up to your script to do something different for each task-ID; see [SGE Array Jobs]
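The task IDs produced by -t start-stop:step follow a plain arithmetic sequence, which can be previewed before submitting. A minimal sketch, assuming the seq utility is available; task_ids is a hypothetical helper, and the input file naming in the last lines is only an illustration of how a job script might use SGE_TASK_ID.

```shell
#!/bin/sh
# Hypothetical helper: list the SGE_TASK_ID values for "-t start-stop:step".
# $1 = start, $2 = stop, $3 = step (defaults to 1 when omitted)
task_ids() {
    seq "$1" "${3:-1}" "$2"
}

task_ids 1 10 3   # prints 1, 4, 7, 10 on separate lines

# Inside a job script, each task could pick its own input file.
# The fallback value 1 is only for running this sketch outside SGE.
input="input.${SGE_TASK_ID:-1}.dat"
echo "this task would read $input"
```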

Common Parallel Environment Options

-pe low-all 8 -- low priority; use any machines (Note: No longer working)
-pe low-core 8 -- low priority; only use core machines (Note: No longer working)
-pe threaded 8 -- low priority; use multiple slots on one machine
-l highprio -pe high 8 -- high priority; use any high-priority slots

** NOTE ** Jobs with no memory requests will be placed on ANY machine that is not heavily-loaded. If you need a significant amount of memory in order for your program to run, then you must explicitly request it with "-l mem_free=750M". (How do I determine this? FAQ)

** NOTE ** If you run an MPI job in low-priority mode and even one CPU gets slowed down due to a high-priority request, then all of your MPI tasks are likely to be slowed down as well. This will waste those computational resources that other people could have used.

** NOTE ** For shell script programmers, there are a number of "environment variables" that are automatically set by SGE, see [SGE Env Vars] (and see [SGE Array Jobs] for possible ways to use them).

** NOTE ** For users who "live" in multiple groups (perhaps you collaborate with multiple professors who each have machines in the DSCR), please see [SGE Multiple Groups] for info on how to properly allocate your runs to each group, if this is important to you.

See also [SGE Job Monitoring] or [FAQ]

[Nov 09, 2015] Basement Sun Grid Engine Quick Start

web.njit.edu

Submitting a job to the queue: qsub
Qsub is used to submit a job to SGE. The qsub command has the following syntax:

 qsub [ options ] [ scriptfile | -- [ script args ]] 
Binary files may not be submitted directly to SGE (unless the -b y option is given). For example, if we wanted to submit the "date" command to SGE we would need a script that looks like:
#!/bin/bash 
/bin/date 
If the script were called sge-date, then we could simply run the following:
$ qsub sge-date 
SGE will then run the program, and place two files in your current directory:
sge-date.e# 
sge-date.o# 
where # is the job number assigned by SGE. The sge-date.e# file contains the output from standard error and the sge-date.o# file contains the output from standard out. The following basic options may be used to submit the job.
-A [account name] -- Specify the account under which to run the job 
-N [name] -- The name of the job 
-l h_rt=hr:min:sec -- Maximum walltime for this job 
-r [y,n] -- Should this job be re-runnable (default y) 
-pe [type] [num] -- Request [num] slots in the [type] parallel environment. 
-cwd -- Place the output files (.e,.o) in the current working directory. 
     The default is to place them in the user's home directory. 
-S [shell path] -- Specify the shell to use when running the job script 
Although it is possible to use command line options and script wrappers to submit jobs, it is usually more convenient to use just a single script to include all options for the job. The next section describes how this is done.

Job Scripts

The most convenient method to submit a job to SGE is to use a "job script". The job script allows all options and the program file to be placed in a single file. The following script will report the node on which it is running, sleep for 60 seconds, then exit. It also reports the start/end date and time and sends an email to the user when the job starts and when the job finishes. Other SGE options are set as well. The example script can be found here as well.

#!/bin/sh
#
# Usage: sleeper.sh [time]
#        default for time is 60 seconds

# -- our name ---
#$ -N Sleeper1
#$ -S /bin/sh
# Make sure that the .e and .o file arrive in the
# working directory
#$ -cwd
#Merge the standard out and standard error to one file
#$ -j y
/bin/echo Here I am: `hostname`. Sleeping now at: `date`
/bin/echo Running on host: `hostname`.
/bin/echo In directory: `pwd`
/bin/echo Starting on: `date`
# Send mail at the beginning and at the end of the job
#$ -m be
#$ -M deadline@kronos
time=60
if [ $# -ge 1 ]; then
   time=$1
fi
sleep $time

echo Now it is: `date`

The "#$" is used in the script to indicate an SGE option. If we name the script sleeper1.sh and then submit it to SGE as follows:

qsub sleeper1.sh

 
The output will be in the file Sleeper1.o#, where # is the job number assigned by SGE. Here is an example output file for the sleeper1.sh script. When submitting MPI or PVM jobs, we will need additional information in the job script. See below.

[Sep 23, 2014] Reserving resources (RAM, disc, GPU) by MerlinWiki

SGE - MerlinWiki

We have found that for some tasks it is advantageous to tell SGE which resources a job requires. This makes sense when heavy use of RAM or network storage is expected. Limits can be soft or hard (parameters -soft, -hard); the limits themselves are given as:

 -l resource=value

For example, if a job needs at least 400MB of RAM: qsub -l ram_free=400M my_script.sh. Another frequently requested resource is space in /tmp: qsub -l tmp_free=10G my_script.sh. Or both:

qsub -l ram_free=400M,tmp_free=10G my_script.sh

Of course, it is possible (and preferable if the number does not change) to use the construction #$ -l ram_free=400M directly in the script. The current status of a given resource on all nodes can be obtained with qstat -F ram_free, or for several resources at once with qstat -F ram_free,tmp_free.

Details on other standard resources are in /usr/local/share/SGE/doc/load_parameters.asc. If you do not specify a value for a given resource, an implicit default is used (1GB for space on /tmp, 100MB for RAM).

WARNING: You need to distinguish between requesting resources that only have to be available at the time of submission (so-called non-consumable resources) and allocating a resource for the whole runtime of your computation. For example, suppose your program needs 400MB of memory but allocates only 100MB during the first 10 minutes of computation. If you use the standard resource mem_free and other jobs are submitted to the same node during those 10 minutes, SGE interprets the situation as follows: you asked for 400MB but are using only 100MB, so the remaining 300MB can be given to someone else (i.e., it will schedule another task that requests this memory).

For these purposes it is better to use consumable resources, which are accounted for independently of the current state of the task: ram_free for memory and tmp_free for disk. For example, the resource ram_free does not look at the actual free RAM; it computes RAM occupancy solely from the requests of individual jobs. It starts from the RAM size of the given machine and subtracts the amount requested by each job that is to run on that machine. If a job does not specify ram_free, the implicit value ram_free=100M is used.
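
The bookkeeping described here can be mimicked with a few lines of shell arithmetic. This is only a sketch of the idea, not of SGE's scheduler code; the node size and the requests are made-up numbers in MB, and the function name remaining_ram_free is hypothetical.

```shell
#!/bin/sh
# Consumable accounting: the value the scheduler works with is the node's
# configured capacity minus the sum of all job requests, no matter how much
# memory the jobs really use at the moment. All numbers are hypothetical MB.

DEFAULT_REQUEST_MB=100   # implicit request for jobs that specify nothing

# remaining_ram_free CAPACITY_MB [REQUEST_MB ...]
# An empty request ("") stands for a job that did not specify ram_free.
remaining_ram_free() {
    remaining=$1
    shift
    for request in "$@"; do
        [ -n "$request" ] || request=$DEFAULT_REQUEST_MB
        remaining=$((remaining - request))
    done
    echo "$remaining"
}

# A 4096 MB node running one 400M job and one job without a request:
remaining_ram_free 4096 400 ""
```

The call at the end prints 3596: 4096 minus the explicit 400 minus the implicit 100.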

For the disk space in /tmp (tmp_free) the situation is trickier: if a job does not clean up after itself when it finishes, the disk can actually have less free space than the resource indicates. Unfortunately, nothing can be done about this.

Known problems with SGE

#$ -q all.q@@blade,all.q@@PCNxxx,all.q@@servers

Main groups of computers are: @blade, @servers, @speech, @PCNxxx, @PCN2xxx - the full and actual list can be obtained by qconf -shgrpl

@stable - @blade, @servers - servers that run all the time w/o restarting
@PCOxxx, @PCNxxx - computer labs, there is a possibility that any node might be restarted at any time,
      a student or someone can shut the machine down by error or "by error". It is more or less sure that these
      machines will run smoothly over night and during weekends. There is also a group for each independent lab e.g. @PCN103.

Parallel jobs - OpenMP

For parallel tasks with threads, it is enough to use parallel environment smp and to set the number of threads:

#!/bin/sh 
#
#$ -N OpenMPjob
#$ -o $JOB_NAME.$JOB_ID.out
#$ -e $JOB_NAME.$JOB_ID.err
#
# PE_name    CPU_Numbers_requested
#$ -pe smp  4
#
cd SOME_DIR_WITH_YOUR_PROGRAM
export OMP_NUM_THREADS=$NSLOTS
 
./your_openmp_program [options]

Parallel jobs - OpenMPI

Listing follows:

#!/bin/bash
# ---------------------------
# our name 
#$ -N MPI_Job
#
# use reservation to stop starvation
#$ -R y
#
# pe request
#$ -pe openmpi 2-4
#
# ---------------------------
# 
#   $NSLOTS          
#       the number of tasks to be used

echo "Got $NSLOTS slots."

mpirun -n $NSLOTS /full/path/to/your/executable

Show only jobs that are not on hold in qstat

I'm running some jobs on an SGE cluster. Is there a way to make qstat show me only jobs that are not on hold?

qstat -s p shows pending jobs, which is all those with state "qw" and "hqw".

qstat -s h shows hold jobs, which is all those with state "hqw".

I want to be able to see all jobs with state "qw" only and NOT state "hqw". The man pages seem to suggest it isn't possible, but I want to be sure I didn't miss something. It would be REALLY useful and it's really frustrating me that I can't make it work.

Other cluster users have a few thousand jobs on hold ("hqw") and only a handful actually in the queue waiting to run ("qw"). I want to see quickly and easily the stuff that is not on hold so I can see where my jobs are in the queue. It's a pain to have to show everything and then scroll back up to find the relevant part of the output.

Laura

So I figured out a way to show what I want by piping the output of qstat into grep:

qstat -u "*" | grep " qw"


(Note that I need to search for " qw" not just "qw" or it will return the "hqw" states as well.)

But I'd still love to know if it's possible using qstat options only.
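
One possible refinement of the grep approach is to compare the state column exactly instead of matching a substring. The sketch below assumes the default qstat layout, in which two header lines are followed by one line per job and the state is the fifth whitespace-separated field; the function name pending_not_held is made up for this example.

```shell
#!/bin/sh
# pending_not_held: read qstat output on stdin and print only the jobs whose
# state is exactly "qw" (so "hqw", "Eqw" and so on are dropped).
# Assumes the default qstat column order: job-ID prior name user state ...
pending_not_held() {
    awk 'NR > 2 && $5 == "qw"'
}

# Typical use on a cluster (needs qstat, so it only runs where qstat exists):
if command -v qstat >/dev/null 2>&1; then
    qstat -u "*" | pending_not_held
fi
```

If the local qstat output has been customized so that the state is not in the fifth column, the field number in the awk expression has to be adjusted.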

[Sep 18, 2014] Documentation-How do I setup a qsub script - Systems

columbia.edu

Besides the guide below, here are some other links I found covering qsub:

https://www.nbcr.net/pub/wiki/index.php?title=Sample_SGE_Script

http://www.it.uu.se/datordrift/maskinpark/albireo/gridengine.html

http://www.rbvi.ucsf.edu/Resources/sge/user_guide.html

Guide to Using the Grid Engine

The main access to the nodes of the Beowulf Cluster is through the Sun Grid Engine batch system. Grid Engine distributes requested jobs over the nodes, depending on the current load of the nodes, the priority of the job and the number of jobs a user already has running on the cluster (within the same priority level, queued jobs of users who have fewer jobs running are preferred). Direct login onto the nodes and interactive execution of programs are strongly discouraged, because this bypasses Grid Engine's monitoring of the nodes and can cause incomplete execution of batch jobs. Users who require interactive jobs can use the command qsh, which starts an xterm session through Grid Engine.

Submitting a Job

Programs cannot be submitted directly to the grid engine. Instead they require a small shell script, which is a wrapper for the program to be run. Note that the script must be executable. Check with the ls -l command: if there is no x in the permissions shown in front of the shell script name, it is not executable, which can be changed with the command chmod +x <script name>. If the program requires interactive input (e.g. Genesis) the input has to be piped in, either by the echo command or from an external file. The minimal script genesis.sh to run Genesis would be:

  #!/bin/bash
  #$ -S /bin/sh
  echo "lcls.in" | ~/bin/genesis

Note that this is a specific case, which requires that the executable of genesis is located in the directory bin of your home directory. After a check that the script runs correctly (typing ./genesis.sh at the prompt should execute genesis without an error), the job is submitted with the qsub command:

qsub genesis.sh 

The command qsub has many options, which should be explicitly defined for each submitted job. There are three methods of doing so, in order of increasing priority (a higher priority overrides an option already defined at a lower priority):

   * The default options in the file .sge_request, located in your home directory. The format is just one line with white space between the options (e.g. '-cwd -A reiche -j y')
   * Options embedded in your shell script. Normally lines starting with a pound sign are ignored, except if it is immediately followed by a dollar sign. Everything after #$ is picked up by the grid engine as an option.
   * Command line arguments of the qsub command (e.g. qsub -cwd genesis.sh) 

In any case an option always starts with a minus sign and a keyword, followed, if necessary, by additional arguments. The following options are recommended, preferably set in the .sge_request file in the home directory:

-cwd Uses the directory, where the job has been submitted, as the working directory. Otherwise your home directory is used.

-C #$ Defines the letter sequence in the script which indicates additional option for submitting the job.

-A <login-name> Defines the user account of the job owner. If not defined it falls back to the user who submitted the job.

-j y Merges the normal output of the job and any error messages into one file, typically with the name <job-name>.o<job-id>.

-m aes Sun Grid Engine will notify the job owner by email if the job is either completed, suspended or aborted.

-M <email-address> The email address to which the notification is sent.

-p 0 The priority level of the submitted jobs. Jobs with a higher priority are dispatched to a node first by the grid engine.

-r y Forces grid engine to restart the job if the system crashes or is rebooted (note that this does not apply if the job itself crashes).

The following options should be defined separately for each job, because they depend on the specific job and are not generally applicable to all jobs.

-N <job-name> Defines a short name for the job to identify it besides the job ID. If omitted the job name is the name of the shell script.

-o <outputfile> Names the output file. If omitted the output filename is defined by <job-name>.o<job-id>.

-v <environment> Normally environment variables defined in your .bash_profile or a related file are not exported to the node where the job runs. With this option grid engine sets the environment variable prior to starting the job.

-notify If the code supports the signals SIGUSR1 and SIGUSR2, these signals will be sent to the program before it is terminated by the grid engine.

-pe <parallel environment> Needed for executing parallel jobs.

Use man qsub to see further options. All options can also be set/defined in an interactive way by using the job submission feature of qmon.

Monitoring a Job and the Queue

Once the job is submitted a job id is assigned and the job is placed in the queue. To see the status of the queue, the command qstat prints a list of all running and pending jobs with the most important information (job ID, job owner, job name, status, node). More information on a specific job can be obtained with qstat -j <job-id>. The status of the job is indicated by one or more characters:

r - running
t - transferring to a node
qw - waiting in the queue
d - marked for deletion
R - marked for restart

Normally the status d is rarely seen with qstat; if a job stays in the queue for a long time marked for deletion, it indicates that the grid engine is not running properly. Please inform the system administrator about it.

To remove a job from the queue, the command qdel only requires the job-id. A job can also be changed after it has been submitted with the qalter command. It works similarly to the qsub command but takes the job-id instead of the shell script name.

The command qhost gives the status of all nodes. If the load is close to unity it indicates that the machine is busy and most likely running a job (use the qstat command to check; if not, a user might have logged directly onto the node to run a job interactively).

Submitting an MPI-Job

To run a parallel job the script requires some additional information. First the option -pe has to be used to indicate the parallel environment. Right now only mpich is supported on the Beowulf cluster. The second mandatory argument of the -pe option is the number of requested nodes, which can also be given as a range of needed nodes. Sun Grid Engine tries to maximize this number. It is recommended to add this line to the shell script

    #$ -pe mpich N

where N is the number of desired nodes. Right now it is limited to 14, corresponding loosely to one job per node/CPU. If multiple instances per node are required, please contact the system administrator to increase the maximum number of slots.

The invocation of mpirun also requires some non-standard placeholders (environment variables), which are filled in by grid engine when the script is executed. The format is (one line!)

/usr/local/mpich/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines <path to mpi program + optional command line arguments>

Everything up to the path to the mpi program should be used as it is. $NSLOTS and $TMPDIR will be defined by the Sun Grid Engine. Note also that this script does not run correctly if it is executed directly. Further information on MPICH can be found here.

Interactive Sessions

If users have to run an interactive session (e.g. Oopics) they can log onto a node with the qsh command. The Sun Grid Engine will then mark that node as busy and will not submit any further jobs to it until the user has logged out. The command qstat will show INTERACTIVE as the job name, indicating that an interactive session is running on that node.

For now the command qsh is not working properly, but the system administrator is currently working on a fix.

The Interactive Monitor QMON

QMON is a user interface that replaces all of the UNIX commands of the grid engine (e.g. qsub, qdel ...). It is started by typing qmon at the command prompt, followed by a space and an ampersand, so that the prompt is not blocked. For the normal user only the first three buttons are of importance. They correspond to qstat, qhost and qsub, respectively. The usage is mostly intuitive. You can also ask the system administrators for help. It is recommended to use the job submission panel at least once to define your default parameters and to save the settings. After filling out the parameters press the 'Save Setting' button and name the file to be written. The generated file can be used as a template for .sge_request.

Overview of the Most Common Gridengine Commands

More information can be obtained with the man command at the command prompt. The User and Administration guide gives a complete description of the Sun Grid Engine; most of it can also be found on the official homepage.

[Sep 18, 2014] Tutorial Submitting a job using qsub

wiki.ibest.uidaho.edu
qsub [options]
  [-a date_time]                           request a start time
  [-ac context_list]                       add context variable(s)
  [-ar ar_id]                              bind job to advance reservation
  [-A account_string]                      account string in accounting record
  [-b y[es]|n[o]]                          handle command as binary
  [-binding [env|pe|set] exp|lin|str]      binds job to processor cores
  [-c n s m x]                             define type of checkpointing for job
             n           no checkpoint is performed.
             s           checkpoint when batch server is shut down.
             m           checkpoint at minimum CPU interval.
             x           checkpoint when job gets suspended.
             <interval>  checkpoint in the specified time interval.
  [-ckpt ckpt-name]                        request checkpoint method
  [-clear]                                 skip previous definitions for job
  [-cwd]                                   use current working directory
  [-C directive_prefix]                    define command prefix for job script
  [-dc simple_context_list]                delete context variable(s)
  [-dl date_time]                          request a deadline initiation time
  [-e path_list]                           specify standard error stream path(s)
  [-h]                                     place user hold on job
  [-hard]                                  consider following requests "hard"
  [-help]                                  print this help
  [-hold_jid job_identifier_list]          define jobnet interdependencies
  [-hold_jid_ad job_identifier_list]       define jobnet array interdependencies
  [-i file_list]                           specify standard input stream file(s)
  [-j y[es]|n[o]]                          merge stdout and stderr stream of job
  [-js job_share]                          share tree or functional job share
  [-jsv jsv_url]                           job submission verification script to be used
  [-l resource_list]                       request the given resources
  [-m mail_options]                        define mail notification events
  [-masterq wc_queue_list]                 bind master task to queue(s)
  [-notify]                                notify job before killing/suspending it
  [-now y[es]|n[o]]                        start job immediately or not at all
  [-M mail_list]                           notify these e-mail addresses
  [-N name]                                specify job name
  [-o path_list]                           specify standard output stream path(s)
  [-P project_name]                        set job's project
  [-p priority]                            define job's relative priority
  [-pe pe-name slot_range]                 request slot range for parallel jobs
  [-q wc_queue_list]                       bind job to queue(s)
  [-R y[es]|n[o]]                          reservation desired
  [-r y[es]|n[o]]                          define job as (not) restartable
  [-sc context_list]                       set job context (replaces old context)
  [-shell y[es]|n[o]]                      start command with or without wrapping <loginshell> -c
  [-soft]                                  consider following requests as soft
  [-sync y[es]|n[o]]                       wait for job to end and return exit code
  [-S path_list]                           command interpreter to be used
  [-t task_id_range]                       create a job-array with these tasks
  [-tc max_running_tasks]                  throttle the number of concurrent tasks (experimental)
  [-terse]                                 tersed output, print only the job-id
  [-v variable_list]                       export these environment variables
  [-verify]                                do not submit just verify
  [-V]                                     export all environment variables
  [-w e|w|n|v|p]                           verify mode (error|warning|none|just verify|poke) for jobs
  [-wd working_directory]                  use working_directory
  [-@ file]                                read commandline input from file
  [{command|-} [command_args]]
What is qsub?

Qsub is the command used for job submission to the cluster. It takes several command line arguments and can also use special directives found in the submission scripts or command file. Several of the most widely used arguments are described in detail below.

Environment variables in qsub

The qsub command passes certain environment variables from the submitting shell to the job. The values of the following variables are taken from the environment of the qsub command: HOME, LOGNAME, MAIL, PATH, SHELL and TZ.

These values are assigned to a new name, which is the original name prefixed with the string "SGE_O_". For example, the job will have access to an environment variable named SGE_O_HOME, which holds the value of the variable HOME in the qsub command environment.

Arguments to control behavior and request resources

As stated before, there are several arguments that you can use to make your jobs behave in a specific way or to request resources. This is not an exhaustive list, but it covers some of the most widely used arguments and many that you will probably need to accomplish specific tasks.

Declare the date/time a job becomes eligible for execution

To set the date/time at which a job becomes eligible to run, use the -a argument. The date/time format is [[CC]YY]MMDDhhmm[.SS]. If -a is not specified, qsub assumes that the job should be run immediately.

Try it out

To test -a, get the current date and time from the command line and add a couple of minutes to it. It was 11:31 when I checked; assuming the date is September 18, the MMDDhhmm value two minutes later is 09181133. Pass that value to -a and submit a command from STDIN.

echo "sleep 30" | qsub -a 09181133
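
Rather than typing the timestamp by hand, it can be generated. The following is a small sketch that assumes GNU date (for the -d option) and uses the MMDDhhmm form of the -a argument; the function name start_time_in is hypothetical.

```shell
#!/bin/sh
# start_time_in MINUTES
# Prints a timestamp MINUTES from now as MMDDhhmm, suitable for qsub -a.
# Relies on GNU date's -d option; BSD/macOS date uses different flags.
start_time_in() {
    date -d "+$1 minutes" +%m%d%H%M
}

# Example: make a job eligible to run two minutes from now.
# Guarded so that nothing is submitted on machines without qsub.
if command -v qsub >/dev/null 2>&1; then
    echo "sleep 30" | qsub -a "$(start_time_in 2)"
fi
```

Until the start time is reached the job should simply wait in the queue, which can be checked with qstat.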

Manipulate the output files

As a default all jobs will print all stdout (standard output) messages to a file with the name in the format <job_name>.o<job_id> and all stderr (standard error) messages will be sent to a file named <job_name>.e<job_id>. These files will be copied to your working directory when the job finishes. To rename the file or specify a different location for the standard output and error files, use the -o for standard output and -e for the standard error file. You can also combine the output using -j.

Try it out

Create a simple submission file:

sleep.sh

#!/bin/sh

for i in `seq 1 60` ; do
        echo $i
        sleep 1
done

Then submit your job with the standard output file renamed to sleep.log:

qsub -o sleep.log sleep.sh

Submit your job with the standard error file renamed:

qsub -e sleep.log sleep.sh

Mail job status at the start and end of a job

The mailing options are set using the -m and -M arguments. The -m argument sets the conditions under which the batch server will send a mail message about the job and -M defines the users that emails will be sent to (multiple users can be specified in a list separated by commas). The conditions for the -m argument include:

a - mail is sent when the job is aborted
b - mail is sent at the beginning of the job
e - mail is sent at the end of the job

Try it out

Using the sleep.sh script created earlier, submit a job that emails you for all conditions:

# qsub -m abe -M myname@uidaho.edu sleep.sh

Submitting a job that uses specific resources

For now, let's look at checking the resources available.

Submitting a job that is dependent on the output of another

To create a job that will not run until another job has completed, simply add the -hold_jid <job name> argument to your submission. This takes the place of PBS's '-W depend=afterok:$ID' argument.

An SGE script example: test2.sge

#!/bin/sh
#$ -cwd
#$ -N test2
#$ -hold_jid test1

./test2

Now, the 'test2' job will not run until test1 has completed. Note that this is a fairly static way of doing things. If you are building a batch submit script that creates job dependency trees, you could not replace 'test1' with 'test$1', $1 being an argument or environment variable, in this submit script 'test2.sge'. This is because even though the #$ -hold_jid test$1 line is an active comment, since it is a comment in bash, the $1 is not evaluated and changed; it stays as the literal $1, and then is interpreted by SGE as an unset value. The solution is to simply call qsub with the argument in the program call:

qsub -hold_jid $WAITONJOB test2.sge

This allows you to make the job that is waited on dynamic. All the directives you want can be passed to qsub this way (like -cwd here) instead of through active comments. But if the argument is not dynamic, doing so complicates job submission and is generally not a good way to go.
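
A dependency chain can be built on the command line without hard-coding job names. The sketch below assumes that -terse (listed in the qsub usage summary above) makes qsub print only the job ID, and that -hold_jid accepts that ID; the function name submit_chain, the QSUB override variable and the script names in the usage note are hypothetical.

```shell
#!/bin/sh
# QSUB can be pointed at a stub for dry runs; by default the real qsub is used.
QSUB=${QSUB:-qsub}

# submit_chain SCRIPT...
# Submits the scripts in order; each job is held until the previous one ends.
submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            prev=$($QSUB -terse "$script") || return 1
        else
            prev=$($QSUB -terse -hold_jid "$prev" "$script") || return 1
        fi
        echo "$script -> job $prev"
    done
}
```

For example, submit_chain prepare.sh compute.sh report.sh would submit three jobs, with compute.sh waiting for prepare.sh and report.sh waiting for compute.sh.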

For more examples on dependent job submissions, see an example PBS pipeline. The -W depend=afterok:$ID directive would be replaced with our -hold_jid <job name> for SGE, and you would have the same thing.

Opening an interactive shell to the compute node

See SGE_Tutorial:_Interactive_jobs

Passing an environment variable to your job

You can pass user defined environment variables to a job by using the -v argument.

Try it out

To test this we will use a simple script that prints out an environment variable.

variable.sh

#!/bin/sh
if [ "x" = "x$MYVAR" ] ; then
     echo "Variable is not set"
else
     echo "Variable says: $MYVAR"
fi

Next use qsub without the -v and check your standard out file

# qsub variable.sh

Then use the -v to set the variable

# qsub -v MYVAR="hello" variable.sh

Reference

 

qsub - submit a batch job to Sun Grid Engine.

qsub [ options ] [ command | -- [ command_args ]]

Qsub submits batch jobs to the Sun Grid Engine queuing system. Sun Grid Engine supports single- and multiple-node jobs. Command can be a path to a binary or a script (see -b below) which contains the commands to be run by the job using a shell (for example, sh(1) or csh(1)). Arguments to the command are given as command_args to qsub. If command is handled as a script then it is possible to embed flags in the script. If the first two characters of a script line either match '#$' or are equal to the prefix string defined with the -C option described below, the line is parsed for embedded command flags.

For qsub, the administrator and the user may define default request files (see sge_request(5)) which can contain any of the options described below. If an option in a default request file is understood by qsub and qlogin but not by qsh the option is silently ignored if qsh is invoked. Thus you can maintain shared default request files for both qsub and qsh.

A cluster wide default request file may be placed under $SGE_ROOT/$SGE_CELL/common/sge_request. User private default request files are processed under the locations $HOME/.sge_request and $cwd/.sge_request. The working directory local default request file has the highest precedence, then the home directory located file and then the cluster global file. The option arguments, the embedded script flags and the options in the default request files are processed in the following order: left to right in the script line, left to right in the default request files, from top to bottom of the script file (qsub only), from top to bottom of default request files, from left to right of the command line. In other words, the command line can be used to override the embedded flags and the default request settings. The embedded flags, however, will override the default settings.

Note, that the -clear option can be used to discard any previous settings at any time in a default request file, in the embedded script flags, or in a command-line option. It is, however, not available with qalter.

The options described below can be requested either hard or soft. By default, all requests are considered hard until the -soft option (see below) is encountered. The hard/soft status remains in effect until its counterpart is encountered again. If all the hard requests for a job cannot be met, the job will not be scheduled. Jobs which cannot be run at the present time remain spooled.

OPTIONS

ENVIRONMENTAL VARIABLES

  • SGE_ROOT Specifies the location of the Sun Grid Engine standard configuration files.
  • SGE_CELL If set, specifies the default Sun Grid Engine cell. To address a Sun Grid Engine cell qsub, qsh, qlogin or qalter use (in the order of precedence):

    The name of the cell specified in the environment variable SGE_CELL, if it is set. The name of the default cell, i.e. default.

  • SGE_DEBUG_LEVEL If set, specifies that debug information should be written to stderr. In addition the level of detail in which debug information is generated is defined.
  • SGE_QMASTER_PORT If set, specifies the tcp port on which sge_qmaster(8) is expected to listen for communication requests. Most installations will use a services map entry for the service "sge_qmaster" instead to define that port.

In addition to those environment variables specified to be exported to the job via the -v or the -V option (see above) qsub, qsh, and qlogin add the following variables with the indicated values to the variable list:

Furthermore, Sun Grid Engine sets additional variables into the job's environment, as listed below.

RESTRICTIONS

There is no controlling terminal for batch jobs under Sun Grid Engine, and any tests or actions on a controlling terminal will fail. If these operations are in your .login or .cshrc file, they may cause your job to abort. Insert the following test before any commands that are not pertinent to batch jobs in your .login:
     if ( $?JOB_NAME) then
          echo "Sun Grid Engine spooled job"
          exit 0
     endif
    
    Don't forget to set your shell's search path in your shell start-up before this code.

    EXIT STATUS

    The following exit values are returned:

    EXAMPLES

    The following is the simplest form of a Sun Grid Engine script file.

    =====================================================

    #!/bin/csh
      a.out

    =====================================================
    
    The next example is a more complex Sun Grid Engine script.
    
    =====================================================
    
    #!/bin/csh
    
    # Which account to be charged cpu time
    #$ -A santa_claus
    
    # date-time to run, format [[CC]yy]MMDDhhmm[.SS]
    #$ -a 12241200
    
    # to run I want 6 or more parallel processes
    # under the PE pvm. the processes require
    # 128M of memory
    #$ -pe pvm 6- -l mem=128
    
    # If I run on dec_x put stderr in /tmp/foo, if I
    # run on sun_y, put stderr in /usr/me/foo
    #$ -e dec_x:/tmp/foo,sun_y:/usr/me/foo
    
    # Send mail to these users
    #$ -M santa@northpole,claus@northpole
    
    # Mail at beginning/end/on suspension
    #$ -m bes
    
    # Export these environmental variables
    #$ -v PVM_ROOT,FOOBAR=BAR
    
    # The job is located in the current
    # working directory.
    #$ -cwd

    =====================================================
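
In the script above, the lines that begin with #$ carry options for qsub, while lines with a plain # are ordinary comments. When reviewing a job script it can help to list just those embedded options. The following is a minimal sketch of that, assuming the default #$ prefix at the start of the line; the sample script it writes to a temporary file is made up for illustration.

```shell
#!/bin/sh
# Sketch: list the embedded qsub options (lines starting with "#$") found
# in a job script. The sample script below is illustrative only.

list_directives() {
    # Print only lines that start with "#$", with that prefix and any
    # blanks following it removed.
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}

tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#!/bin/sh
# Mail at beginning/end/on suspension
#$ -m bes
#$ -cwd
hostname
EOF

list_directives "$tmp"
rm -f "$tmp"
```

Run against the sample, the function prints the two option lines (-m bes and -cwd) and skips both the ordinary comment and the command.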
    
    FILES
         $REQUEST.oJID[.TASKID]      STDOUT of job #JID
         $REQUEST.eJID[.TASKID]      STDERR of job
         $REQUEST.poJID[.TASKID]     STDOUT of par. env. of job
         $REQUEST.peJID[.TASKID]     STDERR of par. env. of job
    
         $cwd/.sge_aliases         cwd path aliases
         $cwd/.sge_request         cwd default request
         $HOME/.sge_aliases        user path aliases
         $HOME/.sge_request        user default request
         <sge_root>/<cell>/common/sge_aliases
                                   cluster path aliases
         <sge_root>/<cell>/common/sge_request
                                   cluster default request
         <sge_root>/<cell>/common/act_qmaster
                                   Sun Grid Engine master host file
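
The output file patterns at the top of this list can be spelled out with a small helper. This is a sketch under the assumption that $REQUEST stands for the job name, JID for the numeric job ID, and TASKID for the array task ID when there is one; the job name sleeper.sh and the IDs are made up.

```shell
#!/bin/sh
# Sketch: compose output file names following the FILES patterns
#   $REQUEST.oJID[.TASKID]   and   $REQUEST.eJID[.TASKID]
# Arguments: kind (o, e, po or pe), job name, job id, optional task id.
# Job name and ids used below are illustrative assumptions.

job_output_name() {
    kind=$1
    name=$2
    jid=$3
    task=${4:-}
    if [ -n "$task" ]; then
        # Array task: the task id is appended after the job id
        echo "$name.$kind$jid.$task"
    else
        echo "$name.$kind$jid"
    fi
}

job_output_name o sleeper.sh 4242      # STDOUT of job 4242
job_output_name e sleeper.sh 4242 7    # STDERR of task 7 of job 4242
```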
    


    FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.


    Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.


    Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


    Disclaimer:

    The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

    Last modified: May, 08, 2017