
qsub -- Submitting Job To Queue Instance


Introduction
Default job options and $HOME/.sge_request file
Output specification
Host Specification
Specifying resource with -l option
SGE pseudo comments
"Unknown option" error message
SGE job environment
Dot files
Checking if submitted Job started executing
Examples
Inheriting Your Environment
Parallel Submit Scripts

Introduction

On a cluster, the batch scheduler is typically installed on the head node. If the batch scheduler is SGE, jobs are submitted with the qsub command.

Once a job has been received by the batch server, the scheduler decides the placement and notifies the batch server which in turn notifies qsub whether the job can be run or not. The current status (whether the job was successfully scheduled or not) is then returned to the user. You may use a command file or STDIN as input for qsub.

A job in SGE represents a task to be performed on a node (or multiple nodes) in the cluster and contains the command line used to start the task. A job may have specific resource requirements but in general should be agnostic to which node in the cluster it runs on as long as its resource requirements are met.

Note:

All jobs require at least one available slot on a node in the cluster to run. SGE does not deal with fractional slots.

Here is a simple example of a qsub command that launches a job running the hostname command on a given cluster node. You can't submit jobs unless your UID is more than 100, which rules out submitting jobs as root. Note that the example below uses the sgeadmin account.

sgeadmin@master:~$ qsub -o /Apps/myputput.txt -b y -cwd -q all.q@b1 hostname
Your job 1 ("hostname") has been submitted

-o <outputfile> Path (and optionally name) of the output file. If you want to view the file from the head node, it should be in a directory exported via NFS. If -o is omitted, the output file is named <job-name>.o<job-id> and is placed in the user's home directory, or in the current working directory if the -cwd option is specified. If -o is specified, the file is placed in the location defined by this option.
-e <errorfile> The same for the error file.

Here are two examples of how the names are formed. simple.sh and hostname are the job names (which default to the name of the script or binary); the suffix starts with either o or e, followed by the job number:
-rw-r--r--. 1 bezroun bezroun 167 Nov 6 17:17 simple.sh.o5
-rw-r--r--. 1 bezroun bezroun 212 Nov 6 17:17 simple.sh.e5
-rw-r--r--. 1 bezroun bezroun 118 Nov 6 17:06 hostname.o3
-rw-r--r--. 1 bezroun bezroun 212 Nov 6 17:06 hostname.e3
The -b option to qsub states that the command being executed could be a single binary executable or a bash script. In this case the command hostname is a single binary. This option takes a y or n argument indicating either yes the command is a binary or no it is not a binary.
The -cwd option to qsub tells Sun Grid Engine that the job should be executed in the directory from which qsub was called.
The -q option specifies in which queue, and on which host of that queue, the job should run.
-pe <parallel environment> [<number of cores>] Needed for executing parallel jobs. Specifies the number of cores (slots) to allocate.
- IMPORTANT: If you use something like #$ -pe smp 4 and the number of cores on a particular node is higher, two conditions must be satisfied:
  - The queue that you are using should specify the correct number of cores available on this node.
  - The PE should use the $fill_up allocation rule.
The last argument to qsub is the command to be executed (hostname in this case)

Notice that the qsub command, when successful, will print the job number to stdout. You can use the job number to monitor the job’s status and progress within the queue as we’ll see in the next section.
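In scripts it is often handy to capture that job number. The following is a minimal sketch, assuming the confirmation message has the form shown above; the function name extract_job_id is made up for this illustration and is not part of SGE. Some Grid Engine versions also provide a -terse option to qsub that prints only the job id, so check the qsub man page on your system before relying on parsing.

```shell
#!/bin/bash
# Extract the job number from qsub's confirmation message.
# Assumes a message such as: Your job 1 ("hostname") has been submitted
# (array jobs print a different message and are not handled here).
extract_job_id() {
    # Print the first whitespace-delimited field that is purely numeric.
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) { print $i; exit } }' <<< "$1"
}

# Sample message copied from the example above; on a live cluster one
# would pass "$(qsub ...)" instead.
msg='Your job 1 ("hostname") has been submitted'
job_id=$(extract_job_id "$msg")
echo "job id: $job_id"
```

The captured number can then be passed to qstat -j or qdel.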

Default job options and $HOME/.sge_request file

If you always include the same options with your jobs (e.g. email notifications), you can have them applied automatically. Typically at least two options are specified:

-m aes Types of notifications. Sun Grid Engine will notify the job owner by email if the job is either completed, suspended or aborted.
-M <email-address> The email address to which the notification is sent.

To accomplish this, create the file '$HOME/.sge_request' and put in it the options you would otherwise give on the command line, one option per line. For example:

# mail me when the job starts & ends
 -M <email-address>
 -m be

# pass through some environment variables
 -v PYTHONPATH

# pass through ALL environment variables
# -V
# use multiple cores
# NOTE: Divide h_vmem by the number of cores you request (e.g. 2)
 -pe smp 2

Notes:

There should be a leading space before each option.
The -V option to qsub states that the job should have the same environment variables as the shell executing qsub. On recent systems -V may have stopped working because of the fix for the Shellshock Bash bug. See the thread "[gridengine users] tcl jsv error - bash related":
- > I added this to our docs for now:
  >
  > ##We strongly discourage users from exporting their environment onto the compute node.
  > ##Doing this pretty much means the job is non-reproductible,
  > ##because all the required settings are not captured in the job script.
  > ##
  > ## pass the current environment variables
  > ##$ -V

Output specification

The following three options control the output streams:

-e path_list specify standard error stream path(s)
-j y[es]|n[o] merge stdout and stderr stream of job
-o path_list specify standard output stream path(s)

Sun Grid Engine creates stdout and stderr files in the user home directory, unless -e and -o options are specified. If any additional files are created during a job’s execution, they will be put in the job’s working directory unless explicitly saved elsewhere.

The job’s stdout and stderr files are named after the job with the extension ending in the job’s number.

For the simple hostname job submitted above, the output files look like this:

sgeadmin@master:~$ ls hostname.*
hostname.e1 hostname.o1
sgeadmin@master:~$ cat hostname.o1
b1
sgeadmin@master:~$ cat hostname.e1
sgeadmin@master:~$

Notice that Sun Grid Engine automatically named the job hostname and created two output files: hostname.e1 and hostname.o1. The e stands for stderr and the o for stdout. The 1 at the end of the files’ extension is the job number. So if the job had been named my_new_job and was job #23 submitted, the output files would look like:

my_new_job.e23 my_new_job.o23
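The naming rule is simple enough to put in a helper, which is useful when a wrapper script needs to find the output of a job it has just submitted. A small sketch; the function name job_output_files is hypothetical, and it only covers the default naming (no -o, -e or -j options).

```shell
#!/bin/bash
# Build the default stdout/stderr file names for a job, following the
# <job-name>.o<job-id> and <job-name>.e<job-id> pattern described above.
job_output_files() {
    local name=$1 id=$2
    printf '%s.o%s\n%s.e%s\n' "$name" "$id" "$name" "$id"
}

job_output_files my_new_job 23
```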

Host Specification

If no queue is specified, the job is submitted to the default queue (typically all.q). A job can be submitted to a particular queue without any host specification, to a selected host (queue instance), or to a selected host group (queue domain). Note that the order is the opposite of a DNS name: here the queue (the "domain") goes before the host.

  qsub -q queue_name  job
  qsub -q queue_name@host_name  job
  qsub -q queue_name@@hostgroup_name  job

One can use wildcard patterns (single-quoted, so that the shell does not expand them) to specify the hosts. For example, to request any host in the given group regardless of the queue name:

  qsub -q '*@@hostgroup_name'  job
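The three forms differ only in what follows the queue name, so a wrapper script can assemble the -q argument from its parts. This is an illustrative sketch only; queue_spec is a made-up helper, and the queue, host and host group names are placeholders.

```shell
#!/bin/bash
# Assemble the argument for qsub -q from a queue name and an optional
# target. The target is either a host name or a host group name that
# already carries its leading "@".
queue_spec() {
    local queue=$1 target=${2:-}
    if [ -z "$target" ]; then
        printf '%s\n' "$queue"
    else
        printf '%s@%s\n' "$queue" "$target"
    fi
}

queue_spec all.q                    # whole queue
queue_spec all.q b1                 # queue instance on host b1
queue_spec '*' @hostgroup_name      # any queue, hosts of one host group
```

The result would be used as qsub -q "$(queue_spec all.q b1)" job, with the quotes keeping any wildcard away from the shell.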

Specifying resource with -l option

The -l option accepts resource=value pairs separated by commas.

It launches the job in a Sun Grid Engine queue that meets the given resource request list.

This option is available for qsub, qsh, qrsh, qlogin and qalter only. There may be multiple -l switches in a single command. You may request multiple -l options to be soft or hard, both in the same command line. In the case of a serial job, multiple -l switches refine the definition of the sought queue.

Among other things, you can specify the hostname using this option. For example:

qsub -l hostname='p6.hpc.firma.com' ...

The list of available resources and their associated valid value specifiers is described in complex(5):

DESCRIPTION
Complex reflects the format of the Sun Grid Engine complex configuration. The definition of complex attributes provides all pertinent information concerning the resource attributes a user may request for a Sun Grid Engine job via the qsub(1) -l option and for the interpretation of these parameters within the Sun Grid Engine system.

The Sun Grid Engine complex object defines all entries which are used for configuring the global, the host, and queue object. The system has a set of predefined entries, which are assigned to a host or queue by default. In addition, the user can define new entries and assign them to one or multiple objects. Each load value has to have its corresponding complex entry object, which defines the type and the relational operator for it.

defining resource attributes

The complex configuration should not be accessed directly. In order to add or modify complex entries, the qconf(1) options -Mc and -mc should be used instead. While the -Mc option takes a complex configuration file as an argument and overrides the current configuration, the -mc option brings up an editor filled in with the current complex configuration.

The provided list contains all definitions of resource attributes in the system. Adding a new entry means providing: name, shortcut, type, relop, requestable, consumable, default, and urgency. The fields are described below. An entry is changed by updating the field in question and removed by deleting its definition. An attribute can only be removed when it is no longer referenced in a host or queue object. The system also has a set of default resource attributes which are always attached to a host or queue. They cannot be deleted, nor can the type of such an attribute be changed.

working with resource attributes

Before a user can request a resource attribute, it has to be attached to the global, host, or cqueue object. The resource attribute exists only for the objects it is attached to: if it is attached to the global object (qconf -me global), it exists system-wide; if to a host object (qconf -me NAME), only on that host; if to a cqueue object (qconf -mq NAME), only on that cqueue.

When a user attaches a resource attribute to an object, a value also has to be assigned to it: the resource limit. Another way to give a resource attribute a value is to configure a load sensor for that attribute.

Default queue resource attributes

In its default form, the complex contains a selection of parameters from the queue configuration as defined in queue_conf(5). The queue configuration parameters that are in principle requestable for a job by the user are:
          qname
          hostname
          notify
          calendar
          min_cpu_interval
          tmpdir
          seq_no
          s_rt
          h_rt
          s_cpu
          h_cpu
          s_data
          h_data
          s_stack
          h_stack
          s_core
          h_core
          s_rss
          h_rss

Default host resource attributes

The standard set of host-related attributes consists of two categories. The first category is built by several queue configuration attributes which are particularly suitable to be managed on a host basis. These attributes are:
          slots
          s_vmem
          h_vmem
          s_fsize
          h_fsize
(please refer to queue_conf(5) for details).
Note: Defining these attributes in the host complex is no contradiction to having them also in the queue configuration. It allows maintaining the corresponding resources on a host level and at the same time on a queue level. Total virtual free memory (h_vmem) can be managed for a host, for example, and a subset of the total amount can be associated with a queue on that host.

The second attribute category in the standard host complex is the default load values. Every sge_execd(8) periodically reports load to sge_qmaster(8). The reported load values are either the standard Sun Grid Engine load values such as the CPU load average (see uptime(1)) or load values defined by the Sun Grid Engine administration (see the load_sensor parameter in the cluster configuration sge_conf(5) and the Sun Grid Engine Installation and Administration Guide for details). The characteristics definition for the standard load values is part of the default host complex, while administrator-defined load values require extension of the host complex. Please refer to the file /doc/load_parameters.asc for detailed information on the standard set of load values.

Overriding attributes

One attribute can be assigned to the global object, host object, and queue object at the same time. On the host level it might get its value from the user-defined resource limit and a load sensor. If the attribute is a consumable, then in addition to the resource limit and its load report on the host level there is also the internal usage, which the system keeps track of. The merge is done as follows:

In general an attribute can be overridden on a lower level

global by hosts and queues

hosts by queues and load values or resource limits on the same level.

We have one limitation for overriding attributes based on its relational operator:

!=, == operators can only be overridden on the same level, but not on a lower level. The user defined value always overrides the load value.

>=, >, <=, < operators can only be overridden, when the new value is more restrictive than the old one.

In the case of a consumable on host level which also has a load sensor, the system checks the current usage, and if the internal accounting is more restrictive than the load sensor report, the internal value is kept; if the load sensor report is more restrictive, that one is kept.

Note: Sun Grid Engine allows backslashes (\) to be used to escape newline characters. The backslash and the newline are replaced with a space (" ") character before any interpretation.

SGE pseudo comments

Scripts submitted via SGE can contain pseudo comments that are processed by SGE and set options just like options given on the qsub command line. Such comments start with "#$". Here is an example in a simple job script:

$ cat sleep.sh
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
date
sleep 10
date

You can include several lines that start with #$. SGE treats them as pseudo comments, and everything specified in them is interpreted as SGE options.

-cwd means to execute the job from the current working directory.
-j y means to merge the standard error stream into the standard output stream instead of having two separate error and output streams.
-S /bin/bash specifies the interpreting shell for this job to be the Bash shell.
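Before submitting, it can be useful to see which options a script carries in its pseudo comments. A minimal sketch, assuming the default "#$" prefix; embedded_options is a hypothetical helper name, and the demo file simply reproduces the sleep.sh script above.

```shell
#!/bin/bash
# Print the qsub options embedded in a job script, one pseudo comment
# per line, with the "#$" prefix and following blanks removed.
embedded_options() {
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}

# Demo on a throwaway copy of the sleep.sh script.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
date
sleep 10
date
EOF
embedded_options "$demo"
rm -f "$demo"
```

For the demo file this lists -cwd, -j y and -S /bin/bash, each on its own line.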

To submit such a wrapper job script, you can use the qsub command without any additional parameters, which is very convenient:

$ qsub sleep.sh
your job 16 ("sleep.sh") has been submitted

For a parallel MPI job script, take a look at this script, linpack.sh. Note that you need to put in two SGE variables, $NSLOTS and $TMP/machines within the job script.

$ cat linpack.sh
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
MPI_DIR=/opt/mpich/gnu/
HPL_DIR=/opt/hpl/mpich-hpl/

# OpenMPI part. Uncomment the following code and comment out the above code
# to use OpenMPI rather than MPICH

# MPI_DIR=/opt/openmpi/
# HPL_DIR=/opt/hpl/openmpi-hpl/

$MPI_DIR/bin/mpirun -np $NSLOTS -machinefile $TMP/machines \
        $HPL_DIR/bin/xhpl

The command to submit an MPI parallel job script is similar to submitting a serial job script, but you need to add the -pe mpich N option, where N is the number of processes that you want to allocate to the MPI program. Here's an example of submitting a 2-process linpack program using an HPL.dat input file:

$ qsub -pe mpich 2 linpack.sh
your job 17 ("linpack.sh") has been submitted

If you need to delete an already submitted job, you can use qdel with its job id. Here's an example of deleting a fluent job under SGE:

$ qsub fluent.sh
your job 31 ("fluent.sh") has been submitted
$ qstat
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
---------------------------------------------------------------------------------------------
     31     0 fluent.sh  sysadm1      t     12/24/2003 01:10:28 comp-pvfs- MASTER
$ qdel 31
sysadm1 has registered the job 31 for deletion
$ qstat
$

Although the example job scripts are bash scripts, SGE also accepts other types of shell scripts. It is trivial to wrap serial programs into an SGE job script. Similarly, for MPI parallel jobs, you just need to use the correct mpirun launcher and add the two SGE variables, $NSLOTS and $TMP/machines, within the job script. For parallel jobs other than MPI, a Parallel Environment (PE) needs to be defined. This is covered in the SGE documentation.

"Unknown option" error message

You can specify qsub command line options within the script on lines beginning with #$. For example:

#$ -S /bin/bash

If #$ is not followed by a valid qsub option, you get the unhelpful message:

qsub: Unknown option

If you get this message, search the script you are submitting for #$ and make sure it is followed by a valid qsub command line option.

It is easy to inadvertently introduce #$ when you comment out a line that begins with $:

#$SOME_COMMAND

There is no qsub command line option SOME_COMMAND, so this is an error.

If you really must keep lines beginning with #$, you can specify a different prefix string using the qsub -C command line option.
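Searching by eye is error-prone in a long script, so the check can be automated. The sketch below is only a heuristic: it assumes the default "#$" prefix and flags every "#$" line whose first non-blank character after the prefix is not a dash. The function name find_bad_directives is made up for this example.

```shell
#!/bin/bash
# List (with line numbers) the lines that start with "#$" but do not
# continue with an option, i.e. lines qsub would fail to parse.
find_bad_directives() {
    grep -n '^#\$' "$1" | grep -v -E '^[0-9]+:#\$[[:space:]]*-'
}

# Demo on a throwaway script containing one commented-out $ line.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$SOME_COMMAND
date
EOF
find_bad_directives "$demo"
rm -f "$demo"
```

A simple way to repair a flagged line is to put a space after the "#" (for example "# $SOME_COMMAND"), so that it no longer starts with the prefix.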

SGE job environment

Inherited Job Environment

execd → shepherd → shell → job
  - shepherd overwrites environment with submit settings
  - shell overwrites environment with user settings
Options you care about get set explicitly
  - Options you don't care about get inherited
  - Can lead to strange errors
INHERIT_ENV execd parameter
  - Defaults to TRUE
  - Should always be set to FALSE

Dot files

Job submission default settings files hierarchy:

$SGE_ROOT/$SGE_CELL/common/sge_request
$HOME/.sge_request
$PWD/.sge_request
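When a job picks up options you did not expect, a first step is to see which of these files actually exist. The following sketch lists them in the order given above; list_request_files is a hypothetical helper, and the fallback values /opt/sge and default are assumptions used only when SGE_ROOT and SGE_CELL are not set in the environment.

```shell
#!/bin/bash
# Print the default request files that exist, in the order listed above.
# The locations are passed as arguments so the function can be tried
# without a Grid Engine installation.
list_request_files() {
    local sge_root=$1 sge_cell=$2 home=$3 cwd=$4 f
    for f in "$sge_root/$sge_cell/common/sge_request" \
             "$home/.sge_request" \
             "$cwd/.sge_request"; do
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# /opt/sge and "default" are placeholder fallbacks, not known values.
list_request_files "${SGE_ROOT:-/opt/sge}" "${SGE_CELL:-default}" "$HOME" "$PWD"
```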

Checking if submitted Job started executing

The two commands that will be most useful to you are:

qhost -- list the status of each node
qstat -- examine the job queue

Host/Node Status: qhost

Node or host status can be obtained by using the qhost command. An example listing is shown below.

HOSTNAME             ARCH       NPROC  LOAD   MEMTOT   MEMUSE   SWAPTO   SWAPUS
-------------------------------------------------------------------------------
global               -              -     -        -        -        -        -
node000              lx24-amd64     2  0.00     3.8G    35.8M      0.0      0.0
node001              lx24-amd64     2  0.00     3.8G    35.2M      0.0      0.0
node002              lx24-amd64     2  0.00     3.8G    35.7M      0.0      0.0
node003              lx24-amd64     2  0.00     3.8G    35.6M      0.0      0.0
node004              lx24-amd64     2  0.00     3.8G    35.7M      0.0      0.0

If the job has started, then for computational tasks the load on the node jumps from zero to roughly the number of CPUs in use.

See qhost for more details.

Queue Status: qstat

Queue status for your jobs can be found by issuing a qstat command. An example qstat issued by user deadline is shown below.

job-ID  prior   name       user   state submit/start at   queue  slots ja-task-ID
---------------------------------------------------------------------------------
 304 0.60500 Sleeper4   deadline    r     01/18/2008 17:42:36 cluster@norbert  4 
 307 0.60500 Sleeper4   deadline    r     01/18/2008 17:42:37 cluster@norbert  4 
 310 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:29                  4 
 313 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:29                  4 
 316 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:29                  4 
 321 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:30                  4 
 325 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:30                  4 
 308 0.53833 Sleeper2   deadline    qw    01/18/2008 17:42:29                  2

More detail can be found by using the -f option. An example qstat -f issued by user deadline is shown below.

[deadline@norbert sge-tests]$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
cluster@node0                  BIP   2/2       0.00     lx26-amd64
    310 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:51     1
    313 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:52     1
----------------------------------------------------------------------------
cluster@node1                  BIP   2/2       0.00     lx26-amd64
    310 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:51     1
    313 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:52     1
----------------------------------------------------------------------------
cluster@node2                  BIP   2/2       0.00     lx26-amd64
    310 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:51     1
    313 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:52     1
----------------------------------------------------------------------------
cluster@norbert                BIP   2/2       0.02     lx26-amd64
    310 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:51     1
    313 0.60500 Sleeper4   deadline     r     01/18/2008 17:43:52     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    316 0.60500 Sleeper4   deadline     qw    01/18/2008 17:42:29     4
    321 0.60500 Sleeper4   deadline     qw    01/18/2008 17:42:30     4
    325 0.60500 Sleeper4   deadline     qw    01/18/2008 17:42:30     4
    308 0.53833 Sleeper2   deadline     qw    01/18/2008 17:42:29     2

To look at jobs for all users, you must issue the following:

qstat -u "*"

For queue details, you may add the -f option as shown above. If you prefer to always see all user jobs, you can use the alias command to make this the default behavior. For bash users add the following to your .bashrc file.

alias qstat='qstat -u "*"'

Even more information can be obtained by using the -F option (see the qstat man page for more information). For parallel jobs, the output is not very easy to understand. In the above listings, the state is one of qw (queued and waiting), t (transferring), or r (running).
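For a quick summary of how many jobs are in each state, the plain qstat listing can be fed through awk. This is a sketch that assumes the default column layout shown above, where the state is the fifth whitespace-separated field; count_states is a made-up name, and the sample lines are copied from the listing so the snippet runs without a cluster.

```shell
#!/bin/bash
# Count jobs per state from plain qstat output read on stdin.
# Only lines whose first field is a job number are considered, which
# skips the header and the separator line.
count_states() {
    awk '$1 ~ /^[0-9]+$/ { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# On a live cluster one would run: qstat | count_states
count_states <<'EOF'
job-ID  prior   name       user   state submit/start at   queue  slots ja-task-ID
---------------------------------------------------------------------------------
 304 0.60500 Sleeper4   deadline    r     01/18/2008 17:42:36 cluster@norbert  4
 310 0.60500 Sleeper4   deadline    qw    01/18/2008 17:42:29                  4
 308 0.53833 Sleeper2   deadline    qw    01/18/2008 17:42:29                  2
EOF
```

For the three sample jobs this reports two jobs in state qw and one in state r.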

A more convenient queue status package called userstat combines qstat, qhost, and qdel into a simple, easy-to-use "top"-like interface. Additional information on these commands is available by using man command-name.

Examples

Example 1

Let's assume we have a script sge-date that looks like:
#!/bin/bash 
/bin/date 
We could run it using the command:
qsub sge-date
SGE will then run the program and place two files in your home directory (or in the current working directory if the -cwd option is used):
sge-date.e#
sge-date.o#
where # is the job number assigned by SGE.
After the job has finished, the sge-date.e# file contains the output from standard error and the sge-date.o# file contains the output from standard out. Note: if the home directory of the user submitting the job is not on NFS, the output will be on the node on which the job ran and will not be visible on the head node. See Viewing SGE Job Output.

The following basic options may be used to submit the job.
-A [account name] -- Specify the account under which to run the job 
-N [name] -- The name of the job 
-l h_rt=hr:min:sec -- Maximum walltime for this job
-r [y,n] -- Should this job be re-runnable (default y)
-pe [type] [num] -- Request [num] slots in the parallel environment [type].
-cwd -- Place the output files (.e,.o) in the current working directory.
     The default is to place them in the user's home directory.
-S [shell path] -- Specify the shell to use when running the job script
Although it is possible to use command line options and script wrappers to submit jobs, it is usually more convenient to use just a single script to include all options for the job. The next section describes how this is done.

Example 2

The most convenient method to submit a job to SGE is to use a "job script" which contains SGE options as pseudo comments.

SGE pseudo comments allow all options and the program to be placed in a single batch file.

The following script will report the node on which it is running, sleep for 60 seconds, then exit. It also reports the start/end date and time, and sends an email to the user when the job starts and when the job finishes. Other SGE options are set as well.
#!/bin/bash
#
# Usage: sleeper.sh [time]
#        default for time is 60 seconds

# -- our name ---
#$ -N Sleeper1
#$ -S /bin/sh
# Make sure that the .e and .o file arrive in the
# working directory
#$ -cwd
#Merge the standard out and standard error to one file
#$ -j y
/bin/echo Here I am: `hostname`. Sleeping now at: `date`
/bin/echo Running on host: `hostname`.
/bin/echo In directory: `pwd`
/bin/echo Starting on: `date`
# Send mail at the beginning and end of the job
#$ -m be
#$ -M deadline@kronos
time=60
if [ $# -ge 1 ]; then
   time=$1
fi
sleep $time

echo Now it is: `date`
The "#$" prefix is used in the script to indicate an SGE option. If we name the script sleeper1.sh, we can submit it to SGE as follows:
qsub sleeper1.sh
The output will be in the file Sleeper1.o#, where # is the job number assigned by SGE. When submitting MPI or PVM jobs, additional information is needed in the job script; see Parallel Submit Scripts below.

Inheriting Your Environment

If you want to make sure your current environment variables are used in your SGE jobs, include the following in your submit script:

#$ -V


Parallel Submit Scripts

Submitting parallel jobs is very similar to submitting single node jobs (as shown above). A parallel job needs a parallel environment (PE) assigned to the script. The following is an annotated script for submitting an MPICH job to SGE.

#!/bin/sh
#
# EXAMPLE MPICH SCRIPT FOR SGE
# Modified by Basement Supercomputing 1/2/2006 DJE
# To use, change "MPICH_JOB", "NUMBER_OF_CPUS"
# and "MPICH_PROGRAM_NAME" to real values.
#
# Your job name
#$ -N MPICH_JOB
#
# Use current working directory
#$ -cwd
#
# Join stdout and stderr
#$ -j y
#
# pe request for MPICH. Set your number of processors here.
# Make sure you use the "mpich" parallel environment.
#$ -pe mpich NUMBER_OF_CPUS
#
# Run job through bash shell
#$ -S /bin/bash
#
# The following is for reporting only. It is not really needed
# to run the job. It will show up in your output file.
echo "Got $NSLOTS processors."
echo "Machines:"
cat $TMPDIR/machines
# Adjust MPICH procgroup to ensure smooth shutdown
export MPICH_PROCESS_GROUP=no
#
# Use full pathname to make sure we are using the right mpirun
/opt/mpi/tcp/mpich-gnu3/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines MPICH_PROGRAM_NAME

The important option is the -pe line in the submit script. It must name the parallel environment that matches the MPI implementation with which you compiled your program. Example submit scripts are available in the examples directory.

To use SGE with MPI, simply copy the appropriate script to your working directory, edit it to fill in the appropriate variables, rename it to reflect your program, and use qsub to submit it to SGE.

NEWS CONTENTS

20170507 : Monitoring and Controlling Jobs ( biowiki.org )
20170315 : SGE Queueing System - Scalable Computing Support Center - DukeWiki ( SGE Queueing System - Scalable Computing Support Center - DukeWiki, Mar 15, 2017 )
20170315 : Common Parallel Environment Options ( )
20151109 : Basement Sun Grid Engine Quick Start ( web.njit.edu )
20140923 : Reserving resources (RAM, disc, GPU) by MerlinWiki ( Reserving resources (RAM, disc, GPU), Sep 23, 2014 )
20140923 : Show only job that are not on hold in qstat ( )
20140918 : Documentation-How do I setup a qsub script - Systems ( columbia.edu )
20140918 : Tutorial Submitting a job using qsub ( wiki.ibest.uidaho.edu )

Old News ;-)

[May 07, 2017] Monitoring and Controlling Jobs

biowiki.org

After submitting your job to Grid Engine you may track its status by using either the qstat command, the GUI interface QMON, or by email.
Monitoring with qstat
The qstat command provides the status of all jobs and queues in the cluster. The most useful options are:

qstat: Displays list of all jobs with no queue status information.

qstat -u hpc1***: Displays list of all jobs belonging to user hpc1***

qstat -f: gives full information about jobs and queues.

qstat -j [job_id]: Gives the reason why the pending job (if any) is not being scheduled.

You can refer to the man pages for a complete description of all the options of the qstat command.
Monitoring Jobs by Electronic Mail
Another way to monitor your jobs is to make Grid Engine notify you by email about the status of the job.

In your batch script or on the command line, use the -m option to request that an email be sent and the -M option to specify the email address to which it should be sent. This will look like:

#$ -M myaddress@work
#$ -m beas

The -m option selects the events after which you want to receive email. In particular, you can choose to be notified at the beginning/end of the job, or when the job is aborted/suspended (see the sample script lines above).

And from the command line you can use the same options (for example):

qsub -M myaddress@work -m be job.sh
How do I control my jobs
Based on the status of the job displayed, you can control the job by the following actions:

Modify a job: As a user, you have certain rights that apply exclusively to your jobs. The Grid Engine command line used is qmod. Check the man pages for the options that you are allowed to use.

Suspend (or resume) a job: This uses the UNIX kill command and applies only to running jobs. In practice you type
qmod -s job_id to suspend and qmod -us job_id to resume (where job_id is given by qstat or qsub).

Delete a job: You can delete a job that is running or spooled in the queue by using the qdel command like this
qdel job_id (where job_id is given by qstat or qsub).

Monitoring and controlling with QMON
You can also use the GUI QMON, which gives a convenient window dialog specifically designed for monitoring and controlling jobs, and the buttons are self explanatory.

For further information, see the SGE User's Guide.

[Mar 15, 2017] SGE Queueing System - Scalable Computing Support Center - DukeWiki

Adding SGE Options

If you always need certain SGE options to be specified for a given job, you can embed those options into the SGE job script using lines that start with "#$":

#!/bin/tcsh
#
#$ -S /bin/tcsh -cwd
#$ -o simple.out -j y
#$ -l mem_free=500M
cd /home/username/seq/simple
myprog

The '-cwd' option tells SGE to run the job in the same directory that the qsub command was issued - i.e. SGE will 'cd' into the current working directory before it runs the job script. Again, the '-o' option is used to direct screen output to a file. The '-S' option is another way to indicate the shell type (tcsh, bash, or sh) to SGE. The '-j y' is a way to tell SGE to "join" the standard-error output with the standard-output (screen output) - in this case all of it will go to the file 'simple.out'. The '-l mem_free=500M' tells SGE to only run the job on a node with at least 500 megabytes of free RAM available. The '500M' here should be changed to match the actual amount of RAM you expect your job to use, e.g. '-l mem_free=750K' (750 kilobytes of RAM) or '-l mem_free=4G' (4 gigabytes of RAM). Help with estimating your program's memory use can be found here: Monitoring Memory Usage.

In any shell script, lines starting with a "#" are ignored as comments by the shell. Only SGE interprets the lines that start with "#$"; other systems treat them as ordinary comments. This makes it possible for your SGE script to still run on other Linux machines, for example.

** NOTE ** Only the first contiguous block of comments is searched by SGE for "#$" lines. SGE stops processing the "#$" lines when it sees the first command or blank line. It is usually easiest to just follow the above example and put all SGE information at the very top of the file.
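To see which options would be picked up under the rule described in this note, the leading comment block can be scanned by hand. This is a sketch only: leading_sge_directives is a hypothetical helper name, and it implements the rule exactly as stated above (stop at the first line that is not a comment), not whatever a particular SGE version actually does.

```shell
#!/bin/sh
# leading_sge_directives FILE
# Hypothetical helper: prints the options from "#$" lines in the first
# contiguous block of comment lines and stops at the first command or
# blank line.
leading_sge_directives() {
    awk '
        !/^#/   { exit }                                # first non-comment line ends the block
        /^#[$]/ { sub(/^#[$][ \t]*/, ""); print }       # strip the "#$" prefix
    ' "$1"
}
```

Run on the example script above, it would print the three option lines; a "#$" line placed after "myprog" would not be printed.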

** NOTE ** SGE is very particular about the formatting of the queue-submission scripts. In particular, you should make sure that you have a blank line at the end of your script, and that the script is saved in the standard Unix text format (and NOT in the standard Windows-text format). Generally, this is only an issue if you copy job scripts from your Windows laptop to the cluster. If you do so, you may want to run the 'dos2unix' program on your script before submitting. If you are a 'vi' user and you see a '[ dos ]' tag on the status line at the bottom of the screen, you can do ':set fileformat=unix' and save the file; 'vi' will also show '[ noeol ]' on the status line if you don't have an end-of-line marker at the end of the file.
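Both formatting problems mentioned in this note can be detected before submission. The following is a minimal sketch; check_job_script is a hypothetical helper name, and only standard tools (grep, tail, printf) are assumed.

```shell
#!/bin/sh
# check_job_script FILE
# Hypothetical helper: reports DOS (CRLF) line endings and a missing
# newline at the end of the file; returns non-zero if either is found.
check_job_script() {
    cr=$(printf '\r')
    status=0
    if grep -q "$cr" "$1"; then
        echo "$1: contains DOS (CRLF) line endings"
        status=1
    fi
    # $(...) strips a trailing newline, so a non-empty result means the
    # last byte of the file is something other than a newline.
    if [ -n "$(tail -c 1 "$1")" ]; then
        echo "$1: no newline at end of file"
        status=1
    fi
    return $status
}
```

A script that passes prints nothing; one that fails the first check can be converted with dos2unix as described above.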

** NOTE ** The more memory your job requires, the more important it is to include a '-l mem_free' request in your script. See Monitoring Memory Usage for more info on determining how much the memory request should be.
Common SGE Options

-cwd use the current directory (where the job was submitted from) to store all output files, including the -o specified file

-M user@hostname -m b,e sends email to the specified account at the beginning (-m b) and end (-m e) of the job. This way, you know when the job has finally started if it got stuck in the queue. You can also use this to send yourself a text message.

-o file -e file directs the standard output (-o) and standard error-output (-e) to the specified files. Note that this is output or error-output that is not otherwise directed to files; ie. csh redirection (myprog > file.out) takes precedence

-j y "joins" the error-output with the standard output, thereby sending both to the same file (given by -o)

-S /bin/tcsh what shell to use, tcsh or sh or whatever you prefer

-N name use this name when displaying in the qstat output; defaults to the name of the script file

-hold_jid job_id_or_job_name if you have one job that must wait for another to complete (perhaps the first one creates an output file which is needed by the second program), then you can request that the job be held until that first job completes, see [SGE Job Dependencies]

-pe high 10-20 requests a high-priority "parallel environment" that spans several machines, in this case any number of CPUs between 10 and 20 (inclusive), a single number will request exactly that many CPUs. Note that if you request more CPUs than you actually have high-priority access to, your job will hang. See Submitting OpenMP Jobs or Submitting MPI Jobs

-q *@machineName-n* request a specific machine, or machine group

-l slots=2 requests that the job be given 2 slots (or 2 cpus) instead of 1; you MUST use this if your program is multi-threaded, you should NOT use it otherwise

-l mem_free=1.5G requests that only machines with 1.5GB (=1536MB) or more be used for this job; ie. the job requires a lot of memory and thus is not suitable for all hosts. Note that 1G is equal to 1024M (How do I determine how much memory my program needs? See the FAQ)

-l h_cpu=Xh requests that X hours be allocated for this job to run

-l scr_free=XG requests that only machines with X GB or more free disk space in the /scratch partition to be used for this job; ie. the job requires a lot of temporary file space and thus is not suitable for all hosts. Note that 1G is equal to 1024M

-l highprio requests that the job be placed in a high-priority queue or parallel environment

-soft putting -soft before a -l requirement indicates that it is a "soft" request; SGE will make a best-effort attempt to find a machine with the requested attribute, but the job may be queued on machines that do NOT have the attribute. Note that -soft applies to ALL -l options that come after it

-t start-stop:step submit an SGE "array" job, that is, run the same job multiple times, but set SGE_TASK_ID to the value start, then start+step, etc., up through stop (step may be omitted); it is up to your script to do something different for each task-ID; see [SGE Array Jobs]
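One common way for an array job to "do something different for each task-ID" is to let each task pick one line from a list of inputs. The sketch below assumes a text file with one input name per line; input_for_task, inputs.txt and myprog are hypothetical names, and the fallback to task 1 covers the case where SGE_TASK_ID is unset or not a number (for example when the script is run outside an array job).

```shell
#!/bin/sh
# input_for_task LIST_FILE
# Hypothetical helper: prints the line of LIST_FILE whose line number
# equals SGE_TASK_ID.
input_for_task() {
    task=${SGE_TASK_ID:-1}
    case "$task" in
        ""|*[!0-9]*) task=1 ;;   # unset or non-numeric: fall back to task 1
    esac
    sed -n "${task}p" "$1"
}
```

Inside a job script submitted with qsub -t 1-3, a line such as myprog "$(input_for_task inputs.txt)" would then give each task its own input.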

Common Parallel Environment Options

-pe low-all 8 low priority use any machines (Note: No longer working)

-pe low-core 8 low priority only use core machines (Note: No longer working)

-pe threaded 8 low priority use multiple slots on one machine

-l highprio -pe high 8 high priority use any high-priority slots

** NOTE ** Jobs with no memory requests will be placed on ANY machine that is not heavily-loaded. If you need a significant amount of memory in order for your program to run, then you must explicitly request it with "-l mem_free=750M". (How do I determine this? FAQ)

** NOTE ** If you run an MPI job in low-priority mode and even one CPU gets slowed down due to a high-priority request, then all of your MPI tasks are likely to be slowed down as well. This will waste those computational resources that other people could have used.

** NOTE ** For shell script programmers, there are a number of "environment variables" that are automatically set by SGE, see [SGE Env Vars] (and see [SGE Array Jobs] for possible ways to use them).

** NOTE ** For users who "live" in multiple groups (perhaps you collaborate with multiple professors who each have machines in the DSCR), please see [SGE Multiple Groups] for info on how to properly allocate your runs to each group, if this is important to you.

See also [SGE Job Monitoring] or [FAQ]

[Nov 09, 2015] Basement Sun Grid Engine Quick Start

web.njit.edu

Submitting a job to the queue: qsub
Qsub is used to submit a job to SGE. The qsub command has the following syntax:
 qsub [ options ] [ scriptfile | -- [ script args ]] 
By default, binary files may not be submitted directly to SGE (unless the -b y option is used). For example, if we wanted to submit the "date" command to SGE we would need a script that looks like:
#!/bin/bash 
/bin/date 
If the script were called sge-date, then we could simply run the following:
$ qsub sge-date 
SGE will then run the program, and place two files in your current directory:
sge-date.e# 
sge-date.o# 
where # is the job number assigned by SGE. The sge-date.e# file contains the output from standard error and the sge-date.o# file contains the output from standard out. The following basic options may be used to submit the job.
-A [account name] -- Specify the account under which to run the job 
-N [name] -- The name of the job 
-l h_rt=hr:min:sec -- Maximum walltime for this job 
-r [y,n] -- Should this job be re-runnable (default y) 
-pe [type] [num] -- Request [num] amount of [type] nodes. 
-cwd -- Place the output files (.e,.o) in the current working directory. 
     The default is to place them in the user's home directory. 
-S [shell path] -- Specify the shell to use when running the job script 
Although it is possible to use command line options and script wrappers to submit jobs, it is usually more convenient to use just a single script to include all options for the job. The next section describes how this is done.
Job Scripts

The most convenient method to submit a job to SGE is to use a "job script". The job script allows all options and the program file to be placed in a single file. The following script will report the node on which it is running, sleep for 60 seconds, then exit. It also reports the start/end date and time and sends an email to the user when the job starts and when the job finishes. Other SGE options are set as well. The example script can be found here as well.
#!/bin/sh
#
# Usage: sleeper.sh [time]
#        default for time is 60 seconds

# -- our name ---
#$ -N Sleeper1
#$ -S /bin/sh
# Make sure that the .e and .o file arrive in the
# working directory
#$ -cwd
#Merge the standard out and standard error to one file
#$ -j y
/bin/echo Here I am: `hostname`. Sleeping now at: `date`
/bin/echo Running on host: `hostname`.
/bin/echo In directory: `pwd`
/bin/echo Starting on: `date`
# Send mail at the beginning and at the end of the job
#$ -m be
#$ -M deadline@kronos
time=60
if [ $# -ge 1 ]; then
   time=$1
fi
sleep $time

echo Now it is: `date`
The "#$" is used in the script to indicate an SGE option. If we name the script sleeper1.sh and then submit it to SGE as follows:
qsub sleeper1.sh
 
The output will be in the file Sleeper1.o#, where # is the job number assigned by SGE. Here is an example output file for the sleeper1.sh script. When submitting MPI or PVM jobs, we will need additional information in the job script. See below.

[Sep 23, 2014] Reserving resources (RAM, disc, GPU) by MerlinWiki

SGE - MerlinWiki

matyldaX

scratchX

ram_free, mem_free

disk_free, tmp_free

gpu

We have found that for some tasks it is advantageous to tell SGE which resources are required. This makes sense when heavy use of RAM or network storage is expected. Limits can be soft or hard (parameters -soft, -hard); the limits themselves are given as:
 -l resource=value
For example, if a job needs at least 400MB of RAM: qsub -l ram_free=400M my_script.sh. Another frequently requested resource is space in /tmp: qsub -l tmp_free=10G my_script.sh. Or both:
qsub -l ram_free=400M,tmp_free=10G my_script.sh
Of course, it is possible (and preferable if the number does not change) to use the construction #$ -l ram_free=400M directly in the script. The actual status of given resource on all nodes can be obtained by: qstat -F ram_free, or more things by: qstat -F ram_free,tmp_free.

Details on other standard available resources are in /usr/local/share/SGE/doc/load_parameters.asc. If you do not specify a value for a given resource, a default value is used (1GB for space on /tmp, 100MB for RAM).

WARNING: You need to distinguish between requesting resources that merely have to be available at the time of submission (so-called non-consumable resources) and allocating a resource for the whole runtime of your computation. For example, suppose your program will need 400MB of memory, but in the first 10 minutes of computation it allocates only 100MB. If you use the standard resource mem_free and other jobs are submitted to the same node during those first 10 minutes, SGE will interpret the situation as follows: you wanted 400MB but you actually use only 100MB, so the remaining 300MB can be given to someone else (i.e. it will start another task requesting this memory on that node).

For these purposes it is better to use consumable resources, which are computed independently of the current status of the task: ram_free for memory and tmp_free for disk. For example, the resource ram_free does not look at the actual free RAM, but computes the occupation of RAM based only on the requests of individual scripts. It starts from the size of RAM of the given machine and subtracts the amount requested by each job that should run on this machine. If a job does not specify ram_free, the default value of ram_free=100M is used.

For the disk space in /tmp (tmp_free), the situation is more tricky: in case a job does not clean up properly its mess after it finishes, the disk can actually have less space than defined by the resource. Unfortunately, nothing can be done about this.

Known problems with SGE

Use of paths - for home directory it is necessary to use the official path - i.e. /homes/kazi/... or /homes/eva (or simply the variable $HOME). In case the path of the internal mountpoint of the automounter is used - i.e. - /var/mnt/... an error will occur. (this is not an error of SGE, the internal path is not fully functional for access)

Availability of nodes - due to the existence of nodes with limited access (employees' PCs), it is necessary to specify a list of nodes, on which your job can run. This can be done using parameter -q. The machines that are available are nodes in IBM Blades and also some computer labs in case you turn the machines on over night. The list of queues for -q must be only on one line even if it is very long. For the availability of given groups of nodes, the parameter -q can be used in the following way:
#$ -q all.q@@blade,all.q@@PCNxxx,all.q@@servers
Main groups of computers are: @blade, @servers, @speech, @PCNxxx, @PCN2xxx - the full and actual list can be obtained by qconf -shgrpl

The syntax for access is QUEUE@OBJECT - i.e. all.q@OBJECT. The object is either one computer, for example all.q@svatava, or a group of computers (which begins also by @ - @blade) i.e. all.q@@blade.

The computers in the labs are sometimes restarted by students during computation - we can't do much about this. In case you really need the computation to finish (i.e. it is not easy to re-run a job in case it is brutally killed) use newly defined groups of computers:
@stable - @blade, @servers - servers that run all the time w/o restarting
@PCOxxx, @PCNxxx - computer labs, there is a possibility that any node might be restarted at any time,
      a student or someone can shut the machine down by error or "by error". It is more or less sure that these
      machines will run smoothly over night and during weekends. There is also a group for each independent lab e.g. @PCN103.
Running scripts other than bash - it is necessary to specify the interpreter on the first line of your script (it is probably already there), for example #!/usr/bin/perl, etc.

Does your script generate a heavy traffic on matyldas ? It is necessary to set -l matyldaX=10, (for example 10 - i.e. in total 100/10 = 10 concurrent jobs from given matyldaX), where X is the number of matylda used (in case you use several matyldas, specify -l matyldaX=Y several times). We have created an SGE resource for each matylda (each matylda has 100 points in total) and the jobs using -l matyldaX=Y are submitted until given matylda has free points. This can be used to balance the load of given storage server from the user side. The same holds for servers scratch0X.
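The arithmetic behind the points is simple integer division: with 100 points per storage server, a request of Y points per job allows at most 100/Y such jobs at once. A small sketch, where concurrent_jobs is a hypothetical helper name and the default of 100 points is taken from the text above:

```shell
#!/bin/sh
# concurrent_jobs POINTS_PER_JOB [TOTAL_POINTS]
# Hypothetical helper: prints how many jobs requesting POINTS_PER_JOB
# points can run at the same time against one storage server.
concurrent_jobs() {
    echo $(( ${2:-100} / $1 ))
}
```

For example, concurrent_jobs 10 prints 10 and concurrent_jobs 25 prints 4; a job that stresses the storage more should request more points, which lowers the number of jobs running concurrently.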

Pay attention to parameter -cwd: it is not guaranteed to work all the time, so it is better to use cd /where/do/i/want at the beginning of your script.

In case a node is restarted, a job will still be shown in SGE, although it is not running any more. This is because SGE is waiting until the node confirms termination of the computation (i.e. until it boots Linux again and starts the SGE client). In case you use qdel to delete a job, it will be only marked by flag d. Jobs marked by this flag are automatically deleted by the server every hour.

Parallel jobs - OpenMP

For parallel tasks with threads, it is enough to use parallel environment smp and to set the number of threads:
#!/bin/sh 
#
#$ -N OpenMPjob
#$ -o $JOB_NAME.$JOB_ID.out
#$ -e $JOB_NAME.$JOB_ID.err
#
# PE_name    CPU_Numbers_requested
#$ -pe smp  4
#
cd SOME_DIR_WITH_YOUR_PROGRAM
export OMP_NUM_THREADS=$NSLOTS
 
./your_openmp_program [options]
Parallel jobs - OpenMPI

Open MPI is now fully supported, and it is the default parallel environment (mpirun is by default Open MPI)

The SGE parallel environment is openmpi

The allocation rule is $fill_up, which means that the preferred allocation is on the same machine.

Open MPI is compiled with tight SGE integration:

mpirun will automatically submit to machines reserved by SGE

qdel will automatically clean all MPI stubs

In the parallel task, do not forget (preferably directly in the script) to use parameter -R y; this turns on the reservation of slots, i.e. you won't be overtaken by processes requesting fewer slots.

In case a parallel task is launched using qlogin, there is no variable containing information on what slots were reserved. A useful tool is then qstat -u `whoami` -g t | grep QLOGIN, which says what parallel jobs are running.

Listing follows:
#!/bin/bash
# ---------------------------
# our name 
#$ -N MPI_Job
#
# use reservation to stop starvation
#$ -R y
#
# pe request
#$ -pe openmpi 2-4
#
# ---------------------------
# 
#   $NSLOTS          
#       the number of tasks to be used

echo "Got $NSLOTS slots."

mpirun -n $NSLOTS /full/path/to/your/executable

Show only jobs that are not on hold in qstat

I'm running some jobs on an SGE cluster. Is there a way to make qstat show me only jobs that are not on hold?

qstat -s p shows pending jobs, which is all those with state "qw" and "hqw".
qstat -s h shows hold jobs, which is all those with state "hqw".
I want to be able to see all jobs with state "qw" only and NOT state "hqw". The man pages seem to suggest it isn't possible, but I want to be sure I didn't miss something. It would be REALLY useful and it's really frustrating me that I can't make it work.

Other cluster users have a few thousand jobs on hold ("hqw") and only a handful actually in the queue waiting to run ("qw"). I want to see quickly and easily the stuff that is not on hold so I can see where my jobs are in the queue. It's a pain to have to show everything and then scroll back up to find the relevant part of the output.

Laura
So I figured out a way to show what I want by piping the output of qstat into grep:
qstat -u "*" | grep " qw"
(Note that I need to search for " qw" not just "qw" or it will return the "hqw" states as well.)

But I'd still love to know if it's possible using qstat options only.
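A slightly stricter variant of the same pipe-based workaround compares the state column exactly instead of searching the whole line, so a job name or user name containing " qw" cannot produce a false match. This is a sketch, not a qstat option: only_qw is a hypothetical helper name, and it assumes the default qstat listing with two header lines and the state in the fifth whitespace-separated column.

```shell
#!/bin/sh
# only_qw
# Hypothetical filter: reads qstat output on stdin, keeps the two header
# lines and the rows whose fifth column (state) is exactly "qw".
only_qw() {
    awk 'NR <= 2 || $5 == "qw"'
}
```

It would be used as: qstat -u "*" | only_qw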

[Sep 18, 2014] Documentation-How do I setup a qsub script - Systems

columbia.edu

Besides the guide below, here are other links I found covering qsub:

https://www.nbcr.net/pub/wiki/index.php?title=Sample_SGE_Script

http://www.it.uu.se/datordrift/maskinpark/albireo/gridengine.html

http://www.rbvi.ucsf.edu/Resources/sge/user_guide.html

Guide to Using the Grid Engine

The main access to the nodes of the Beowulf Cluster is through the Sun Grid Engine batch system. Grid Engine distributes requested jobs on the nodes, depending on the current load of the nodes, the priority of the job and the number of jobs a user already has running on the cluster (within the same priority level, queued jobs of users who have fewer jobs running are preferred). Direct login onto the nodes and interactive execution of programs are strongly discouraged, because they bypass Grid Engine's monitoring of the nodes and can cause incomplete execution of batch jobs. Users who need interactive jobs can use the command qsh, which starts an xterm session through Grid Engine.

Submitting a Job

Programs cannot be submitted directly to the grid engine. Instead they require a small shell script, which is a wrapper for the program to be run. Note that the script must be executable (check with the ls -l command: if there is no x in the permissions shown in front of the shell script name, it is not executable; this can be changed with the command chmod +x <script name>). If the program requires interactive input (e.g. Genesis), the input has to be piped in, either with the echo command or from an external file. The minimal script genesis.sh to run Genesis would be:
  #!/bin/bash
  #$ -S /bin/sh
  echo "lcls.in" | ~/bin/genesis
Note that this is a specific case, which requires that the executable of genesis is located in the directory bin of your home directory. After a check that the script runs correctly (typing ./genesis.sh at the prompt should execute genesis without an error), the job is submitted with the qsub command:
qsub genesis.sh 
The command qsub has many options which should be explicitly defined for each submitted job. There are three methods of doing so, listed in order of increasing priority (a higher priority overrides an option already defined at a lower priority):
   * The default options in the file .sge_request, located in your home directory. The format is just one line with white space between the options (e.g. '-cwd -A reiche -j y')
   * Options embedded in your shell script. Normally lines starting with a pound sign are ignored, except if it is immediately followed by a dollar sign. Everything after #$ is extracted by the grid engine as an option.
   * Command line arguments of the qsub command (e.g. qsub -cwd genesis.sh) 
In any case an option always starts with a minus sign and a keyword, followed - if necessary - by additional arguments. The following options are recommended, preferably set in the .sge_request file in the home directory:

-cwd Uses the directory, where the job has been submitted, as the working directory. Otherwise your home directory is used.

-C #$ Defines the letter sequence in the script which indicates additional option for submitting the job.

-A <login-name> Defines the user account of the job owner. If not defined it falls back to the user who submitted the job.

-j y Merges the normal output of the file and any error messages into one file, typically with the name <job-name>.o<job-id>.

-m aes Sun Grid Engine will notify the job owner by email if the job is either completed, suspended or aborted.

-M <email-address> The email address to which the notification is sent.

-p 0 The priority level of the submitted jobs. Jobs with a higher priority are preferred to be submitted to a node by the grid engine.

-r y Forces grid engine to restart the job if the system crashes or is rebooted (note, this does not apply if the job itself crashes).

Following option should be defined differently for each job, because they are defined in a context to the specific jobs which is not generally applicable for all jobs.

-N <job-name> Defines a short name for the job to identify it besides the job ID. If omitted the job name is the name of the shell script

-o <outputfile> Names the output file. If omitted the output filename is the defined by <job-name>.o<job-id>

-v <environment> Normally environment variables defined in your .bash_profile or a related file are not exported to the node where the job runs. With this option grid engine sets the environment variable prior to starting the job.

-notify If the code supports the signals SIGUSR1 and SIGUSR2, these signal will be sent to the program before it is terminated by the grid engine

-pe <parallel environment> Needed for executing parallel jobs

Use man qsub to see further options. All options can also be set interactively by using the job submission feature of qmon.

Monitoring a Job and the Queue

Once the job is submitted, a job ID is assigned and the job is placed in the queue. To see the status of the queue, the command qstat prints a list of all running and pending jobs with the most important information (job ID, job owner, job name, status, node). More information on a specific job can be obtained with qstat -j <job-id>. The status of the job is indicated by one or more characters:

r - running
t - transferring to a node
qw - waiting in the queue
d - marked for deletion
R - marked for restart
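When scripting around qstat, the codes listed above can be turned back into readable text with a simple lookup. A minimal sketch; describe_state is a hypothetical helper name and it covers only the five codes from this list.

```shell
#!/bin/sh
# describe_state CODE
# Hypothetical helper: prints the meaning of one of the job state codes
# listed above.
describe_state() {
    case "$1" in
        r)  echo "running" ;;
        t)  echo "transferring to a node" ;;
        qw) echo "waiting in the queue" ;;
        d)  echo "marked for deletion" ;;
        R)  echo "marked for restart" ;;
        *)  echo "unknown state: $1" ;;
    esac
}
```

Since a job can show more than one character, a combined code falls through to the "unknown state" branch in this sketch rather than being decoded.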

Normally the status d is hardly observed with qstat and if a job hangs in the queue for a long time, marked for deletion, it indicates that the grid engine is not running properly. Please inform the system administrator about it.

To remove a job from the queue, the command qdel only requires the job-id. A job can also be changed after it has been submitted with the qalter command. It works like the qsub command but takes the job-id instead of the shell script name.

The command qhost gives the status of all nodes. If the load is close to unity, it indicates that the machine is busy and most likely running a job (use the qstat command to check - if not, then a user might have logged directly onto the node to run a job interactively).

Submitting an MPI-Job

To run a parallel job the script requires some additional information. First the option -pe has to be used to indicate the parallel environment. Right now only mpich is supported on the Beowulf cluster. The second mandatory argument of the -pe option is the number of requested nodes, which can also be given as a range. Sun Grid Engine tries to maximize this number. It is recommended to add this line to the shell script
    #$ -pe mpich N
where N is the number of desired nodes. Right now it is limited to 14, corresponding loosely to one job per node/CPU. If multiple instances per node are required, please contact the system administrator to increase the maximum number of slots.

The invocation of mpirun requires also some non-standard place holders (environmental variables), which is then filled by grid engine at the execution of the script. The format is (one line!)

/usr/local/mpich/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines <path to mpi program + optional command line arguments>

Everything up to the path to the mpi program should be used as it is. $NSLOTS and $TMPDIR will be defined by the Sun Grid Engine. Note also that this script does not run correctly if it is executed directly. Further information on MPICH can be found here.

Interactive Sessions

If the user has to run an interactive session (e.g. Oopics), they can log onto a node with the qsh command. The Sun Grid Engine will then mark that node as busy and will not submit any further jobs to it until the user has logged out. The command qstat will show INTERACTIVE as the job name, indicating that an interactive session is running on that node.

For now the command qsh is not working properly, but the system administrator is currently working on a fix.

The Interactive Monitor QMON

QMON is a user interface that replaces all of the UNIX commands of the grid engine (e.g. qsub, qdel ...). It is started by typing qmon at the command prompt, followed by a space and an ampersand, so that the prompt is not blocked. For the normal user only the first three buttons are of importance. They correspond to qstat, qhost and qsub, respectively. The usage is mostly intuitive. You can also ask the system administrators for help. It is recommended to use the job submission panel at least once to define your default parameters and to save the settings. After filling out the parameters, press the 'Save Setting' button and name the file to be written. The generated file can be used as a template for .sge_request.

Overview of the Most Common Gridengine Commands

qsub Submits a job to the queue. It requires a shell scripts, which is wrapped around the program to be run. Options can be either defined as command line arguments, in the script file or by the .sge_request file in your home directory. See above for more information.

qdel Marks a job for deletion. It requires the job-id and not the job name, which can be ambiguous.

qalter Change the options for an already submitted job. The options are the same as for qsub but requires the job-id instead of the shell script name. If the job is already running it will be restarted.

qstat Shows the status of the queue or of a specific job if it is specified with the -j <job-id> option.

qhost Shows the status of the nodes.

qsh Starts an xterm session through the grid engine for interactive jobs.

qhold Puts a job which hasn't been started yet on hold; it is not scheduled for execution by the grid engine until the hold is removed. Requires the job-id as argument.

qrls Releases a job from a hold. It will be put back in the queue and scheduled for execution. Requires the job-id.

qmon Interactive monitor of the Sun Grid Engine.

More information can be obtained with the man command at the command prompt. The User and Administration guide gives a complete description of the Sun Grid Engine; most of it can also be found on the official homepage.

[Sep 18, 2014] Tutorial Submitting a job using qsub

wiki.ibest.uidaho.edu

qsub [options]
  [-a date_time]                           request a start time
  [-ac context_list]                       add context variable(s)
  [-ar ar_id]                              bind job to advance reservation
  [-A account_string]                      account string in accounting record
  [-b y[es]|n[o]]                          handle command as binary
  [-binding [env|pe|set] exp|lin|str]      binds job to processor cores
  [-c n s m x]                             define type of checkpointing for job
             n           no checkpoint is performed.
             s           checkpoint when batch server is shut down.
             m           checkpoint at minimum CPU interval.
             x           checkpoint when job gets suspended.
             <interval>  checkpoint in the specified time interval.
  [-ckpt ckpt-name]                        request checkpoint method
  [-clear]                                 skip previous definitions for job
  [-cwd]                                   use current working directory
  [-C directive_prefix]                    define command prefix for job script
  [-dc simple_context_list]                delete context variable(s)
  [-dl date_time]                          request a deadline initiation time
  [-e path_list]                           specify standard error stream path(s)
  [-h]                                     place user hold on job
  [-hard]                                  consider following requests "hard"
  [-help]                                  print this help
  [-hold_jid job_identifier_list]          define jobnet interdependencies
  [-hold_jid_ad job_identifier_list]       define jobnet array interdependencies
  [-i file_list]                           specify standard input stream file(s)
  [-j y[es]|n[o]]                          merge stdout and stderr stream of job
  [-js job_share]                          share tree or functional job share
  [-jsv jsv_url]                           job submission verification script to be used
  [-l resource_list]                       request the given resources
  [-m mail_options]                        define mail notification events
  [-masterq wc_queue_list]                 bind master task to queue(s)
  [-notify]                                notify job before killing/suspending it
  [-now y[es]|n[o]]                        start job immediately or not at all
  [-M mail_list]                           notify these e-mail addresses
  [-N name]                                specify job name
  [-o path_list]                           specify standard output stream path(s)
  [-P project_name]                        set job's project
  [-p priority]                            define job's relative priority
  [-pe pe-name slot_range]                 request slot range for parallel jobs
  [-q wc_queue_list]                       bind job to queue(s)
  [-R y[es]|n[o]]                          reservation desired
  [-r y[es]|n[o]]                          define job as (not) restartable
  [-sc context_list]                       set job context (replaces old context)
  [-shell y[es]|n[o]]                      start command with or without wrapping <loginshell> -c
  [-soft]                                  consider following requests as soft
  [-sync y[es]|n[o]]                       wait for job to end and return exit code
  [-S path_list]                           command interpreter to be used
  [-t task_id_range]                       create a job-array with these tasks
  [-tc max_running_tasks]                  throttle the number of concurrent tasks (experimental)
  [-terse]                                 tersed output, print only the job-id
  [-v variable_list]                       export these environment variables
  [-verify]                                do not submit just verify
  [-V]                                     export all environment variables
  [-w e|w|n|v|p]                           verify mode (error|warning|none|just verify|poke) for jobs
  [-wd working_directory]                  use working_directory
  [-@ file]                                read commandline input from file
  [{command|-} [command_args]]

What is qsub?
Qsub is the command used for job submission to the cluster. It takes several command line arguments and can also use special directives found in the submission scripts or command file. Several of the most widely used arguments are described in detail below.
Environment variables in qsub
The qsub command will pass certain environment variables in the Variable_List attribute of the job. These variables will be available to the job. The value for the following variables will be taken from the environment of the qsub command:

HOME (the path to your home directory)

LOGNAME (the name that you logged in with)

PATH (standard path to executables)

SHELL (command shell, e.g. bash, sh, zsh, csh, etc.)

TZ (time zone)

WORKDIR (working directory from which qsub was invoked)

HOST (name of the computer from which the job was submitted)

These values will be assigned to a new name which is the current name prefixed with the string "SGE_O_". For example, the job will have access to an environment variable named SGE_O_HOME which has the value of the variable HOME in the qsub command environment.
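A job script can inspect these submit-time copies directly. The following is a minimal sketch; the function name show_submit_env is hypothetical and not part of SGE. It simply lists every variable in the job's environment whose name starts with SGE_O_.

```shell
#!/bin/sh
# show_submit_env: hypothetical helper, not part of SGE.
# Prints the submit-time copies of the environment
# (variables whose names start with SGE_O_), sorted by name.
show_submit_env() {
    env | grep '^SGE_O_' | sort
}

# Inside a job script one would simply call:
#   show_submit_env
```

The output ends up in the job's standard output file, which makes it easy to compare the submit host environment with the one seen on the execution host.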
Arguments to control behavior and request resources
As stated before there are several arguments that you can use to get your jobs to behave a specific way or request resources. This is not an exhaustive list, but it covers some of the most widely used arguments and many that you will probably need to accomplish specific tasks.

Declare the date/time a job becomes eligible for execution

To set the date/time at which a job becomes eligible to run, use the -a argument. The date/time format is [[CC]YY]MMDDhhmm[.SS], so the month and day are required. If -a is not specified qsub assumes that the job should be run immediately.

Try it out

To test -a get the current date and time from the command line and add a couple of minutes to it. It was 11:31 when I checked. Pass today's date followed by hhmm to -a and submit a command from STDIN.
echo "sleep 30" | qsub -a $(date +%m%d)1133
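Typing the time by hand is error-prone, so the timestamp can also be computed. This is a sketch only: start_in_minutes is a hypothetical helper, and it assumes GNU date (the -d option behaves differently or is missing on BSD and some minimal systems). It prints a value in MMDDhhmm form, which the date_time syntax described in the Reference section accepts.

```shell
#!/bin/sh
# start_in_minutes: hypothetical helper, not part of SGE.
# Prints a timestamp N minutes from now in MMDDhhmm form for use with qsub -a.
# Assumption: GNU date, whose -d option understands relative times.
start_in_minutes() {
    date -d "+$1 minutes" +%m%d%H%M
}

# Usage on a submit host (not run here):
#   echo "sleep 30" | qsub -a "$(start_in_minutes 2)"
```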
Manipulate the output files

By default all jobs print all stdout (standard output) messages to a file with the name in the format <job_name>.o<job_id> and all stderr (standard error) messages to a file named <job_name>.e<job_id>. These files are written to your home directory, or to the current working directory if -cwd is specified. To rename the file or specify a different location for the standard output and error files, use -o for the standard output file and -e for the standard error file. You can also combine the two streams into one file using -j.

Try it out

Create a simple submission file:

sleep.sh
#!/bin/sh

for i in `seq 1 60` ; do
        echo $i
        sleep 1
done
Then submit your job with the standard output file renamed to sleep.log:
qsub -o sleep.log sleep.sh
Submit your job with the standard error file renamed:
qsub -e sleep.log sleep.sh
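The two streams can also be merged into a single log with -j y. The wrapper below is a sketch; the name submit_merged is hypothetical, and it uses only the -cwd, -j and -o options described on this page.

```shell
#!/bin/sh
# submit_merged: hypothetical wrapper, not part of SGE.
# Submits SCRIPT from the current directory and merges stderr
# into the stdout file LOG, so only one output file is produced.
submit_merged() {
    script=$1
    log=$2
    qsub -cwd -j y -o "$log" "$script"
}

# Usage on a submit host (not run here):
#   submit_merged sleep.sh sleep.log
```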
Mail job status at the start and end of a job

The mailing options are set using the -m and -M arguments. The -m argument sets the conditions under which the batch server will send a mail message about the job and -M defines the users that emails will be sent to (multiple users can be specified in a list separated by commas). The conditions for the -m argument include:

a: mail is sent when the job is aborted.

b: mail is sent when the job begins.

e: mail is sent when the job ends.

Try it out

Using the sleep.sh script created earlier, submit a job that emails you for all conditions:
# qsub -m abe -M [email protected] sleep.sh
Submitting a job that uses specific resources

For now let's look at checking resources available.

Submitting a job that is dependent on the output of another

To create a job that will not run until another job has completed, simply add the -hold_jid <job name> argument to your submission. This takes the place of PBS's '-W depend=afterok:$ID' argument.

An SGE script example: test2.sge
#!/bin/sh
#$ -cwd
#$ -N test2
#$ -hold_jid test1

./test2
Now, the 'test2' job will not run until test1 has completed. Note that this is a fairly static way of doing things. If you are building a batch submit script that creates job dependency trees, you cannot replace 'test1' with 'test$1' ($1 being an argument or environment variable) in the submit script 'test2.sge'. Even though the #$ -hold_jid test$1 line is an active comment for SGE, it is an ordinary comment for the shell, so the $1 is never evaluated; it stays as the literal $1 when SGE reads the line. The solution is to pass the argument on the qsub command line instead:
qsub -hold_jid $WAITONJOB test2.sge
This lets you choose the job to wait on dynamically. All the directives you want (like -cwd here) can be passed to qsub this way instead of as active comments. But if an argument is not dynamic, this complicates job submission and is generally not a good way to go.
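When the dependency should be on a job ID rather than a job name, the ID can be captured at submit time with the -terse option, which prints only the job ID. A minimal sketch, with the hypothetical function name submit_after:

```shell
#!/bin/sh
# submit_after: hypothetical helper, not part of SGE.
# Submits FIRST, captures its job ID via -terse,
# then submits SECOND so that it waits for FIRST to complete.
submit_after() {
    first=$1
    second=$2
    jid=$(qsub -terse "$first") || return 1
    qsub -hold_jid "$jid" "$second"
}

# Usage on a submit host (not run here):
#   submit_after test1.sge test2.sge
```

Depending on the job ID avoids accidental matches when several jobs share the same name.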

For more examples on dependent job submissions, see an example PBS pipeline. The -W depend=afterok:$ID directive would be replaced with our -hold_jid <job name> for SGE, and you would have the same thing.

Opening an interactive shell to the compute node

See SGE_Tutorial:_Interactive_jobs

Passing an environment variable to your job

You can pass user defined environment variables to a job by using the -v argument.
Try it out
To test this we will use a simple script that prints out an environment variable.

variable.sh
#!/bin/sh
if [ "x" = "x$MYVAR" ] ; then
     echo "Variable is not set"
else
     echo "Variable says: $MYVAR"
fi
Next use qsub without the -v and check your standard out file
# qsub variable.sh
Then use the -v to set the variable
# qsub -v MYVAR="hello" variable.sh

Reference

qsub - submit a batch job to Sun Grid Engine.

qsub [ options ] [ command | -- [ command_args ]]

Qsub submits batch jobs to the Sun Grid Engine queuing system. Sun Grid Engine supports single- and multiple-node jobs. Command can be a path to a binary or a script (see -b below) which contains the commands to be run by the job using a shell (for example, sh(1) or csh(1)). Arguments to the command are given as command_args to qsub. If command is handled as a script then it is possible to embed flags in the script. If the first two characters of a script line either match '#$' or are equal to the prefix string defined with the -C option described below, the line is parsed for embedded command flags.

For qsub, the administrator and the user may define default request files (see sge_request(5)) which can contain any of the options described below. If an option in a default request file is understood by qsub and qlogin but not by qsh the option is silently ignored if qsh is invoked. Thus you can maintain shared default request files for both qsub and qsh.

A cluster wide default request file may be placed under $SGE_ROOT/$SGE_CELL/common/sge_request. User private default request files are processed under the locations $HOME/.sge_request and $cwd/.sge_request. The working directory local default request file has the highest precedence, then the home directory located file and then the cluster global file. The option arguments, the embedded script flags and the options in the default request files are processed in the following order: left to right in the script line, left to right in the default request files, from top to bottom of the script file (qsub only), from top to bottom of default request files, from left to right of the command line. In other words, the command line can be used to override the embedded flags and the default request settings. The embedded flags, however, will override the default settings.

Note, that the -clear option can be used to discard any previous settings at any time in a default request file, in the embedded script flags, or in a command-line option. It is, however, not available with qalter.

The options described below can be requested either hard or soft. By default, all requests are considered hard until the -soft option (see below) is encountered. The hard/soft status remains in effect until its counterpart is encountered again. If all the hard requests for a job cannot be met, the job will not be scheduled. Jobs which cannot be run at the present time remain spooled.

OPTIONS

-@ optionfile Forces qsub to use the options contained in optionfile. The indicated file may contain all valid options. Comment lines must start with a "#" sign.
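As an illustration, an option file can be generated once and reused for many submissions. The sketch below is hypothetical: write_optionfile is not part of SGE, the three options are arbitrary examples, and placing one option per line is an assumption about the accepted layout.

```shell
#!/bin/sh
# write_optionfile: hypothetical helper, not part of SGE.
# Writes a small option file for use with qsub -@.
# The options below (-cwd, -j y, -N nightly) are only examples.
write_optionfile() {
    cat > "$1" <<'EOF'
# options shared by all my test jobs
-cwd
-j y
-N nightly
EOF
}

# Usage on a submit host (not run here):
#   write_optionfile my.opts
#   qsub -@ my.opts sleep.sh
```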
-a date_time Defines or redefines the time and date at which a job is eligible for execution. Date_time conforms to [[CC]YY]MMDDhhmm[.SS]; for the details, please see Date_time in sge_types(1).
If this option is used with qsub or if a corresponding value is specified in qmon then a parameter named a and the value in the format CCYYMMDDhhmm.SS will be passed to the defined JSV instances (see -jsv option below or find more information concerning JSV in jsv(1))
-ac variable[=value],... Adds the given name/value pair(s) to the job's context. Value may be omitted. Sun Grid Engine appends the given argument to the list of context variables for the job. Multiple -ac, -dc, and -sc options may be given. The order is important here.
The outcome of the evaluation of all -ac, -dc, and -sc options or corresponding values in qmon is passed to defined JSV instances as parameter with the name ac. (see -jsv option below or find more information concerning JSV in jsv(1)) Qalter allows changing this option even while the job executes.
-ar ar_id Assigns the submitted job to be a part of an existing Advance Reservation. The complete list of existing Advance Reservations can be obtained using the qrstat(1) command.
Note that the -ar option adds implicitly the -w e option if not otherwise requested.
-A account_string Identifies the account to which the resource consumption of the job should be charged. The account_string should conform to the name definition in sge_types(1). In the absence of this parameter Sun Grid Engine will place the default account string "sge" in the accounting record of the job.
-binding [ binding_instance ] binding_strategy A job can request a specific processor core binding (processor affinity) with this parameter. This request is neither a hard nor a soft request, it is a hint for the execution host to do this if possible. Please note that the requested binding strategy is not used for resource selection within Sun Grid Engine. As a result an execution host might be selected where Sun Grid Engine does not even know the hardware topology and therefore is not able to apply the requested binding.
To force Sun Grid Engine to select hardware on which the binding can be applied please use the -l switch in combination with the complex attribute m_topology.

binding_instance is an optional parameter. It might either be env, pe or set depending on which instance should accomplish the job to core binding. If the value for binding_instance is not specified then set will be used.

env means that the environment variable SGE_BINDING will be exported to the job environment of the job. This variable contains the selected operating system internal processor numbers. There might be more of them than selected cores in the presence of SMT or CMT because each core could be represented by multiple processor identifiers. The processor numbers are space separated.

pe means that the information about the selected cores appears in the fourth column of the pe_hostfile. Here the logical core and socket numbers are printed (they start at 0 and have no holes) in colon separated pairs (e.g. 0,0:1,0 which means core 0 on socket 0 and core 0 on socket 1). For more information about the $pe_hostfile check sge_pe(5).

set (default if nothing else is specified). The binding strategy is applied by Sun Grid Engine. How this is achieved depends on the underlying hardware architecture of the execution host where the submitted job will be started.

On Solaris 10 hosts a processor set will be created where the job can exclusively run in. Because of operating system limitations at least one core must remain unbound. This resource could of course be used by an unbound job.

On Linux hosts a processor affinity mask will be set to restrict the job to run exclusively on the selected cores. The operating system allows other unbound processes to use these cores. Please note that on Linux the binding requires a Linux kernel version of 2.6.16 or greater. It might even be possible to use a kernel with a lower version number, but in that case additional kernel patches have to be applied. The loadcheck tool in the utilbin directory can be used to check the host's capabilities. You can also use -sep in combination with -cb of the qconf(5) command to determine whether Sun Grid Engine is able to recognize the hardware topology.

Possible values for binding_strategy are as follows:
```
linear:<amount>[:<socket>,<core>]
striding:<amount>:<n>[:<socket>,<core>]
explicit:[<socket>,<core>;...]<socket>,<core>
```
For the binding strategies linear and striding there is an optional socket and core pair attached. This denotes the mandatory starting point for the first core to bind on.
linear means that Sun Grid Engine tries to bind the job on amount successive cores. If socket and core are omitted then Sun Grid Engine first allocates successive cores on the first empty socket found. Empty means that there are no jobs bound to the socket by Sun Grid Engine. If this is not possible or is not sufficient Sun Grid Engine tries to find (further) cores on the socket with the most unbound cores and so on. If the amount of allocated cores is lower than requested cores, no binding is done for the job. If socket and core are specified then Sun Grid Engine tries to find amount of empty cores beginning with this starting point. If this is not possible then binding is not done.

striding means that Sun Grid Engine tries to find cores with a certain offset. It will select amount of empty cores with an offset of n-1 cores in between. Start point for the search algorithm is socket 0 core 0. As soon as amount cores are found they will be used to do the job binding. If there are not enough empty cores or if the correct offset cannot be achieved then there will be no binding done.

explicit binds the specified sockets and cores that are mentioned in the provided socket/core list. Each socket/core pair has to be specified only once. If a socket/core pair is already in use by a different job the whole binding request will be ignored.

If this option or a corresponding value in qmon is specified then these values will be passed to defined JSV instances as parameters with the names binding_strategy, binding_type, binding_amount, binding_step, binding_socket, binding_core, binding_exp_n, binding_exp_socket<id>, binding_exp_core<id>.

Please note that the length of the socket/core value list of the explicit binding is reported as binding_exp_n. <id> will be replaced by the position of the socket/core pair within the explicit list (0 <= id < binding_exp_n). The first socket/core pair of the explicit binding will be reported with the parameter names binding_exp_socket0 and binding_exp_core0.

Values that do not apply for the specified binding will not be reported to JSV. E.g. binding_step will only be reported for the striding binding and all binding_exp_* values will only be passed to JSV if explicit binding was specified. (see -jsv option below or find more information concerning JSV in jsv(1))
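With the env binding instance, the job itself is responsible for using the processor list. A minimal sketch of reading it inside a job script follows; count_bound_processors is a hypothetical helper, and it relies only on SGE_BINDING being a space-separated list as described above.

```shell
#!/bin/sh
# count_bound_processors: hypothetical helper, not part of SGE.
# Intended for a job submitted with "-binding env <strategy>".
# Counts the space-separated processor numbers in SGE_BINDING.
count_bound_processors() {
    # Unquoted on purpose: word splitting turns the list
    # into the function's positional parameters.
    set -- ${SGE_BINDING:-}
    echo "$#"
}

# Usage (not run here), e.g. for a job requesting two successive cores:
#   qsub -binding env linear:2 job.sh
# and inside job.sh:
#   echo "bound to $(count_bound_processors) processors"
```

A result of 0 means the variable is unset or empty, which per the text above can happen when the requested binding could not be applied.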
-b y[es]|n[o] This option cannot be embedded in the script file itself.
Gives the user the possibility to indicate explicitly whether command should be treated as binary or script. If the value of -b is 'y', then command may be a binary or script. The command might not be accessible from the submission host. Nothing except the path of the command will be transferred from the submission host to the execution host. Path aliasing will be applied to the path of command before command will be executed.

If the value of -b is 'n' then command needs to be a script and it will be handled as script. The script file has to be accessible by the submission host. It will be transferred to the execution host. qsub/qrsh will search directive prefixes within script.

qsub will implicitly use -b n whereas qrsh will apply the -b y option if nothing else is specified.

The value specified with this option or the corresponding value specified in qmon will only be passed to defined JSV instances if the value is yes. The name of the parameter will be b. The value will be y also when the long form yes was specified during submission. (see -jsv option below or find more information concerning JSV in jsv(1))

Please note that submission of command as script (-b n) can have a significant performance impact, especially for short running jobs and big job scripts. Script submission adds a number of operations to the submission process: The job script needs to be
- parsed at client side (for special comments)
- transferred from submit client to qmaster
- spooled in qmaster
- transferred to execd at job execution
- spooled in execd
- removed from spooling both in execd and qmaster once the job is done
If job scripts are available on the execution nodes, e.g. via NFS, binary submission can be the better choice.
-c occasion_specifier Defines or redefines whether the job should be checkpointed, and if so, under what circumstances. The specification of the checkpointing occasions with this option overwrites the definitions of the when parameter in the checkpointing environment (see checkpoint(5)) referenced by the qsub -ckpt switch. Possible values for occasion_specifier are
```
n           no checkpoint is performed.
s           checkpoint when batch server is shut down.
m           checkpoint at minimum CPU interval.
x           checkpoint when job gets suspended.
<interval>  checkpoint in the specified time interval.
```
The minimum CPU interval is defined in the queue configuration (see queue_conf(5) for details). <interval> has to be specified in the format hh:mm:ss. The maximum of <interval> and the queue's minimum CPU interval is used if <interval> is specified. This is done to ensure that a machine is not overloaded by checkpoints being generated too frequently.
The value specified with this option or the corresponding value specified in qmon will be passed to defined JSV instances. The <interval> will be available as parameter with the name c_interval. The character sequence specified will be available as parameter with the name c_occasion. Please note that if you change c_occasion via JSV then the last setting of c_interval will be overwritten and vice versa. (see -jsv option below or find more information concerning JSV in jsv(1))
-ckpt ckpt_name Selects the checkpointing environment (see checkpoint(5)) to be used for checkpointing the job. Also declares the job to be a checkpointing job. If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name ckpt. (see -jsv option below or find more information concerning JSV in jsv(1))
-clear Causes all elements of the job to be reset to the initial default status prior to applying any modifications (if any) appearing in this specific command.
-cwd Execute the job from the current working directory. This switch will activate Sun Grid Engine's path aliasing facility, if the corresponding configuration files are present (see sge_aliases(5)).
If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name cwd. The value of this parameter will be the absolute path to the current working directory. JSV scripts can remove the path from jobs during the verification process by setting the value of this parameter to an empty string. As a result the job behaves as if -cwd was not specified during job submission. (see -jsv option below or find more information concerning JSV in jsv(1))
-C prefix_string Prefix_string defines the prefix that declares a directive in the job's command. The prefix is not a job attribute, but affects the behavior of qsub and qrsh. If prefix is a null string, the command will not be scanned for embedded directives. The directive prefix consists of two ASCII characters which, when appearing in the first two bytes of a script line, indicate that what follows is a Sun Grid Engine command. The default is "#$". The user should be aware that changing the first delimiting character can produce unforeseen side effects. If the script file contains anything other than a "#" character in the first byte position of the line, the shell processor for the job will reject the line and may exit the job prematurely. If the -C option is present in the script file, it is ignored.
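To see which lines of a script would count as directives for a given prefix, the script can be filtered locally before submission. This is only a rough approximation of qsub's scanning, and list_directives is a hypothetical helper, not an SGE tool.

```shell
#!/bin/sh
# list_directives: hypothetical helper, not part of SGE.
# Prints the lines of SCRIPT that begin with PREFIX (default "#$"),
# i.e. the lines where the prefix occupies the first bytes of the line.
list_directives() {
    script=$1
    prefix=$2
    [ -n "$prefix" ] || prefix='#$'
    # index() returns 1 only when the line starts with the prefix.
    awk -v p="$prefix" 'index($0, p) == 1' "$script"
}

# Usage (not run here):
#   list_directives test2.sge          # default prefix
#   list_directives job.sh '#%'        # matches a submission with: qsub -C '#%' job.sh
```

Keeping "#" as the first character of a custom prefix means the shell still treats directive lines as comments.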
-dc variable,... Available for qsub, qsh, qrsh, qlogin and qalter only. Removes the given variable(s) from the job's context. Multiple -ac, -dc, and -sc options may be given. The order is important. Qalter allows changing this option even while the job executes. The outcome of the evaluation of all -ac, -dc, and -sc options or corresponding values in qmon is passed to defined JSV instances as parameter with the name ac. (see -jsv option below or find more information concerning JSV in jsv(1))
-dl date_time Specifies the deadline initiation time in [[CC]YY]MMDDhhmm[.SS] format (see -a option above). The deadline initiation time is the time at which a deadline job has to reach top priority to be able to complete within a given deadline. Before the deadline initiation time the priority of a deadline job will be raised steadily until it reaches the maximum as configured by the Sun Grid Engine administrator. This option is applicable only for users allowed to submit deadline jobs. If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name dl. The format for the date_time value is CCYYMMDDhhmm.SS (see -jsv option below or find more information concerning JSV in jsv(1))
-e [[hostname]:]path,... Defines or redefines the path used for the standard error stream of the job. For qsh, qrsh and qlogin only the standard error stream of prolog and epilog is redirected. If the path constitutes an absolute path name, the error-path attribute of the job is set to path, including the hostname. If the path name is relative, Sun Grid Engine expands path either with the current working directory path (if the -cwd switch (see above) is also specified) or with the home directory path. If hostname is present, the standard error stream will be placed in the corresponding location only if the job runs on the specified host. If the path contains a ":" without a hostname, a leading ":" has to be specified.

By default the file name for interactive jobs is /dev/null. For batch jobs the default file name has the form job_name.ejob_id and job_name.ejob_id.task_id for array job tasks (see -t option below).

If path is a directory, the standard error stream of the job will be put in this directory under the default file name. If the pathname contains certain pseudo environment variables, their value will be expanded at runtime of the job and will be used to constitute the standard error stream path name. The following pseudo environment variables are supported currently:

$HOME home directory on execution machine
$USER user ID of job owner
$JOB_ID current job ID
$JOB_NAME current job name (see -N option)
$HOSTNAME name of the execution host
$TASK_ID array job task index number

Alternatively to $HOME the tilde sign "~" can be used as common in csh(1) or ksh(1). Note that the "~" sign also works in combination with user names, so that "~<user>" expands to the home directory of <user>. Using another user ID than that of the job owner requires corresponding permissions, of course.

Qalter allows changing this option even while the job executes. The modified parameter will only be in effect after a restart or migration of the job, however.
If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name e. (see -jsv option below or find more information concerning JSV in jsv(1))
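Because the pseudo environment variables are expanded by Grid Engine at job run time, they must reach qsub unexpanded. The sketch below shows one way to do that from a shell; submit_with_error_log is a hypothetical wrapper, and the logs directory is assumed to exist already under the working directory.

```shell
#!/bin/sh
# submit_with_error_log: hypothetical wrapper, not part of SGE.
# The single quotes stop the submitting shell from expanding
# $JOB_NAME and $JOB_ID, so qsub receives them literally and
# Grid Engine can substitute them when the job runs.
submit_with_error_log() {
    qsub -cwd -e 'logs/$JOB_NAME.$JOB_ID.err' "$@"
}

# Usage on a submit host (not run here):
#   submit_with_error_log sleep.sh
```

With double quotes instead, the submitting shell would substitute its own (probably empty) values before qsub ever saw the path.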
-hard Signifies that all -q and -l resource requirements following in the command line will be hard requirements and must be satisfied in full before a job can be scheduled. As Sun Grid Engine scans the command line and script file for Sun Grid Engine options and parameters it builds a list of resources required by a job. All such resource requests are considered as absolutely essential for the job to commence. If the -soft option (see below) is encountered during the scan then all following resources are designated as "soft requirements" for execution, or "nice-to-have, but not essential". If the -hard flag is encountered at a later stage of the scan, all resource requests following it once again become "essential". The -hard and -soft options in effect act as "toggles" during the scan.
If this option or a corresponding value in qmon is specified then the corresponding -q and -l resource requirements will be passed to defined JSV instances as parameter with the names q_hard and l_hard. Find more information in the sections describing -q and -l. (see -jsv option below or find more information concerning JSV in jsv(1))
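The toggle behaviour can be illustrated with one hard and one soft request on the same command line. This is a sketch under assumptions: submit_prefer_queue is a hypothetical wrapper, the h_rt run-time limit is assumed to be configured as a requestable resource on the cluster, and the queue name passed in is whatever exists locally.

```shell
#!/bin/sh
# submit_prefer_queue: hypothetical wrapper, not part of SGE.
# The run-time limit after -hard must be satisfied;
# the queue after -soft is only a preference.
# Assumption: h_rt is a requestable resource on this cluster.
submit_prefer_queue() {
    queue=$1
    shift
    qsub -hard -l h_rt=00:10:00 -soft -q "$queue" "$@"
}

# Usage on a submit host (not run here); fast.q is a made-up queue name:
#   submit_prefer_queue fast.q sleep.sh
```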
-h | -h {u|s|o|n|U|O|S}... Available for qsub (only -h), qrsh, qalter and qresub (hold state is removed when not set explicitly). List of holds to place on a job, a task or some tasks of a job.
```
`u'  denotes a user hold.
`s'  denotes a system hold.
`o'  denotes an operator hold.
`n'  denotes no hold (requires manager privileges).
```
As long as any hold other than `n' is assigned to the job the job is not eligible for execution. Holds can be released via qalter and qrls(1). In case of qalter this is supported by the following additional option specifiers for the -h switch:
```
`U'  removes a user hold.
`S'  removes a system hold.
`O'  removes an operator hold.
```
Sun Grid Engine managers can assign and remove all hold types, Sun Grid Engine operators can assign and remove user and operator holds, and users can only assign or remove user holds.
In the case of qsub only user holds can be placed on a job and thus only the first form of the option with the -h switch alone is allowed. As opposed to this, qalter requires the second form described above.

An alternate means to assign hold is provided by the qhold(1) facility.

If the job is an array job (see the -t option below), all tasks specified via -t are affected by the -h operation simultaneously.

If this option is specified with qsub or during the submission of a job in qmon then the parameter h with the value u will be passed to the defined JSV instances indicating that the job will be in user hold after the submission finishes. (see -jsv option below or find more information concerning JSV in jsv(1))
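A typical use is to submit a job in user hold and release it later. A minimal sketch, using the hypothetical helper names submit_held and release_user_hold together with the -terse option of qsub and the U specifier of qalter -h described above:

```shell
#!/bin/sh
# submit_held / release_user_hold: hypothetical helpers, not part of SGE.

# Submits a job in user hold and prints only its job ID (-terse).
submit_held() {
    qsub -terse -h "$@"
}

# Removes the user hold from the given job ID.
release_user_hold() {
    qalter -h U "$1"
}

# Usage on a submit host (not run here):
#   jid=$(submit_held sleep.sh)
#   release_user_hold "$jid"
```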
-help Prints a listing of all options.
-hold_jid wc_job_list Available for qsub, qrsh, and qalter only. See sge_types(1) for wc_job_list definition.
Defines or redefines the job dependency list of the submitted job. A reference by job name or pattern is only accepted if the referenced job is owned by the same user as the referring job. The submitted job is not eligible for execution unless all jobs referenced in the comma-separated job id and/or job name list have completed. If any of the referenced jobs exits with exit code 100, the submitted job will remain ineligible for execution.
With the help of job names or regular pattern one can specify a job dependency on multiple jobs satisfying the regular pattern or on all jobs with the requested name. The name dependencies are resolved at submit time and can only be changed via qalter. New jobs or name changes of other jobs will not be taken into account.
If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name hold_jid. (see -jsv option below or find more information concerning JSV in jsv(1))
-hold_jid_ad wc_job_list Defines or redefines the job array dependency list of the submitted job. A reference by job name or pattern is only accepted if the referenced job is owned by the same user as the referring job. Each sub-task of the submitted job is not eligible for execution unless the corresponding sub-tasks of all jobs referenced in the comma-separated job id and/or job name list have completed. If any array task of the referenced jobs exits with exit code 100, the dependent tasks of the submitted job will remain ineligible for execution.
With the help of job names or regular pattern one can specify a job dependency on multiple jobs satisfying the regular pattern or on all jobs with the requested name. The name dependencies are resolved at submit time and can only be changed via qalter. New jobs or name changes of other jobs will not be taken into account.

If either the submitted job or any job in wc_job_list are not array jobs with the same range of sub-tasks (see -t option below), the request list will be rejected and the job create or modify operation will error.

If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name hold_jid_ad. (see -jsv option below or find more information concerning JSV in jsv(1))
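The task-by-task dependency can be sketched as two array jobs over the same range. The helper name submit_array_chain and the job names stage1 and stage2 are hypothetical; both jobs use the same -t range, as required above.

```shell
#!/bin/sh
# submit_array_chain: hypothetical helper, not part of SGE.
# Submits two array jobs over the same task range; task i of the
# second job becomes eligible once task i of the first has completed.
submit_array_chain() {
    range=$1
    first=$2
    second=$3
    qsub -t "$range" -N stage1 "$first" || return 1
    qsub -t "$range" -N stage2 -hold_jid_ad stage1 "$second"
}

# Usage on a submit host (not run here):
#   submit_array_chain 1-10 step1.sh step2.sh
```

With plain -hold_jid instead, every task of the second job would wait for the whole first job to finish.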
-i [[hostname]:]file,... Defines or redefines the file used for the standard input stream of the job. If the file constitutes an absolute filename, the input-path attribute of the job is set to path, including the hostname. If the path name is relative, Sun Grid Engine expands path either with the current working directory path (if the -cwd switch (see above) is also specified) or with the home directory path. If hostname is present, the standard input stream will be placed in the corresponding location only if the job runs on the specified host. If the path contains a ":" without a hostname, a leading ":" has to be specified.
By default /dev/null is the input stream for the job.

It is possible to use certain pseudo variables, whose values will be expanded at runtime of the job and will be used to express the standard input stream as described in the -e option for the standard error stream.

If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name i. (see -jsv option below or find more information concerning JSV in jsv(1))
-j y[es]|n[o] Specifies whether or not the standard error stream of the job is merged into the standard output stream. If both the -j y and the -e options are present, Sun Grid Engine sets but ignores the error-path attribute.
-js job_share Defines or redefines the job share of the job relative to other jobs. Job share is an unsigned integer value. The default job share value for jobs is 0.
The job share influences the Share Tree Policy and the Functional Policy. It has no effect on the Urgency and Override Policies (see share_tree(5), sched_conf(5) and the Sun Grid Engine Installation and Administration Guide for further information on the resource management policies supported by Sun Grid Engine).

In case of the Share Tree Policy, users can distribute the tickets to which they are currently entitled among their jobs using different shares assigned via -js. If all jobs have the same job share value, the tickets are distributed evenly. Otherwise, jobs receive tickets relative to the different job shares. Job shares are treated like an additional level in the share tree in the latter case.

In connection with the Functional Policy, the job share can be used to weight jobs within the functional job category. Tickets are distributed relative to any uneven job share distribution treated as a virtual share distribution level underneath the functional job category.

If both the Share Tree and the Functional Policy are active, the job shares will have an effect in both policies, and the tickets independently derived in each of them are added to the total number of tickets for each job.

If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name js. (see -jsv option below or find more information concerning JSV in jsv(1))
-jsv jsv_url Available for qsub, qsh, qrsh and qlogin only. Defines a client JSV instance which will be executed to verify the job specification before the job is sent to qmaster.
In contrast to other options this switch will not be overwritten if it is also used in sge_request files. Instead all specified JSV instances will be executed to verify the job to be submitted.

The JSV instance which is passed directly on the command line of a client is executed first to verify the job specification. After that, the JSV instances which might have been defined in various sge_request files will be triggered to check the job. Find more details in man page jsv(1) and sge_request(5).

The syntax of the jsv_url is specified in sge_types(1).
-l resource=value,... Available for qsub, qsh, qrsh, qlogin and qalter only.
Launch the job in a Sun Grid Engine queue meeting the given resource request list. In case of qalter the previous definition is replaced by the specified one.

complex(5) describes how a list of available resources and their associated valid value specifiers can be obtained.

There may be multiple -l switches in a single command. Both hard and soft -l requests may appear in the same command line. In case of a serial job, multiple -l switches refine the definition for the sought queue.
-m b|e|a|s|n,... Defines or redefines under which circumstances mail is to be sent to the job owner or to the users defined with the -M option described below. The option arguments have the following meaning:

`b'  Mail is sent at the beginning of the job.
`e'  Mail is sent at the end of the job.
`a'  Mail is sent when the job is aborted or rescheduled.
`s'  Mail is sent when the job is suspended.
`n'  No mail is sent.

Currently no mail is sent when a job is suspended.

Qalter allows changing the b, e, and a option arguments even while the job executes. The modification of the b option argument will only be in effect after a restart or migration of the job, however.

If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name m. (see -jsv option above or find more information concerning JSV in jsv(1))
-M user[@host],... Available for qsub, qsh, qrsh, qlogin and qalter only.
Defines or redefines the list of users to which the server that executes the job has to send mail, if the server sends mail about the job. Default is the job owner at the originating host.
-masterq wc_queue_list Only meaningful for parallel jobs, i.e. together with the -pe option.
Defines or redefines a list of cluster queues, queue domains and queue instances which may be used to become the so called master queue of this parallel job. A more detailed description of wc_queue_list can be found in sge_types(1). The master queue is defined as the queue where the parallel job is started. The other queues to which the parallel job spawns tasks are called slave queues. A parallel job only has one master queue.

This parameter has all the properties of a resource request and will be merged with requirements derived from the -l option described above.
-notify This flag, when set, causes Sun Grid Engine to send "warning" signals to a running job prior to sending the signals themselves. If a SIGSTOP is pending, the job will receive a SIGUSR1 several seconds before the SIGSTOP. If a SIGKILL is pending, the job will receive a SIGUSR2 several seconds before the SIGKILL. This option provides the running job, before receiving the SIGSTOP or SIGKILL, a configured time interval to do e.g. cleanup operations. The amount of time delay is controlled by the notify parameter in each queue configuration (see queue_conf(5)).
Note that the Linux operating system "misused" the user signals SIGUSR1 and SIGUSR2 in some early Posix thread implementations. You might not want to use the -notify option if you are running multi-threaded applications in your jobs under Linux, particularly on 2.0 or earlier kernels.
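A job script can make use of -notify by trapping the two warning signals. The following is a minimal Bourne shell sketch, not taken from the Sun Grid Engine documentation. The names SCRATCH, WARNED and on_warning are made up for illustration, and a real job would do its own cleanup in the handler.

```shell
#!/bin/sh
# Sketch: react to the -notify warning signals inside a job script.
# SCRATCH, WARNED and on_warning are illustrative names, not part of SGE.
SCRATCH="${TMPDIR:-/tmp}/notify_demo.$$"
WARNED=""

on_warning() {
    # $1 is the name of the warning signal that arrived
    WARNED="$1"
    rm -f "$SCRATCH"      # clean up before the real signal follows
}

# SIGUSR1 precedes SIGSTOP, SIGUSR2 precedes SIGKILL
trap 'on_warning USR1' USR1
trap 'on_warning USR2' USR2

: > "$SCRATCH"            # stand-in for scratch data created by the job
```

When such a script is submitted with qsub -notify, the handler has the notify interval of the queue to finish before the SIGSTOP or SIGKILL arrives.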
-now y[es]|n[o] Available for qsub, qsh, qlogin and qrsh. -now y tries to start the job immediately or not at all. The command returns 0 on success, or 1 on failure (also if the job could not be scheduled immediately). For array jobs submitted with the -now option, if all tasks cannot be immediately scheduled, no tasks are scheduled. -now y is default for qsh, qlogin and qrsh.

With the -now n option, the job will be put into the pending queue if it cannot be executed immediately. -now n is default for qsub.

The value specified with this option or the corresponding value specified in qmon will only be passed to defined JSV instances if the value is yes. The name of the parameter will be now. The value will be y also when the long form yes was specified during submission. (see -jsv option above or find more information concerning JSV in jsv(1))
-N name Available for qsub, qsh, qrsh, qlogin and qalter only. The name of the job. The name should follow the "name" definition in sge_types(1). Invalid job names will be denied at submit time.

If the -N option is not present, Sun Grid Engine assigns the name of the job script to the job after any directory pathname has been removed from the script name. If the script is read from standard input, the job name defaults to STDIN.

In the case of qsh or qlogin, if the -N option is absent, the string `INTERACT' is assigned to the job. In the case of qrsh, if the -N option is absent, the resulting job name is determined from the qrsh command line by using the argument string up to the first occurrence of a semicolon or whitespace and removing the directory pathname.

Qalter allows changing this option even while the job executes.

The value specified with this option or the corresponding value specified in qmon will be passed to defined JSV instances as parameter with the name N. (see -jsv option above or find more information concerning JSV in jsv(1))
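The default naming rule for qsub can be sketched in a few lines of Bourne shell. This only mimics the rule described above for illustration; default_job_name is a hypothetical helper, not an SGE command.

```shell
# default_job_name [SCRIPT_PATH]
# Prints the name a qsub job would get without -N: the script name with
# any directory part removed, or STDIN when the script comes from stdin.
default_job_name() {
    if [ -z "${1:-}" ] || [ "$1" = "-" ]; then
        printf '%s\n' "STDIN"
    else
        printf '%s\n' "${1##*/}"   # strip everything up to the last slash
    fi
}
```

For example, default_job_name /home/user/jobs/run.sh prints run.sh.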
-o [[hostname]:]path,... The path used for the standard output stream of the job. The path is handled as described in the -e option for the standard error stream. By default the file name for standard output has the form job_name.ojob_id and job_name.ojob_id.task_id for array job tasks (see -t option below). Qalter allows changing this option even while the job executes. The modified parameter will only be in effect after a restart or migration of the job, however. If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name o. (see -jsv option above or find more information concerning JSV in jsv(1))
-P project_name Available for qsub, qsh, qrsh, qlogin and qalter only.
Specifies the project to which this job is assigned. The administrator needs to give permission to individual users to submit jobs to a specific project. (see -aprj option to qconf(1)).

If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name P. (see -jsv option above or find more information concerning JSV in jsv(1))
-p priority Defines or redefines the priority of the job relative to other jobs. Priority is an integer in the range -1023 to 1024. The default priority value for jobs is 0.
Users may only decrease the priority of their jobs. Sun Grid Engine managers and administrators may also increase the priority associated with jobs. A pending job with a higher priority becomes eligible for dispatch by the Sun Grid Engine scheduler earlier.

If this option or a corresponding value in qmon is specified and the priority is not 0 then this value will be passed to defined JSV instances as parameter with the name p. (see -jsv option above or find more information concerning JSV in jsv(1))
-pe parallel_environment n[-[m]]|[-]m,... Available for qsub, qsh, qrsh, qlogin and qalter only.
Parallel programming environment (PE) to instantiate. For more detail about PEs, see sge_types(1).
-q wc_queue_list Available for qsub, qrsh, qsh, qlogin and qalter.
Defines or redefines a list of cluster queues, queue domains or queue instances which may be used to execute this job. Please find a description of wc_queue_list in sge_types(1). This parameter has all the properties of a resource request and will be merged with requirements derived from the -l option described above.

Qalter allows changing this option even while the job executes. The modified parameter will only be in effect after a restart or migration of the job, however.

If this option or a corresponding value in qmon is specified then these hard and soft resource requirements will be passed to defined JSV instances as parameters with the names q_hard and q_soft. If regular expressions are used for resource requests, then these expressions will be passed as they are. Also shortcut names will not be expanded. (see -jsv option above or find more information concerning JSV in jsv(1))

-R y[es]|n[o] Indicates whether a reservation for this job should be done. Reservation is never done for immediate jobs, i.e. jobs submitted using the -now yes option. Please note that regardless of the reservation request, job reservation might be disabled using max_reservation in sched_conf(5) and might be limited only to a certain number of high priority jobs.

By default jobs are submitted with the -R n option.

The value specified with this option or the corresponding value specified in qmon will only be passed to defined JSV instances if the value is yes. The name of the parameter will be R. The value will be y also when the long form yes was specified during submission. (see -jsv option above or find more information concerning JSV in jsv(1))

-r y[es]|n[o] Identifies the ability of a job to be rerun or not. If the value of -r is 'yes', the job will be rerun if the job was aborted without leaving a consistent exit state. (This is typically the case if the node on which the job is running crashes). If -r is 'no', the job will not be rerun under any circumstances. Interactive jobs submitted with qsh, qrsh or qlogin are not rerunnable.
-sc variable[=value],... Sets the given name/value pairs as the job's context. Value may be omitted. Sun Grid Engine replaces the job's previously defined context with the one given as the argument. Multiple -ac, -dc, and -sc options may be given. The order is important.
Contexts provide a way to dynamically attach and remove meta-information to and from a job. The context variables are not passed to the job's execution context in its environment.

-shell y[es]|n[o] -shell n causes qsub to execute the command line directly, as if by exec(2). No command shell will be executed for the job. This option only applies when -b y is also used. Without -b y, -shell n has no effect.

This option can be used to speed up execution as some overhead, like the shell startup and sourcing the shell resource files is avoided.

This option can only be used if no shell-specific command line parsing is required. If the command line contains shell syntax, like environment variable substitution or (back) quoting, a shell must be started. In this case either do not use the -shell n option or execute the shell as the command line and pass the path to the executable as a parameter.

If a job executed with the -shell n option fails due to a user error, such as an invalid path to the executable, the job will enter the error state.

-shell y cancels the effect of a previous -shell n. Otherwise, it has no effect.

See -b and -noshell for more information.

The value specified with this option or the corresponding value specified in qmon will only be passed to defined JSV instances if the value is yes. The name of the parameter will be shell. The value will be y also when the long form yes was specified during submission. (see -jsv option above or find more information concerning JSV in jsv(1))
-soft Signifies that all resource requirements following in the command line will be soft requirements and are to be filled on an "as available" basis. As Sun Grid Engine scans the command line and script file for Sun Grid Engine options and parameters, it builds a list of resources required by the job. All such resource requests are considered as absolutely essential for the job to commence. If the -soft option is encountered during the scan then all following resources are designated as "soft requirements" for execution, or "nice-to-have, but not essential". If the -hard flag (see above) is encountered at a later stage of the scan, all resource requests following it once again become "essential". The -hard and -soft options in effect act as "toggles" during the scan.
If this option or a corresponding value in qmon is specified then the corresponding -q and -l resource requirements will be passed to defined JSV instances as parameters with the names q_soft and l_soft. Find more information in the sections describing -q and -l. (see -jsv option above or find more information concerning JSV in jsv(1))
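The toggle behaviour of -hard and -soft can be illustrated with a small Bourne shell sketch. It only imitates the scan described above for -l and -q arguments and is not how qsub itself is implemented. The function name classify_requests and the resource and queue names in the usage line are assumptions chosen for the example.

```shell
# classify_requests ARGS...
# Walks an option list the way the text describes: requests count as
# "hard" until -soft is seen, and -hard switches back again.
classify_requests() {
    mode=hard
    while [ $# -gt 0 ]; do
        case "$1" in
            -hard) mode=hard ;;
            -soft) mode=soft ;;
            -l|-q)
                [ $# -ge 2 ] || break          # option without a value
                printf '%s %s %s\n' "$mode" "$1" "$2"
                shift ;;
        esac
        shift
    done
}
```

Calling classify_requests -l arch=lx-amd64 -soft -l mem_free=4G -q fast.q -hard -l h_rt=1:00:00 reports the first and last request as hard and the two in between as soft.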

-sync y[es]|n[o] -sync y causes qsub to wait for the job to complete before exiting. If the job completes successfully, qsub's exit code will be that of the completed job. If the job fails to complete successfully, qsub will print out an error message indicating why the job failed and will have an exit code of 1. If qsub is interrupted, e.g. with CTRL-C, before the job completes, the job will be canceled.

With the -sync n option, qsub will exit with an exit code of 0 as soon as the job is submitted successfully. -sync n is default for qsub.

If -sync y is used in conjunction with -now y, qsub will behave as though only -now y were given until the job has been successfully scheduled, after which time qsub will behave as though only -sync y were given.

If -sync y is used in conjunction with -t n[-m[:i]], qsub will wait for all the job's tasks to complete before exiting. If all the job's tasks complete successfully, qsub's exit code will be that of the first completed job task with a non-zero exit code, or 0 if all job tasks exited with an exit code of 0. If any of the job's tasks fail to complete successfully, qsub will print out an error message indicating why the job task(s) failed and will have an exit code of 1. If qsub is interrupted, e.g. with CTRL-C, before the job completes, all of the job's tasks will be canceled.
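The exit code rule for array jobs that complete successfully can be restated as a short Bourne shell sketch. The helper array_exit_code is hypothetical; it takes task exit codes in completion order and applies the rule quoted above, nothing more.

```shell
# array_exit_code CODE...
# Given task exit codes in completion order, prints the first non-zero
# one, or 0 when every task exited with 0.
array_exit_code() {
    for code in "$@"; do
        if [ "$code" -ne 0 ]; then
            printf '%s\n' "$code"
            return 0
        fi
    done
    printf '%s\n' 0
}
```

So tasks finishing with the codes 0, 0, 3 and 7 (in that order) would lead to a qsub exit code of 3 under this rule.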
-S [[hostname]:]pathname,... Specifies the interpreting shell for the job. Only one pathname component without a host specifier is valid and only one path name for a given host is allowed. Shell paths with host assignments define the interpreting shell for the job if the host is the execution host. The shell path without host specification is used if the execution host matches none of the hosts in the list.
Furthermore, the pathname can be constructed with pseudo environment variables as described for the -e option above.

In the case of qsh the specified shell path is used to execute the corresponding command interpreter in the xterm(1) (via its -e option) started on behalf of the interactive job. Qalter allows changing this option even while the job executes. The modified parameter will only be in effect after a restart or migration of the job, however.

If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameter with the name S. (see -jsv option above or find more information concerning JSV in jsv(1))
-t n[-m[:s]] Submits a so called Array Job, i.e. an array of identical tasks being differentiated only by an index number and being treated by Sun Grid Engine almost like a series of jobs. The option argument to -t specifies the number of array job tasks and the index number which will be associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SGE_TASK_ID. The option arguments n, m and s will be available through the environment variables SGE_TASK_FIRST, SGE_TASK_LAST and SGE_TASK_STEPSIZE.
Following restrictions apply to the values n and m:
```
1 <= n <= MIN(2^31-1, max_aj_tasks)
1 <= m <= MIN(2^31-1, max_aj_tasks)
n <= m
```
max_aj_tasks is defined in the cluster configuration (see sge_conf(5)). The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size. Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each with the environment variable SGE_TASK_ID containing one of the 5 index numbers.
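How a range with a step size expands into task ids can be checked with a small Bourne shell sketch. The function expand_task_range is a hypothetical helper written for this page. Treating a single number n as the one-task range n-n is an assumption of the sketch, and the limits imposed by max_aj_tasks are not checked.

```shell
# expand_task_range RANGE
# Prints the task ids described by a -t argument of the form n-m or
# n-m:s, separated by spaces. A single number n is treated as n-n.
expand_task_range() {
    range="$1"
    step=1
    case "$range" in
        *:*) step="${range##*:}"; range="${range%%:*}" ;;
    esac
    [ "$step" -ge 1 ] || return 1      # a step below 1 would never end
    first="${range%%-*}"
    last="${range##*-}"
    ids=""
    i="$first"
    while [ "$i" -le "$last" ]; do
        ids="${ids:+$ids }$i"
        i=$((i + step))
    done
    printf '%s\n' "$ids"
}
```

expand_task_range 2-10:2 prints 2 4 6 8 10, matching the example in the text.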
All array job tasks inherit the same resource requests and attribute definitions as specified in the qsub or qalter command line, except for the -t option. The tasks are scheduled independently and, provided enough resources exist, concurrently, very much like separate jobs. However, an array job or a sub-array thereof can be accessed as a single unit by commands like qmod(1) or qdel(1). See the corresponding manual pages for further detail.

Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The number of tasks in an array job is unlimited.
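One common way to correlate input data with the task index is to keep a list file with one input name per line and let each task pick the line matching its SGE_TASK_ID. The Bourne shell sketch below assumes such a list file; input_for_task and the file layout are illustrative choices, not an SGE feature.

```shell
# input_for_task LIST_FILE [TASK_ID]
# Prints the line of LIST_FILE whose line number equals the task id.
# Without TASK_ID the value of SGE_TASK_ID is used.
input_for_task() {
    task="${2:-${SGE_TASK_ID:-1}}"
    # SGE_TASK_ID is "undefined" for non-array jobs; fall back to line 1
    if [ "$task" = "undefined" ]; then
        task=1
    fi
    sed -n "${task}p" "$1"
}
```

Inside a job script submitted with -t 1-3, a line such as input=$(input_for_task inputs.txt) would then give every task its own input name (inputs.txt being a hypothetical file).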

STDOUT and STDERR of array job tasks will be written into different files with the default location
```
<jobname>.['e'|'o']<job_id>'.'<task_id>
```
In order to change this default, the -e and -o options (see above) can be used together with the pseudo environment variables $HOME, $USER, $JOB_ID, $JOB_NAME, $HOSTNAME, and $SGE_TASK_ID.
Note that you can use output redirection to divert the output of all tasks into the same file, but the result of this is undefined.
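The default file name pattern shown above can be spelled out with a short Bourne shell sketch, which may help when a wrapper script has to locate the output of a particular task. default_stream_file is a hypothetical helper; the job name and ids in the usage line are made-up values.

```shell
# default_stream_file JOB_NAME KIND JOB_ID [TASK_ID]
# KIND is o for standard output or e for standard error. The task id
# suffix is appended only for array job tasks.
default_stream_file() {
    printf '%s.%s%s' "$1" "$2" "$3"
    if [ -n "${4:-}" ]; then
        printf '.%s' "$4"
    fi
    printf '\n'
}
```

default_stream_file sim.sh e 4711 3 prints sim.sh.e4711.3, the default standard error file of task 3.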

If this option or a corresponding value in qmon is specified then this value will be passed to defined JSV instances as parameters with the names t_min, t_max and t_step. (see -jsv option above or find more information concerning JSV in jsv(1))
-tc max_running_tasks Allows users to limit concurrent array job task execution. Parameter max_running_tasks specifies the maximum number of simultaneously running tasks. For example, suppose SGE is running with 10 free slots and we call qsub -t 1-100 -tc 2 jobscript. Then only 2 tasks will be scheduled to run at a time, even though 8 slots remain free.
-terse -terse causes qsub to display only the job-id of the job being submitted rather than the regular "Your job ..." string. In case of an error the error is reported on stderr as usual. This can be helpful for scripts which need to parse qsub output to get the job-id. Information that this switch was specified during submission is not available in the JSV context. (see -jsv option above or find more information concerning JSV in jsv(1))
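A script that captures the -terse output might extract the plain job-id as sketched below in Bourne shell. The sketch assumes that for array jobs the id is followed by a dot and the task range (for example 4711.1-10:1); this should be verified on the installation at hand. job_id_from_terse is a hypothetical helper.

```shell
# job_id_from_terse OUTPUT
# Strips an optional ".first-last:step" task suffix from the text that
# qsub -terse printed, leaving only the numeric job id.
job_id_from_terse() {
    printf '%s\n' "${1%%.*}"
}
```

A typical use would be jid=$(job_id_from_terse "$(qsub -terse jobscript)"), after which $jid can be handed to qstat -j or qdel.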
-v variable[=value],... Defines or redefines the environment variables to be exported to the execution context of the job. If the -v option is present Sun Grid Engine will add the environment variables defined as arguments to the switch and, optionally, values of specified variables, to the execution context of the job.
-verify Instead of submitting a job, prints detailed information about the would-be job as though qstat(1) -j were used, including the effects of command-line parameters and the external environment.
-V Specifies that all environment variables active within the qsub utility be exported to the context of the job.
All environment variables specified with -v, -V or the DISPLAY variable provided with -display will be exported to the defined JSV instances only optionally when this is requested explicitly during the job submission verification. (see -jsv option above or find more information concerning JSV in jsv(1))

-w e|w|n|p|v Specifies a validation level applied to the job to be submitted (qsub, qlogin, and qsh) or the specified queued job (qalter). The information displayed indicates whether the job can possibly be scheduled assuming an empty system with no other jobs. Resource requests exceeding the configured maximal thresholds or requesting unavailable resource attributes are possible causes for jobs to fail this validation.

The specifiers e, w, n, p and v define the following validation modes:

`e'  error - jobs with invalid requests will be
   rejected.
`w'  warning - only a warning will be displayed
   for invalid requests.
`n'  none - switches off validation; the default for
   qsub, qalter, qrsh, qsh
   and qlogin.
`p'  poke - does not submit the job but prints a
   validation report based on a cluster as is with
   all resource utilizations in place.
`v'  verify - does not submit the job but prints a
   validation report based on an empty cluster.

Note that the necessary checks are costly in terms of performance, and hence the checking is switched off by default. It should also be noted that load values are not taken into account with the verification since they are assumed to be too volatile. To cause -w e verification to be passed at submission time, it is possible to specify non-volatile values (non-consumables) or maximum values (consumables) in complex_values.

-wd working_dir Execute the job from the directory specified in working_dir. This switch will activate Sun Grid Engine's path aliasing facility, if the corresponding configuration files are present (see sge_aliases(5)).
command The job's scriptfile or binary. If not present or if the operand is the single-character string '-', qsub reads the script from standard input. The command will be available in defined JSV instances as parameter with the name CMDNAME (see -jsv option above or find more information concerning JSV in jsv(1))
command_args Arguments to the job. Not valid if the script is entered from standard input.

ENVIRONMENTAL VARIABLES

SGE_ROOT Specifies the location of the Sun Grid Engine standard configuration files.

SGE_CELL If set, specifies the default Sun Grid Engine cell. To address a Sun Grid Engine cell qsub, qsh, qlogin or qalter use (in the order of precedence):

The name of the cell specified in the environment variable SGE_CELL, if it is set.
The name of the default cell, i.e. default.

SGE_DEBUG_LEVEL If set, specifies that debug information should be written to stderr. In addition the level of detail in which debug information is generated is defined.

SGE_QMASTER_PORT If set, specifies the tcp port on which sge_qmaster(8) is expected to listen for communication requests. Most installations will use a services map entry for the service "sge_qmaster" instead to define that port.

In addition to those environment variables specified to be exported to the job via the -v or the -V option (see above) qsub, qsh, and qlogin add the following variables with the indicated values to the variable list:

SGE_O_HOME the home directory of the submitting client.
SGE_O_HOST the name of the host on which the submitting client is running.
SGE_O_LOGNAME the LOGNAME of the submitting client.
SGE_O_MAIL the MAIL of the submitting client. This is the mail directory of the submitting client.
SGE_O_PATH the executable search path of the submitting client.
SGE_O_SHELL the SHELL of the submitting client.
SGE_O_TZ the time zone of the submitting client.
SGE_O_WORKDIR the absolute path of the current working directory of the submitting client.

Furthermore, Sun Grid Engine sets additional variables into the job's environment, as listed below.

ARC
SGE_ARCH The Sun Grid Engine architecture name of the node on which the job is running. The name is compiled-in into the sge_execd(8) binary.
SGE_CKPT_ENV Specifies the checkpointing environment (as selected with the -ckpt option) under which a checkpointing job executes. Only set for checkpointing jobs.
SGE_CKPT_DIR Only set for checkpointing jobs. Contains path ckpt_dir (see checkpoint(5) ) of the checkpoint interface.
SGE_STDERR_PATH the pathname of the file to which the standard error stream of the job is diverted. Commonly used for enhancing the output with error messages from prolog, epilog, parallel environment start/stop or checkpointing scripts.
SGE_STDOUT_PATH the pathname of the file to which the standard output stream of the job is diverted. Commonly used for enhancing the output with messages from prolog, epilog, parallel environment start/stop or checkpointing scripts.
SGE_STDIN_PATH the pathname of the file from which the standard input stream of the job is taken. This variable might be used in combination with SGE_O_HOST in prolog/epilog scripts to transfer the input file from the submit to the execution host.
SGE_JOB_SPOOL_DIR The directory used by sge_shepherd(8) to store job related data during job execution. This directory is owned by root or by a Sun Grid Engine administrative account and commonly is not open for read or write access to regular users.
SGE_TASK_ID The index number of the current array job task (see -t option above). This is a unique number in each array job and can be used to reference different input data records, for example. This environment variable is set to "undefined" for non-array jobs. It is possible to change the predefined value of this variable with -v or -V (see options above).
SGE_TASK_FIRST The index number of the first array job task (see -t option above). It is possible to change the predefined value of this variable with -v or -V (see options above).
SGE_TASK_LAST The index number of the last array job task (see -t option above). It is possible to change the predefined value of this variable with -v or -V (see options above).
SGE_TASK_STEPSIZE The step size of the array job specification (see -t option above). It is possible to change the predefined value of this variable with -v or -V (see options above).
ENVIRONMENT The ENVIRONMENT variable is set to BATCH to identify that the job is being executed under Sun Grid Engine control.
HOME The user's home directory path from the passwd(5) file.
HOSTNAME The hostname of the node on which the job is running.
JOB_ID A unique identifier assigned by the sge_qmaster(8) when the job was submitted. The job ID is a decimal integer in the range 1 to 99999.
JOB_NAME The job name. For batch jobs or jobs submitted by qrsh with a command, the job name is built as the basename of the qsub script filename or of the qrsh command, respectively. This default may be overwritten by the -N option.
JOB_SCRIPT The path to the job script which is executed. The value can not be overwritten by the -v or -V option.
LOGNAME The user's login name from the passwd(5) file.
NHOSTS The number of hosts in use by a parallel job.
NQUEUES The number of queues allocated for the job (always 1 for serial jobs).
NSLOTS The number of queue slots in use by a parallel job.
PATH A default shell search path of: /usr/local/bin:/usr/ucb:/bin:/usr/bin
SGE_BINARY_PATH The path where the Sun Grid Engine binaries are installed. The value is the concatenation of the cluster configuration value binary_path and the architecture name $SGE_ARCH environment variable.
PE The parallel environment under which the job executes (for parallel jobs only).
PE_HOSTFILE The path of a file containing the definition of the virtual parallel machine assigned to a parallel job by Sun Grid Engine. See the description of the $pe_hostfile parameter in sge_pe(5) for details on the format of this file. The environment variable is only available for parallel jobs.
QUEUE The name of the cluster queue in which the job is running.
REQUEST Available for batch jobs only. The request name of a job as specified with the -N switch (see above) or taken as the name of the job script file.
RESTARTED This variable is set to 1 if a job was restarted either after a system crash or after a migration in case of a checkpointing job. The variable has the value 0 otherwise.
SHELL The user's login shell from the passwd(5) file. Note: This is not necessarily the shell in use for the job.
TMPDIR The absolute path to the job's temporary working directory.
TMP The same as TMPDIR; provided for compatibility with NQS.
TZ The time zone variable imported from sge_execd(8) if set.
USER The user's login name from the passwd(5) file.
SGE_JSV_TIMEOUT If the response time of the client JSV is greater than this timeout value, a restart of the JSV is attempted. The default value is 10 seconds, and this value must be greater than 0. Once the timeout has been reached, the JSV is restarted only once; if the timeout is reached again, an error occurs.
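Parallel job scripts often read the file named by PE_HOSTFILE to find out how many slots were granted on each host. The Bourne shell sketch below assumes the layout described in sge_pe(5), one line per host with the host name in the first column and the slot count in the second. total_slots is a hypothetical helper and the host and queue names in the comment are made up.

```shell
# total_slots HOSTFILE
# Sums the second column (slots per host) of a PE host file, e.g.
#   node01 4 all.q@node01 UNDEFINED
#   node02 2 all.q@node02 UNDEFINED
total_slots() {
    awk '{ sum += $2 } END { print sum + 0 }' "$1"
}
```

Within a parallel job, total_slots "$PE_HOSTFILE" would normally agree with the value of NSLOTS.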

RESTRICTIONS

There is no controlling terminal for batch jobs under Sun Grid Engine, and any tests or actions on a controlling terminal will fail. If these operations are in your .login or .cshrc file, they may cause your job to abort. Insert the following test before any commands that are not pertinent to batch jobs in your .login:

 if ( $?JOB_NAME) then
      echo "Sun Grid Engine spooled job"
      exit 0
 endif
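
Users of Bourne-style shells can apply the same idea in their .profile. The following is a minimal sketch under the assumption that checking JOB_NAME, or ENVIRONMENT being set to BATCH, is enough to recognize a batch job; is_batch_job is a hypothetical helper name.

```shell
# is_batch_job: succeeds when the shell runs as a Sun Grid Engine batch job
is_batch_job() {
    [ -n "${JOB_NAME:-}" ] || [ "${ENVIRONMENT:-}" = "BATCH" ]
}

# In ~/.profile, keep terminal-only commands away from batch jobs:
if ! is_batch_job; then
    :   # terminal-related setup (stty, tset and the like) would go here
fi
```

Unlike the csh test, this variant wraps the terminal commands in a condition instead of leaving the start-up file early.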

Don't forget to set your shell's search path in your shell start-up before this code.

EXIT STATUS

The following exit values are returned:

0 Operation was executed successfully.
25 It was not possible to register a new job according to the configured max_u_jobs or max_jobs limit. Additional information may be found in sge_conf(5).
>0 Error occurred.

EXAMPLES

The following is the simplest form of a Sun Grid Engine script file.

#!/bin/csh
  a.out
=====================================================

The next example is a more complex Sun Grid Engine script.

=====================================================

#!/bin/csh

# Which account to be charged cpu time
#$ -A santa_claus

# date-time to run, format [[CC]yy]MMDDhhmm[.SS]
#$ -a 12241200

# to run I want 6 or more parallel processes
# under the PE pvm. the processes require
# 128M of memory
#$ -pe pvm 6- -l mem=128

# If I run on dec_x put stderr in /tmp/foo, if I
# run on sun_y, put stderr in /usr/me/foo
#$ -e dec_x:/tmp/foo,sun_y:/usr/me/foo

# Send mail to these users
#$ -M santa@northpole,claus@northpole

# Mail at beginning/end/on suspension
#$ -m bes

# Export these environmental variables
#$ -v PVM_ROOT,FOOBAR=BAR

# The job is located in the current
# working directory.
#$ -cwd

FILES

     $REQUEST.oJID[.TASKID]      STDOUT of job #JID
     $REQUEST.eJID[.TASKID]      STDERR of job
     $REQUEST.poJID[.TASKID]     STDOUT of par. env. of job
     $REQUEST.peJID[.TASKID]     STDERR of par. env. of job

     $cwd/.sge_aliases         cwd path aliases
     $cwd/.sge_request         cwd default request
     $HOME/.sge_aliases        user path aliases
     $HOME/.sge_request        user default request
     <sge_root>/<cell>/common/sge_aliases
                               cluster path aliases
     <sge_root>/<cell>/common/sge_request
                               cluster default request
     <sge_root>/<cell>/common/act_qmaster
                               Sun Grid Engine master host file
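
The naming pattern of the output files listed above can be sketched as a small sh function. This assumes that $REQUEST stands for the job name and that the .TASKID suffix appears only for array job tasks; sge_output_name and the sample values are hypothetical and used purely for illustration.

```shell
#!/bin/sh
# Build a default output file name following $REQUEST.<kind>JID[.TASKID].
# Usage: sge_output_name NAME KIND JID [TASKID]
# KIND is one of o, e, po, pe as in the FILES list.
# sge_output_name is a hypothetical helper, not part of Grid Engine.
sge_output_name() {
    name=$1
    kind=$2
    jid=$3
    task=${4:-}
    if [ -n "$task" ]; then
        echo "${name}.${kind}${jid}.${task}"
    else
        echo "${name}.${kind}${jid}"
    fi
}

sge_output_name myjob o 1234      # prints myjob.o1234
sge_output_name myjob e 1234 7    # prints myjob.e1234.7
```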

Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: July, 28, 2019

-cwd	use the current directory (where the job was submitted from) to store all output files, including the -o specified file
-M user@hostname -m b,e	sends email to the specified account at the beginning (-m b) and end (-m e) of the job. This way, you know when the job has finally started if it got stuck in the queue. You can also use this to send yourself a text message.
-o file -e file	directs the standard output (-o) and standard error output (-e) to the specified files. Note that this applies only to output that is not otherwise directed to files; i.e., csh redirection (myprog > file.out) takes precedence
-j y	"joins" the error-output with the standard output, thereby sending both to the same file (given by -o)
-S /bin/tcsh	what shell to use, tcsh or sh or whatever you prefer
-N name	use this name when displaying in the qstat output; defaults to the name of the script file
-hold_jid job_id_or_job_name	if you have one job that must wait for another to complete (perhaps the first one creates an output file which is needed by the second program), then you can request that the job be held until that first job completes, see [SGE Job Dependencies]
-pe high 10-20	requests a high-priority "parallel environment" that spans several machines, in this case any number of CPUs between 10 and 20 (inclusive); a single number requests exactly that many CPUs. Note that if you request more CPUs than you actually have high-priority access to, your job will hang. See Submitting OpenMP Jobs or Submitting MPI Jobs
-q @machineName-n	request a specific machine, or machine group
-l slots=2	requests that the job be given 2 slots (or 2 cpus) instead of 1; you MUST use this if your program is multi-threaded, you should NOT use it otherwise
-l mem_free=1.5G	requests that only machines with 1.5GB (=1536MB) or more of free memory be used for this job; i.e., the job requires a lot of memory and thus is not suitable for all hosts. Note that 1G is equal to 1024M (How do I determine how much memory my program needs? See the FAQ)
-l h_cpu=Xh	requests that X hours be allocated for this job to run
-l scr_free=XG	requests that only machines with X GB or more of free disk space in the /scratch partition be used for this job; i.e., the job requires a lot of temporary file space and thus is not suitable for all hosts. Note that 1G is equal to 1024M
-l highprio	requests that the job be placed in a high-priority queue or parallel environment
-soft	putting -soft before a -l requirement indicates that it is a "soft" request; SGE will make a best-effort attempt to find a machine with the requested attribute, but the job may be queued on machines that do NOT have the attribute. Note that -soft applies to ALL -l options that come after it
-t start-stop:step	submit an SGE "array" job, that is, run the same job multiple times, but set SGE_TASK_ID to the value start, then start+step, etc., up through stop (step may be omitted); it is up to your script to do something different for each task-ID; see [SGE Array Jobs]
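
Several of the options above can be combined as pseudo comments in a single script. The following is a minimal sketch of an array job in sh. The job name array_demo, the input.N.dat naming scheme, and the input_for_task helper are hypothetical. Outside Grid Engine the #$ lines are ordinary comments and SGE_TASK_ID is unset, so the script falls back to task 1 and can be dry-run locally.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -N array_demo
#$ -cwd
#$ -j y
#$ -t 1-9:2

# With -t 1-9:2 the script runs once each for task IDs 1, 3, 5, 7 and 9.
# Default to 1 when SGE_TASK_ID is unset (dry run outside the cluster).
task_id=${SGE_TASK_ID:-1}

# input_for_task is a hypothetical helper that maps a task ID to a file name.
input_for_task() {
    echo "input.$1.dat"
}

echo "task $task_id would process $(input_for_task "$task_id")"
```

Because of -cwd and -j y, each task writes its combined standard output and standard error to a file in the directory the job was submitted from.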

-pe low-all 8	low priority	use any machines (Note: No longer working)
-pe low-core 8	low priority	only use core machines (Note: No longer working)
-pe threaded 8	low priority	use multiple slots on one machine
-l highprio -pe high 8	high priority	use any high-priority slots