May the source be with you, but remember the KISS principle ;-)
Home Switchboard Unix Administration Red Hat TCP/IP Networks Neoliberalism Toxic Managers
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Torque PBS

News OpenPBS, PBSpro and Torque Recommended Links PBS Professional Installation of open source version of PBSpro PBS User Guide Starting and stopping PBSpro
Torque Torque installation from EPEL RPMs Maui Scheduler     Humor Etc
The Terascale Open-source Resource and QUEue Manager (TORQUE)[4] is a derivative of PBS scheduler.

Torque PBS is an open source version of Portable Batch System, and it controls the running of jobs on a HPC cluster. PBS queues, starts, controls, and stops jobs. It also has utilities to report on the status of jobs and nodes.

A TORQUE cluster consists of one head node and many compute nodes. The head node runs the pbs_server daemon and the compute nodes run the pbs_mom daemon. Client commands for submitting and managing jobs can be installed on any host (including hosts not running pbs_server or pbs_mom).

The head node also runs a scheduler daemon. The scheduler interacts with pbs_server to make local policy decisions for resource usage and allocate nodes to jobs. A simple FIFO scheduler, and code to construct more advanced schedulers, is provided in the TORQUE source distribution. Most TORQUE users choose to use a packaged, advanced scheduler such as Maui or Moab.

Users submit jobs to pbs_server using the qsub command. When pbs_server receives a new job, it informs the scheduler. When the scheduler finds nodes for the job, it sends instructions to run the job with the node list to pbs_server. Then, pbs_server sends the new job to the first node in the node list and instructs it to launch the job. This node is designated the execution host and is called Mother Superior. Other nodes in a job are called sister moms.

TORQUE can integrate with the non-commercial Maui Cluster Scheduler or the commercial Moab Workload Manager to improve overall utilization, scheduling and administration on a cluster.

Developer(s) Adaptive Computing
Initial release 2003 (2003)
Stable release 6.1.0 / 10 November 2016; 6 months ago (2016-11-10)
Development status Active
Written in ANSI C
Operating system Unix-like
Platform Cross-platform
Size 5MB
Available in English
Type distributed resource manager
License OpenPBS version 2.3[1][2] (non-free in DFSG[3]), or TORQUE v2.5+ Software License v1.1[citation needed]
Website TORQUE website

The TORQUE community has extended the original PBS to extend scalability, fault tolerance, and functionality. Contributors include NCSA, OSC, USC, the US DOE, Sandia, PNNL, UB, TeraGrid, and other HPC organizations. TORQUE is described by its developers as open-source software,[2] using the OpenPBS version 2.3 license[1] 

TORQUE provides enhancements over standard OpenPBS in the following areas:

TORQUE provides control over batch jobs and distributed computing resources. It is an  open-source product based on the original PBS project. It incorporates significant advances in the areas of scalability, reliability, and functionality and is currently in use at tens of thousands of leading government, academic, and commercial sites throughout the world.

TORQUE may be freely used, modified, and distributed under the constraints of the included license.

In Torque a scheduler is decoupled from the queue, job and process management. That means that one can run different schedulers.

The two no cost schedulers provided with Torque are

There is also MOAB, a commercially available scheduler for torque.

Users submit jobs to pbs_server using the qsub command. When pbs_server receives a new job, it informs the scheduler. When the scheduler finds nodes for the job, it sends instructions to run the job with the node list to pbs_server. Then, pbs_server sends the new job to the first node in the node list and instructs it to launch the job. This node is designated the execution host and is called Mother Superior. Other nodes in a job are called sister moms.

Top Visited
Past week
Past month


Old News ;-)

[Jun 16, 2017] Tutorial - Submitting a job using qsub by Sreedhar Manchu

Notable quotes:
"... (the path to your home directory) ..."
"... (which language you are using) ..."
"... (the name that you logged in with) ..."
"... (standard path to excecutables) ..."
"... (location of the users mail file) ..."
"... (command shell, i.e bash,sh,zsh,csh, ect.) ..."
"... (the name of the host upon which the qsub command is running) ..."
"... (the hostname of the pbs_server which qsub submits the job to) ..."
"... (the name of the original queue to which the job was submitted) ..."
"... (the absolute path of the current working directory of the qsub command) ..."
"... (each member of a job array is assigned a unique identifier) ..."
"... (set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job) ..."
"... (the job identifier assigned to the job by the batch system) ..."
"... (the job name supplied by the user) ..."
"... (the name of the file contain the list of nodes assigned to the job) ..."
"... (the name of the queue from which the job was executed from) ..."
"... (the walltime requested by the user or default walltime allotted by the scheduler) ..."

Last modified by Yanli Zhang on Jul 10, 2012

qsub Tutorial

  1. Synopsis
  2. What is qsub
  3. What does qsub do?
  4. Arguments to control behavior
Synopsis qsub Synopsis ?
qsub [-a date_time] [-A account_string] [-b secs] [-c checkpoint_options] n No checkpointing is to be performed. s Checkpointing is to be performed only when the server executing the job is shutdown . c Checkpointing is to be performed at the default minimum time for the server executing the job. c=minutes Checkpointing is to be performed at an interval of minutes, which is the integer number of minutes of CPU time used by the job. This value must be greater than zero. [-C directive_prefix] [-d path] [-D path] [-e path] [-f] [-h] [-I ] [-j join ] [-k keep ] [-l resource_list ] [-m mail_options] [-N name] [-o path] [-p priority] [-P user[:group]] [-q destination] [-r c] [-S path_list] [-t array_request] [-u user_list] [- v variable_list] [-V ] [-W additional_attributes] [-X] [-z] [script]

For detailed information, see this page .

What is qsub?

qsub is the command used for job submission to the cluster. It takes several command line arguments and can also use special directives found in the submission scripts or command file. Several of the most widely used arguments are described in detail below.

Useful Information

For more information on qsub do More information on qsub ?
$ man qsub

What does qsub do?


All of our clusters have a batch server referred to as the cluster management server running on the headnode. This batch server monitors the status of the cluster and controls/monitors the various queues and job lists. Tied into the batch server, a scheduler makes decisions about how a job should be run and its placement in the queue. qsub interfaces into the the batch server and lets it know that there is another job that has requested resources on the cluster. Once a job has been received by the batch server, the scheduler decides the placement and notifies the batch server which in turn notifies qsub (Torque/PBS) whether the job can be run or not. The current status (whether the job was successfully scheduled or not) is then returned to the user. You may use a command file or STDIN as input for qsub.

Environment variables in qsub

The qsub command will pass certain environment variables in the Variable_List attribute of the job. These variables will be available to the job. The value for the following variables will be taken from the environment of the qsub command:

These values will be assigned to a new name which is the current name prefixed with the string "PBS_O_". For example, the job will have access to an environment variable named PBS_O_HOME which have the value of the variable HOME in the qsub command environment.

In addition to these standard environment variables, there are additional environment variables available to the job.

Arguments to control behavior

As stated before there are several arguments that you can use to get your jobs to behave a specific way. This is not an exhaustive list, but some of the most widely used and many that you will will probably need to accomplish specific tasks.

Declare the date/time a job becomes eligible for execution

To set the date/time which a job becomes eligible to run, use the -a argument. The date/time format is [[[[CC]YY]MM]DD]hhmm[.SS]. If -a is not specified qsub assumes that the job should be run immediately.


To test -a get the current date from the command line and add a couple of minutes to it. It was 10:45 when I checked. Add hhmm to -a and submit a command from STDIN.

Example: Set the date/time which a job becomes eligible to run ?
$ echo "sleep 30" | qsub -a 1047

Handy Hint

This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?
#PBS -a 1047
Defining the working directory path to be used for the job

To define the working directory path to be used for the job -d option can be used. If it is not specified, the default working directory is the home directory.

Example: Define the working directory path to be used for the job ?
$ pwd /home/manchu $ cat dflag.pbs echo "Working directory is $PWD" $ qsub dflag.pbs 5596682.hpc0. local $ cat dflag.pbs.o5596682 Working directory is /home/manchu $ mv dflag.pbs random_pbs/ $ qsub -d /home/manchu/random_pbs/ /home/manchu/random_pbs/dflag.pbs 5596703.hpc0. local $ cat random_ps/dflag.pbs.o5596703 Working directory is /home/manchu/random_pbs $ qsub /home/manchu/random_pbs/dflag.pbs 5596704.hpc0. local $ cat dflag.pbs.o5596704 Working directory is /home/manchu

Handy Hint

This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?
#PBS -d /home/manchu/random_pbs

Manipulate the output files

As a default all jobs will print all stdout (standard output) messages to a file with the name in the format <job_name>.o<job_id> and all stderr (standard error) messages will be sent to a file named <job_name>.e<job_id>. These files will be copied to your working directory as soon as the job starts. To rename the file or specify a different location for the standard output and error files, use the -o for standard output and -e for the standard error file. You can also combine the output using -j.

Create a simple submission file: ?
$ cat sleep .pbs #!/bin/sh for i in {1..60} ; do echo $i sleep 1 done
Create a simple submission file: ?
$ qsub -o sleep .log sleep .pbs

Handy Hint

This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?
#PBS -o sleep.log
Submit your job with the standard error file renamed: ?
$ qsub -e sleep .log sleep .pbs

Handy Hint

This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?
#PBS -e sleep.log
Combine them using the name sleep.log: ?
$ qsub -o sleep .log -j oe .pbs

Handy Hint

This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?
#PBS -o sleep.log #PBS -j oe


The order of two letters next to flag -j is important. It should always start with the letter that's been already defined before, in this case 'o'. Place the joined output in another location other than the working directory: ?
$ qsub -o $HOME/tutorials/logs/sleep.log -j oe sleep .pbs

Mail job status at the start and end of a job

The mailing options are set using the -m and -M arguments. The -m argument sets the conditions under which the batch server will send a mail message about the job and -M will define the users that emails will be sent to (multiple users can be specified in a list seperated by commas). The conditions for the -m argument include:

Using the sleep.pbs script created earlier, submit a job that emails you for all conditions: ? $ qsub -m abe -M [email protected] sleep .pbs

Handy Hint

This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?
#PBS -m abe #PBs -M [email protected]

Submit a job to a specific queue

You can select a queue based on walltime needed for your job. Use the 'qstat -q' command to see the maximum job times for each queue.

Submit a job to the bigmem queue: ?
$ qsub -q bigmem sleep .pbs

Handy Hint

This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?
#PBS -q bigmem

Submitting a job that is dependent on the output of another

Often you will have jobs that will be dependent on another for output in order to run. To add a dependency, we will need to use the -W (additional attributes) with the depend option. We will be using the afterok rule, but there are several other rules that may be useful. (man qsub)


To illustrate the ability to hold execution of a specific job until another has completed, we will write two submission scripts. The first will create a list of random numbers. The second will sort those numbers. Since the second script will depend on the list that is created we will need to hold execution until the first has finished.

random.pbs ?
$ cat random.pbs #!/bin/sh cd $HOME sleep 120 for i in {1..100}; do echo $RANDOM >> rand.list done
sort.pbs ?
$ cat sort .pbs #!/bin/sh cd $HOME sort -n rand.list > sorted.list sleep 30

Once the file are created, lets see what happens when they are submitted at the same time:

Submit at the same time ?
$ qsub random.pbs ; qsub sort .pbs 5594670.hpc0. local 5594671.hpc0. local $ ls random.pbs sorted.list sort .pbs sort .pbs.e5594671 sort .pbs.o5594671 $ cat sort .pbs.e5594671 sort : open failed: rand.list: No such file or directory

Since they both ran at the same time, the sort script failed because the file rand.list had not been created yet. Now submit them with the dependencies added.

Submit them with the dependencies added ?
$ qsub random.pbs 5594674.hpc0. local $ qsub -W depend=afterok:5594674.hpc0. local sort .pbs 5594675.hpc0. local $ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 5594674.hpc0.loc manchu ser2 random.pbs 18029 1 1 -- 48:00 R 00:00 5594675.hpc0.loc manchu ser2 sort .pbs 1 1 -- 48:00 H --

We now see that the sort.pbs job is in a hold state. And once the dependent job completes the sort job runs and we see:

Job status with the dependencies added ?
$ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 5594675.hpc0.loc manchu ser2 sort .pbs 18165 1 1 -- 48:00 R --

Useful Information

Submitting multiple jobs in a loop that depend on output of another job

This example show how to submit multiple jobs in a loop where each job depends on output of job submitted before it.


Let's say we need to write numbers from 0 to 999999 in order onto a file output.txt. We can do 10 separate runs to achieve this, where each run has a separate pbs script writing 100,000 numbers to output file. Let's see what happens if we submit all 10 jobs at the same time.

The script below creates required pbs scripts for all the runs.

Create PBS Scripts for all the runs ?
$ cat #!/bin/bash for i in {0..9} do cat > pbs.script.$i << EOF #!/bin/bash #PBS -l nodes=1:ppn=1,walltime=600 cd \$PBS_O_WORKDIR for ((i=$((i*100000)); i<$(((i+1)*100000)); i++)) { echo "\$i" >> output.txt } exit 0; EOF done
Change permission to make it an executable ?
$ chmod u+x
Run the Script ?
$ ./
List of Created PBS Scripts ?
$ ls -l pbs.script.* -rw-r--r-- 1 manchu wheel 134 Oct 27 16:32 pbs.script.0 -rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.1 -rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.2 -rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.3 -rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.4 -rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.5 -rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.6 -rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.7 -rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.8 -rw-r--r-- 1 manchu wheel 140 Oct 27 16:32 pbs.script.9
PBS Script ?
$ cat pbs.script.0 #!/bin/bash #PBS -l nodes=1:ppn=1,walltime=600 cd $PBS_O_WORKDIR for ((i=0; i<100000; i++)) { echo "$i" >> output.txt } exit 0;
Submit Multiple Jobs at a Time ?
$ for i in {0..9}; do qsub pbs.script.$i ; done 5633531.hpc0. local 5633532.hpc0. local 5633533.hpc0. local 5633534.hpc0. local 5633535.hpc0. local 5633536.hpc0. local 5633537.hpc0. local 5633538.hpc0. local 5633539.hpc0. local 5633540.hpc0. local $
output.txt ?
$ tail output.txt 699990 699991 699992 699993 699994 699995 699996 699997 699998 699999 - bash -3.1$ grep -n 999999 $_ 210510:999999 $

This clearly shows the nubmers are in no order like we wanted. This is because all the runs wrote to the same file at the same time, which is not what we wanted.

Let's submit jobs using qsub dependency feature. This can be achieved with a simple script shown below.

Simple Script to Submit Multiple Dependent Jobs ?
$ cat dependency.pbs #!/bin/bash job=`qsub pbs.script.0` for i in {1..9} do job_next=`qsub -W depend=afterok:$job pbs.script.$i` job=$job_next done
Let's make it an executable ?
$ chmod u+x dependency.pbs
Submit dependent jobs by running the script ?
$ ./dependency.pbs $ qstat -u manchu hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 5633541.hpc0.loc manchu ser2 pbs.script.0 28646 1 1 -- 00:10 R -- 5633542.hpc0.loc manchu ser2 pbs.script.1 -- 1 1 -- 00:10 H -- 5633543.hpc0.loc manchu ser2 pbs.script.2 -- 1 1 -- 00:10 H -- 5633544.hpc0.loc manchu ser2 pbs.script.3 -- 1 1 -- 00:10 H -- 5633545.hpc0.loc manchu ser2 pbs.script.4 -- 1 1 -- 00:10 H -- 5633546.hpc0.loc manchu ser2 pbs.script.5 -- 1 1 -- 00:10 H -- 5633547.hpc0.loc manchu ser2 pbs.script.6 -- 1 1 -- 00:10 H -- 5633548.hpc0.loc manchu ser2 pbs.script.7 -- 1 1 -- 00:10 H -- 5633549.hpc0.loc manchu ser2 pbs.script.8 -- 1 1 -- 00:10 H -- 5633550.hpc0.loc manchu ser2 pbs.script.9 -- 1 1 -- 00:10 H -- $
Output after first run ?
$ tail output.txt 99990 99991 99992 99993 99994 99995 99996 99997 99998 99999 $
Output after final run ?
$ tail output.txt 999990 999991 999992 999993 999994 999995 999996 999997 999998 999999 $ grep -n 100000 output.txt 100001:100000 $ grep -n 999999 output.txt 1000000:999999 $

This shows that numbers are written in order to output.txt. Which in turn shows that jobs ran one after successful completion of another.

Opening an interactive shell to the compute node

To open an interactive shell to a compute node, use the -I argument. This is often used in conjunction with the -X (X11 Forwarding) and the -V (pass all of the users environment)

Open an interactive shell to a compute node ?
$ qsub -I

Passing an environment variable to your job

You can pass user defined environment variables to a job by using the -v argument.


To test this we will use a simple script that prints out an environment variable.

Passing an environment variable ?
$ cat variable.pbs #!/bin/sh if [ "x" == "x$MYVAR" ] ; then echo "Variable is not set" else echo "Variable says: $MYVAR" fi

Next use qsub without the -v and check your standard out file

qsub without -v ?
$ qsub variable.pbs 5596675.hpc0. local $ cat variable.pbs.o5596675 Variable is not set

Then use the -v to set the variable

qsub with -v ?
$ qsub - v MYVAR= "hello" variable.pbs 5596676.hpc0. local $ cat variable.pbs.o5596676 Variable says: hello

Handy Hint

This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?
#PBS -v MYVAR="hello"

Useful Information

Multiple user defined environment variables can be passed to a job at a time. Passing Multiple Variables ?
$ cat variable.pbs #!/bin/sh echo "$VAR1 $VAR2 $VAR3" > output.txt $ $ qsub - v VAR1= "hello" ,VAR2= "Sreedhar" ,VAR3= "How are you?" variable.pbs 5627200.hpc0. local $ cat output.txt hello Sreedhar How are you? $

Passing your environment to your job

You may declare that all of your environment variables are passed to the job by using the -V argument in qsub.


Use qsub to perform an interactive login to one of the nodes:

Passing your environment: qsub with -V ?
$ qsub -I -V

Handy Hint

This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?

Once the shell is opened, use the env command to see that your environment was passed to the job correctly. You should still have access to all your modules that you loaded previously.

Submitting an array job: Managing groups of jobs .hostname would have PBS_ARRAYID set to 0. This will allow you to create job arrays where each job in the array will perform slightly different actions based on the value of this variable, such as performing the same tasks on different input files. One other difference in the environment between jobs in the same array is the value of the PBS_JOBNAME variable.


First we need to create data to be read. Note that in a real application, this could be data, configuration setting or anything that your program needs to run.

Create Input Data

To create input data, run this simple one-liner:

Creating input data ?
$ for i in {0..4}; do echo "Input data file for an array $i" > input.$i ; done $ ls input.* input.0 input.1 input.2 input.3 input.4 $ cat input.0 Input data file for an array 0

Submission Script
Submission Script: array.pbs ?
$ cat array.pbs #!/bin/sh #PBS -l nodes=1:ppn=1,walltime=5:00 #PBS -N arraytest cd ${PBS_O_WORKDIR} # Take me to the directory where I launched qsub # This part of the script handles the data. In a real world situation you will probably # be using an existing application. cat input.${PBS_ARRAYID} > output.${PBS_ARRAYID} echo "Job Name is ${PBS_JOBNAME}" >> output.${PBS_ARRAYID} sleep 30 exit 0;

Submit & Monitor

Instead of running five qsub commands, we can simply enter:

Submitting and Monitoring Array of Jobs ?
$ qsub -t 0-4 array.pbs 5534017[].hpc0. local

qstat ?
$ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 5534017[].hpc0.l sm4082 ser2 arraytest 1 1 -- 00:05 R -- $ qstat -t -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 5534017[0].hpc0. sm4082 ser2 arraytest-0 12017 1 1 -- 00:05 R -- 5534017[1].hpc0. sm4082 ser2 arraytest-1 12050 1 1 -- 00:05 R -- 5534017[2].hpc0. sm4082 ser2 arraytest-2 12084 1 1 -- 00:05 R -- 5534017[3].hpc0. sm4082 ser2 arraytest-3 12117 1 1 -- 00:05 R -- 5534017[4].hpc0. sm4082 ser2 arraytest-4 12150 1 1 -- 00:05 R -- $ ls output.* output.0 output.1 output.2 output.3 output.4 $ cat output.0 Input data file for an array 0 Job Name is arraytest-0


pbstop by default doesn't show all the jobs in the array. Instead, it shows a single job in just one line in the job information. Pressing 'A' shows all the jobs in the array. Same can be achieved by giving the command line option '-A'. This option along with '-u <NetID>' shows all of your jobs including array as well as normal jobs.

pbstop ?
$ pbstop -A -u $USER


Typing 'A' expands/collapses array job representation.

Comma delimited lists

The -t option of qsub also accepts comma delimited lists of job IDs so you are free to choose how to index the members of your job array. For example:

Comma delimited lists ?
$ rm output.* $ qsub -t 2,5,7-9 array.pbs 5534018[].hpc0. local $ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 5534018[].hpc0.l sm4082 ser2 arraytest 1 1 -- 00:05 Q -- $ qstat -t -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 5534018[2].hpc0. sm4082 ser2 arraytest-2 12319 1 1 -- 00:05 R -- 5534018[5].hpc0. sm4082 ser2 arraytest-5 12353 1 1 -- 00:05 R -- 5534018[7].hpc0. sm4082 ser2 arraytest-7 12386 1 1 -- 00:05 R -- 5534018[8].hpc0. sm4082 ser2 arraytest-8 12419 1 1 -- 00:05 R -- 5534018[9].hpc0. sm4082 ser2 arraytest-9 12452 1 1 -- 00:05 R -- $ ls output.* output.2 output.5 output.7 output.8 output.9 $ cat output.2 Input data file for an array 2 Job Name is arraytest-2

A more general for loop - Arrays with step size

By default, PBS doesn't allow array jobs with step size. qsub -t 0-10 <pbs.script> increments PBS_ARRAYID in 1. To submit jobs in steps of a certain size, let's say step size of 3 starting at 0 and ending at 10, one has to do

qsub -t 0,3,6,9 <pbs.script>

To make it easy for users we have put a wrapper which takes starting point, ending point and step size as arguments for -t flag. This avoids default necessity that PBS_ARRAYID increment be 1. The above request can be accomplished with (which happens behind the scenes with the help of wrapper)

qsub -t 0-10:3 <pbs.script>

Here, 0 is the starting point, 10 is the ending point and 3 is the step size. It is not necessary that starting point must be 0. It can be any number. Incidentally, in a situation in which the upper-bound is not equal to the lower-bound plus an integer-multiple of the increment, for example

qsub -t 0-10:3 <pbs.script>

wrapper automatically changes the upper bound as shown in the example below.

Arrays with step size ?
[sm4082@login-0-0 ~]$ qsub -t 0-10:3 array.pbs 6390152[].hpc0. local [sm4082@login-0-0 ~]$ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 6390152[].hpc0.l sm4082 ser2 arraytest -- 1 1 -- 00:05 Q -- [sm4082@login-0-0 ~]$ qstat -t -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 6390152[0].hpc0. sm4082 ser2 arraytest-0 25585 1 1 -- 00:05 R -- 6390152[3].hpc0. sm4082 ser2 arraytest-3 28227 1 1 -- 00:05 R -- 6390152[6].hpc0. sm4082 ser2 arraytest-6 8515 1 1 -- 00:05 R 00:00 6390152[9].hpc0. sm4082 ser2 arraytest-9 505 1 1 -- 00:05 R -- [sm4082@login-0-0 ~]$ ls output.* output.0 output.3 output.6 output.9 [sm4082@login-0-0 ~]$ cat output.9 Input data file for an array 9 Job Name is arraytest-9 [sm4082@login-0-0 ~]$


By default, PBS doesn't support arrays with step size. On our clusters, it's been achieved with a wrapper. This option might not be there on clusters at other organizations/schools that use PBS/Torque.


If you're trying to submit jobs through ssh to login nodes from your pbs scripts with statement such as ?
ssh login-0-0 "cd ${PBS_O_WORKDIR};`which qsub` -t 0-10:3 <pbs.script>"

arrays with step size wouldn't work unless you either add

shopt -s expand_aliases

to your pbs script that's in bash or add this to your .bashrc in your home directory. Adding this makes alias for qsub come into effect there by making wrapper act on command line options to qsub (For that matter this brings any alias to effect for commands executed via SSH).

If you have

#PBS -t 0-10:3

in your pbs script you don't need to add this either to your pbs script or to your .bashrc in your home directory.

A List of Input Files/Pulling data from the ith line of a file

Suppose we have a list of 1000 input files, rather than input files explicitly indexed by suffix, in a file file_list.text one per line:

A List of Input Files/Pulling data from the ith line of a file ?
[sm4082@login-0-2 ~]$ cat array.list #!/bin/bash #PBS -S /bin/bash #PBS -l nodes=1:ppn=1,walltime=1:00:00 INPUT_FILE=` awk "NR==$PBS_ARRAYID" file_list.text` # # ...or use sed: # sed -n -e "${PBS_ARRAYID}p" file_list.text # # ...or use head/tail # $(cat file_list.text | head -n $PBS_ARRAYID | tail -n 1) ./executable < $INPUT_FILE

In this example, the '-n' option suppresses all output except that which is explicitly printed (on the line equal to PBS_ARRAYID).

qsub -t 1-1000 array.list

Let's say you have a list of 1000 numbers in a file, one number per line. For example, the numbers could be random number seeds for a simulation. For each task in an array job, you want to get the ith line from the file, where i equals PBS_ARRAYID, and use that value as the seed. This is accomplished by using the Unix head and tail commands or awk or sed just like above.

A List of Input Files/Pulling data from the ith line of a file ?
[sm4082@login-0-2 ~]$ cat array.seed #!/bin/bash #PBS -S /bin/bash #PBS -l nodes=1:ppn=1,walltime=1:00:00 SEEDFILE=~/data/seeds SEED=$( cat $SEEDFILE | head -n $PBS_ARRAYID | tail -n 1) ~/programs/executable $SEED > ~/results/output.$PBS_ARRAYID
qsub -t 1-1000 array.seed 

You can use this trick for all sorts of things. For example, if your jobs all use the same program, but with very different command-line options, you can list all the options in the file, one set per line, and the exercise is basically the same as the above, and you only have two files to handle (or 3, if you have a perl script generate the file of command-lines).

Delete all jobs in array

We can delete all the jobs in array with a single command.

Deleting array of jobs ?
$ qsub -t 2-5 array.pbs 5534020[].hpc0. local $ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 5534020[].hpc0.l sm4082 ser2 arraytest 1 1 -- 00:05 R -- $ qdel 5534020[] $ qstat -u $USER $

Delete a single job in array

Delete a single job in array, e.g. number 4,5 and 7

Deleting a single job in array ?

$ qsub -t 0-8 array.pbs 5534021[].hpc0. local $ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time ----------- -- ---- ---------- ---- ---- -- ----- --- - --- 5534021[].hpc0.l sm4082 ser2 arraytest 1 1 -- 00:05 Q -- $ qstat -t -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 5534021[0].hpc0. sm4082 ser2 arraytest-0 26618 1 1 -- 00:05 R -- 5534021[1].hpc0. sm4082 ser2 arraytest-1 14271 1 1 -- 00:05 R -- 5534021[2].hpc0. sm4082 ser2 arraytest-2 14304 1 1 -- 00:05 R -- 5534021[3].hpc0. sm4082 ser2 arraytest-3 14721 1 1 -- 00:05 R -- 5534021[4].hpc0. sm4082 ser2 arraytest-4 14754 1 1 -- 00:05 R -- 5534021[5].hpc0. sm4082 ser2 arraytest-5 14787 1 1 -- 00:05 R -- 5534021[6].hpc0. sm4082 ser2 arraytest-6 10711 1 1 -- 00:05 R -- 5534021[7].hpc0. sm4082 ser2 arraytest-7 10744 1 1 -- 00:05 R -- 5534021[8].hpc0. sm4082 ser2 arraytest-8 9711 1 1 -- 00:05 R -- $ qdel 5534021[4] $ qdel 5534021[5] $ qdel 5534021[7] $ qstat -t -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 5534021[0].hpc0. sm4082 ser2 arraytest-0 26618 1 1 -- 00:05 R -- 5534021[1].hpc0. sm4082 ser2 arraytest-1 14271 1 1 -- 00:05 R -- 5534021[2].hpc0. sm4082 ser2 arraytest-2 14304 1 1 -- 00:05 R -- 5534021[3].hpc0. sm4082 ser2 arraytest-3 14721 1 1 -- 00:05 R -- 5534021[6].hpc0. sm4082 ser2 arraytest-6 10711 1 1 -- 00:05 R -- 5534021[8].hpc0. sm4082 ser2 arraytest-8 9711 1 1 -- 00:05 R -- $ qstat -t -u $USER $

The Torque job scheduler

Using the Torque job scheduler

To run an application, the user launches a job on one of the ARC systems. A job contains both the details of the processing to carry out (name and version of the application, input and output, etc.) and directives for the computer resources needed (number of cpus, amount of memory).

Jobs are run as batch jobs, i.e. in an unattended manner. Typically, a user logs in on one of the ARC systems, sends a job to the execution queue and often logs out.

Jobs are managed by a job scheduler, a piece of software which is in charge of

Running a job involves at the minimum the following steps

The ARC systems use a job scheduler called Torque Resource Manager, an advanced open-source product based on the original PBS project. Torque integrates with the Maui Workload Manager in order to improve overall utilisation, scheduling and administration on a cluster.

This guide describes basic job submission and monitoring for Torque:

In addition, some more advanced topics are covered:

Preparing a submission script

A submission script is a shell script that

Suppose we want to run a molecular dynamics MPI application called foo with the following requirements

Assuming the number of cores available on each cluster node is 16, so a total of 2 nodes are required to map one MPI process to a physical core. Supposing no input needs to be specified, the following submission script runs the application in a single job


# set the number of nodes and processes per node
#PBS -l nodes=2:ppn=16

# set max wallclock time
#PBS -l walltime=100:00:00

# set name of job
#PBS -N protein123

# mail alert at start, end and abortion of execution
#PBS -m bea

# send mail to this address #PBS -M [email protected] # use submission environment #PBS -V # start job from the directory it was submitted cd $PBS_O_WORKDIR # define MPI host details (ARC specific script) . # run through the mpirun launcher mpirun $MPI_HOSTS foo

The script starts with #!/bin/bash (also called a shebang), which makes the submission script also a Linux bash script.

The script continues with a series of lines starting with #. For bash scripts these are all comments and are ignored. For Torque, the lines starting with #PBS are directives requesting job scheduling resources. (NB: it's very important that you put all the directives at the top of a script, before any other commands; any #PBS directive coming after a bash script command is ignored!)

The final part of a script is normal Linux bash scripting and describes the set of operations to follow as part of the job. In this case, this involves running the MPI-based application foo through the MPI utility mpirun.

The resource request #PBS -l nodes=n:ppn=m is the most important of the directives in a submission script. The first part (nodes=n) is imperative and is determines how many compute nodes a job is allocated by the scheduler. The second part (ppn=m) is used by the scheduler to prepare the environment for a MPI parallel run with m processes per each compute nodes (e.g. writing a hostifile for the job, pointed to by $PBS_NODEFILE). However, it is up to the user and the submission script to use the environment generated from ppn adequately. In the example above, this is done first sourcing (which uses $PBS_NODEFILE to prepare the variable $MPI_HOSTS) and then running the application through mpirun.

A note of caution is on threaded single process applications (e.g. Gaussian and Matlab). They cannot run on more than a single compute node; allocating more (e.g. #PBS -l nodes=2) will end up with the first node being allocated and the rest idle. Moreover, since there is no automatic effect on runs from using ppn, the only relevant resource scheduling request in the case of single process applications remains #PBS -l nodes=1. This gives a job user-exclusive access to a single compute node, allowing the application to use all available cores and physical memory on the node; these vary from system to system, see the Table below.

Arcus 16 cores 64GB
Caribou 16 cores 128GB

Examples of Torque submission scripts are given here for some of the more popular applications.

PBS job submission directives

Directives are job specific requirements given to the job scheduler.

The most important directives are those that request resources. The most common are the wallclock time limit (the maximum time the job is allowed to run) and the number of processors required to run the job. For example, to run an MPI job with 16 processes for up to 100 hours on a cluster with 8 cores per compute node, the PBS directives are

#PBS -l walltime=100:00:00
#PBS -l nodes=2:ppn=16

A job submitted with these requests runs for 100 hours at most; after this limit expires, the job is terminated regardless of whether the processing finished or not. Normally, the wallclock time should be conservative, allowing the job to finish normally (and terminate) before the limit is reached.

Also, the job is allocated two compute nodes (nodes=2) and each node is scheduled to run 16 MPI processes (ppn=8). (ppn is an abbreviation of Processes Per Node.) It is the task of the user to instruct mpirun to use this allocation appropriately, i.e. to start 32 processes which are mapped to the 32 cores available for the job. More information on how to run MPI application can be found in this guide.

Submitting jobs with the command qsub

Supposing you already have a submission script ready (call it, the job is submitted to the execution queue with the command qsub The queueing system prints a number (the job id) almost immediately and returns control to the linux prompt. At this point the job is already in the submission queue.

Once you have submitted the job it will sit in a pending queue for some time (how long depends on the demands of your job and the demand on the service). You can monitor the progress of the job using the command qstat.

Once the job is run you will see files with names like "job.e1234" and "job.o1234", either in your home directory or in the directory you submitted the job from (depending on how your job submission script is written). The ".e" files contain error messages. The ".o" files contain "standard output" which is essentially what the application you ran would normally have printed onto the screen. The ".e" file contains the possible error messages issued by the application; on a correct execution without errors, this file can be empty.

Read all the options for qsub on the Linux manual using the command man qsub.

Monitoring jobs with the command qstat

qstat is the main command for monitoring the state of systems, groups of jobs or individual jobs. The simple qstat command gives a list of jobs which looks something like this:

Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
1121.headnode1    jobName1         bob               15:45:05 R priorityq       
1152.headnode1    jobName2         mary              12:40:56 R workq       
1226.headnode1    jobName3         steve                    0 Q workq

The first column gives the job ID, the second the name of the job (specified by the user in the submission script) and the third the owner of the job. The fourth column gives the elapsed time for each particular job. The fifth column is the status of the job (R=running, Q=waiting, E=exiting, H=held, S=suspended). The last column is the queue for the job (a job scheduler can manage different queues serving different purposes).

Some other useful qstat features include:

Read all the options for qstat on the Linux manual using the command man qstat.

Deleting jobs with the command qdel

Use the qdel command to delete a job, e.g. qdel 1121 to delete job with id 1121. A user can delete own jobs at any time, whether the job is pending (waiting in the queue) or running. A user cannot delete the jobs of another user. Normally, there is a (small) delay between the execution of the qdel command and the time when the job is dequeued and killed. Occasionally a job may not delete properly, in which case, the ARC support team can delete it.

Environment variables

At the time a job is launched into execution, Torque defines multiple environment variables, which can be used from within the submission script to define the correct workflow of the job. The most useful of these environment variables are the following:

PBS_O_WORKDIR is typically used at the beginning of a script to go to the directory where the qsub command was issued, which is frequently also the directory containing the input data for the job, etc. The typical use is


inside a submission script.

PBS_NODEFILE is typically used to define the environment for the parallel run, for mpirun in particular. Normally, this usage is hidden from users inside a script (e.g., which defines the environment for the user.

PBS_JOBID is useful to tag job specific files and directories, typically output files or run directories. For instance, the submission script line

myApp > $PBS_JOBID.out

runs the application myApp and redirects the standard output to a file whose name is given by the job id. (NB: the job id is a number assigned by Torque and differs from the character string name given to the job in the submission script by the user.)

TMPDIR is the name of a scratch disk directory unique to the job. The scratch disk space typically has faster access than the disk space where the user home and data areas reside and benefits applications that have a sustained and large amount of I/O. Such a job normally involves copying the input files to the scratch space, running the application on scratch and copying the results to the submission directory. This usage is discussed in a separate section.

Array jobs

Arrays are a feature of Torque which allows users to submit a series of jobs using a single submission command and a single submission script. A typical use of this is the need to batch process a large number of very similar jobs, which have similar input and output, e.g. a parameter sweep study.

A job array is a single job with a list of sub-jobs. To submit an array job, use the -t flag to describe a range of sub-job indices. For example

qsub -t 1-100

submits a job array whose sub-jobs are indexed from 1 to 100. Also,

qsub -t 100-200

submits a job array whose sub-jobs are indexed from 100 to 200. Furthermore,

qsub -t 100,200,300

submits a job array whose sub-jobs indices are 100, 200 and 300.

The typical submission script for a job array uses the index of each sub-job to define the task specific for each sub-job, e.g. the name of the input file or of the output directory. The sub-job index is given by the PBS variable PBS_ARRAYID. To illustrate its use, consider the application myApp processes some files named input_*.dat (taken as input), with * ranging from 1 to 100. This processing is described in a single submission script called, which contains the following line

myApp < input_$PBS_ARRAYID.dat > output_$PBS_ARRAYID.dat

A job array is submitted using this script, with the command qsub -t 1-100 When a sub-job is executed, the file names in the line above are expanded using the sub-job index, with the result that each sub-job processes a unique input file and outputs the result to a unique output file.

Once submitted, all the array sub-jobs in the queue can be monitored using the extra -t flag to qstat.

Using scratch disk space

At present, the use of scratch space (pointed to by the variable TMPDIR) does not offer any performance advantages over the disk space pointed to by $DATA. Users are then advised to avoid using the scratch space on the ARC resources. We have plans for infrastructure upgrade, in which performant storage can be used as fast access scratch disk space from within jobs.

Jobs with conditional execution

It is possible to start a job on the condition that another one completes beforehand; this may be necessary for instance if the input to one job is generated by another job. Job dependency is defined using the -W flag.

To illustrate with an example, suppose you need to start a job using the script after another job finished successfully. Assume the first job is started using script and the command to start the first job


returns the job ID 7777. Then, the command to start the second job is

qsub -W depend=after:7777

This job dependency can be further automated (possibly to be included in a bash script) using environment variables:

JOB_ID_2=`qsub -W depend=after:$JOB_ID_1`

Furthermore, the conditional execution above can be changed so that the execution of the second job starts on the condition that the execution of the first was successful. This is achieved replacing after with afterok, e.g.

JOB_ID_2=`qsub -W depend=afterok:$JOB_ID_1`

Conditional submission (as well as conditional submission after successful execution) is also possible with job arrays. This is useful, for example, to submit a "synchronization" job (script after the successful execution of an entire array of jobs (defined by The conditional execution uses afterokarray instead of afterok:

JOB_ARRAY_ID=`qsub -t 2-6`
JOB_SYNC_ID=`qsub -W depend=afterokarray:$JOB_ARRAY_ID`

Recommended Links

Google matched content

Softpanorama Recommended

Top articles


Top articles



Install torque on a single node Centos 6.4 - Discngine

linux - Torque installation on RHEL 6.5 - Stack Overflow

Muddy Bootsnstalling Torque/PBS job scheduler on Ubuntu 14.04 LTS / 16.04 LTS




Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers :   Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism  : The Iron Law of Oligarchy : Libertarian Philosophy


War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda  : SE quotes : Language Design and Programming Quotes : Random IT-related quotesSomerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose BierceBernard Shaw : Mark Twain Quotes


Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 :  Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method  : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law


Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds  : Larry Wall  : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOSProgramming Languages History : PL/1 : Simula 67 : C : History of GCC developmentScripting Languages : Perl history   : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-MonthHow to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Haterís Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D

Copyright © 1996-2021 by Softpanorama Society. was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to to buy a cup of coffee for authors of this site


The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: March, 12, 2019