Softpanorama May the source be with you, but remember the KISS principle ;-)	Home	Switchboard	Unix Administration	Red Hat	TCP/IP Networks	Neoliberalism	Toxic Managers
	(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix

OpenPBS, PBSpro and Torque

News	Cluster job schedulers	Recommended Links	PBS Professional	PBS User Guide	Installation of open source version of PBSpro	Torque
Torque installation from EPEL RPMs	Building RPM packages from PBSpro source	Maui Scheduler	Grid engine	OAR	Humor	Etc

Introduction
Installation of PBS
- Installation from source of free version of PBS Pro using the configure script
Using EPEL RPMs to install Torque on a single node cluster in CENTOS of RHEL 6.x (headnode and one or two computational node)

Introduction

PBS is an acronym for Portable Batch System, a cluster scheduler developed initially by NASA in the early to mid-1990s, and still available as open source . Now there are at least three forks of the original codebase"

Open source (both projects are severely underfunded and lack good installation documentation )
- Community version of PBSpro. PBSpro uses PostgreSQL to store nodes info so it is tiled toward large clusters. The open source code compiles OK on CentOS 7.2 and 7.3. There was also a post that one can successfully compile it on CentOS 6.7.
- Torque: the open source successor to OpenPBS. The change of the name is due to trademark issues, so the fork is called Torque. Torque matured into a quality product. You have the option of using the more flexible and open source MAUI scheduler in place of the default FIFO scheduler included with Torque.
  - Broken RPMs exist on EPEL. Use https://kojipkgs.fedoraproject.org//packages/torque/4.2.10/5.el6/ instead.
  - You can compile it from the source code.
Commercial
- PBSpro is the pay-for-core version (around $18 per core per year). It uses PostgreSQL to store nodes info, so it is tiled toward large clusters. No that it work impressively on a really large clusters (say with 512 or more nodes).
- Moab Lite Paying for support is optional on small systems (below 32 sockets, which mean iether 8 two socket node of 16 one socket nodes), but is required on larger systems.

OpenPBS was discontinued, but still works. There is no much sense to use this old codebase unless you want to maintain it by yourself and strive to avoid overcomplexity (even in this case older version of Torque, like 4.2.10-5, might be a better deal) You have two theotetically open source options (due to thecomplexity of the codebase this is not actually true open source):

Community version of PBSpro
Torque.

It might make sense to use them instead, although not necessary the latest versions, as commercial vendor tend to add bell whistles that are useful mainly for very large cluster: their main customers. This additional complexity is useless or harmful for small installation with less then 64 nodes.

PBS and Torque consists of 3 pieces:

pbs_server, runs the show. It instantiates the PBS on the headnode of your cluster and exchanges information with the computational nodes. The job server (pbs_server) handles basic queuing services such as creating and modifying a batch job and placing a job into execution when it’s scheduled to be run.
pbs_sched. a simple FIFO scheduler. It knows the site’s policies and rules about when and where jobs can be run. In Torque installations pbs_sched is often replaced by the (free) and more capable MAUI scheduler. That makes sense to do only if you have a sizable cluster (say 16 nodes or more).
pbs_mom -- daemon that is running of computational nodes (MOM: Machine Oriented Mini-server). This is the the daemon that actually runs jobs.

In Torque there is also:

trqauthd is a daemon used by TORQUE client utilities to authorize user connections to pbs_server. Once started, it remains resident. TORQUE client utilities then communicate with trqauthd on port 15005 on the loopback interface. It is multi-threaded and can handle large volumes of simultaneous requests. This daemon should be started automatically by pbs_mom and pbs_server, but due to a bug it should be started manually as root Bug 1215207 – pbs_server does not start trqauthd

The pbs_server daemon interacts with many pbs_moms (one per compute node) and a single the scheduler (for example, pbs_sched )

Torque allows you to write your own pbs_sched scheduler, or use one of the few that were made.

Installation of PBS

You actually should not care about the version, as for a small installation it does not really matter much. Other things equal it is preferable to build you own version from the source code. Which requires more time and efforts, but in many cases is necessary as RPMs simply do not work.

Also you can use free version of PBSpro on CentOS7 -- source code and packages are downloadable from the vendor site. PBS pro installs OK on RHEL/CentOS 7.7.

Installation from source of free version of PBS Pro using the configure script

Adapted from PostgreSQL Linux downloads (Debian) in June 2017:

Install the prerequisite packages for building PBS Pro. For CentOS systems you should run the following command as root:

yum install -y gcc make rpm-build libtool hwloc-devel \ 
    libX11-devel libXt-devel libedit-devel libical-devel \ 
    ncurses-devel perl postgresql-devel python-devel tcl-devel \ 
   tk-devel swig expat-devel openssl-devel libXext libXft \ 
   autoconf automake

For OpenSUSE systems you should run the following command as root:

zypper install gcc make rpm-build libtool hwloc-devel \ 
   libX11-devel libXt-devel libedit-devel libical-devel \ 
   ncurses-devel perl postgresql-devel python-devel tcl-devel \
   tk-devel swig libexpat-devel libopenssl-devel libXext-devel \ 
   libXft-devel fontconfig autoconf automake

For Debian systems you should run the following command as root:

sudo apt-get install gcc make libtool libhwloc-dev libx11-dev \ 
   libxt-dev libedit-dev libical-dev ncurses-dev perl \ 
   python-dev tcl-dev tk-dev swig \ 
   libexpat-dev libssl-dev libxext-dev libxft-dev autoconf \ 
  automake

sudo apt-get install postgresql-server-dev-all

Note: it is recommended to install PostgreSQL separately on Debian. See instructions at PostgreSQL Linux downloads (Debian)

Install the prerequisite packages for running PBS Pro on nodes
In addition to the commands below, you should also install a text editor of your choosing (vim, Emacs, gedit, etc.).

For CentOS systems you should run the following command as root:
```
yum install -y expat libedit postgresql-server python sendmail sudo tcl tk libical 
```
For OpenSUSE systems you should run the following command as root:
```
zypper install expat libedit postgresql-server python sendmail sudo tcl tk libical1 
```
For Debian systems you should run the following command as root:
```
apt-get install expat libedit2 postgresql python sendmail-bin sudo tcl tk libical1a
```
Open a terminal as a normal (non-root) user, unpack the PBS Pro tarball, and cd to the package directory.
```
tar -xpvf pbspro-14.0.1.tar.gz 
cd pbspro-14.0.1 
```
Generate the configure script and Makefiles. (See note 1 below)
```
./autogen.sh 
```
Display the available build parameters.
```
./configure --help 
```
Configure the build for your environment. You may utilize the 66 parameters displayed in the previous step. (See note 2 below)
For CentOS and Debian systems you should run the following command:
```
./configure --prefix=/opt/pbs 
```
For OpenSUSE systems (see note 3 below) you should run the following command:
```
./configure --prefix=/opt/pbs --libexecdir=/opt/pbs/libexec 
```
Build PBS Pro by running "make". (See note 4 below)
```
make
```
Configure sudo to allow your user account to run commands as root. Refer to the online manual pages for sudo, sudoers, and visudo.
Install PBS Pro. Use sudo to run the command as root.
```
sudo make install 
```
Configure PBS Pro by executing the post-install script.
```
sudo /opt/pbs/libexec/pbs_postinstall 
```
Edit /etc/pbs.conf to configure the PBS Pro services that should be started. If you are installing PBS Pro on only 96 one system, you should change the value of PBS_START_MOM from zero to one.
If you use vi as your editor, you would run:
```
sudo vi /etc/pbs.conf 
```
Some file permissions must be modified to add SUID privilege.

sudo chmod 4755 /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp
Start the PBS Pro services
```
sudo /etc/init.d/pbs start 
```
All configured PBS services should now be running. Update your PATH and MANPATH variables by sourcing the appropriate PBS Pro profile or logging out and back in.
For Bourne shell (or similar) run the following:
```
. /etc/profile.d/pbs.sh
```

You should now be able to run PBS Pro commands to submit and query jobs. Some examples follow.

bash$ qstat -B 
Server Max Tot Que Run Hld Wat Trn Ext Status 
---- ----- ----- ----- ----- ----- ----- ----- ----- ----------- 
host1 0 0 0 0 0 0 0 0 Active

bash$ pbsnodes -a 
host1 <
Mom = host1  
ntype = PBS  
state = free  
pcpus = 2  
resources_available.arch = linux  
resources_available.host = host1  
resources_available.mem = 2049248kb  
resources_available.ncpus = 2  
resources_available.vnode = host1  
resources_assigned.accelerator_memory = 0kb  
resources_assigned.mem = 0kb  
resources_assigned.naccelerators = 0  
resources_assigned.ncpus = 0  
resources_assigned.vmem = 0kb  
resv_enable = True  
sharing = default_shared  
license = l

Now you can run a test

bash$ echo "sleep 60" | qsub  
0.host1  

bash$ qstat -a  

host1:  
Req'd Req'd Elap  
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time  
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----  
0.host1 mike workq STDIN 2122 1 1 -- -- R 00:00  
--------------------------------------------------------------------

NOTES:

If you modify configure.ac or adjust timestamps on any files that are automatically generated, you will need to regenerate them by re-running autogen.sh.
It is advisable to create a simple shell script that calls configure with the appropriate options for your environment. This ensures configure will be called with the same arguments during subsequent invocations. If you have already run configure you can regenerate all of the Makefiles by running "./config.status". The first few lines of config.status will reveal the options that were specified when configure was run. If you set environment variables such as CFLAGS it is best to do so as an argument to configure (e.g. ./configure CFLAGS="-O0 -g" --prefix=/opt/pbs). This will ensure consistency when config.status regenerates the Makefiles.
The openSUSE rpm package expands %_libexecdir to /opt/pbs/lib rather than /opt/pbs/libexec which causes problems for the post- install scripts. Providing the --libexecdir value to configure overrides this behavior.
You need to use a POSIX (or nearly POSIX) make. GNU make works quite well in this regard; BSD make does not. If you are having any sort of build problems, your make should be a prime suspect. Tremendous effort has been expended to provide proper dependency generation and makefiles without relying on any non-POSIX features. The build should work fine with a simple call to make, however, complicating things by using various make flags is not guaranteed to work. Don't be surprised if the first thing that make does is call configure again.

Using EPEL RPMs to install Torque on a single node cluster in CENTOS of RHEL 6.x (headnode and one or two computational node)

There is a strong initial tendency of CentOS and RHEL users is to rely on RPMs for initial installation of new packages. For small or widely used packages this approach usually works OK. For complex and rarely used packaged you are often to nasty surprises. Typically you face "libraries hell" problem. Also some packagers are pretty perverted and add to the package additional dependencies or configure is is a completely bizarre way. They are typically volunteers and nobody controls what they are doing.

As the result you can spend the amount of time that vastly exceed the amount of time and effort of compiling executables from the source.

There are RPMs fro Torque available from Fedora ELEL repository but those RMS are fools gold: the current version for RHEL 6.x is broken due to SNAFU committed by maintainer.

As usually for semi-open source packages, installation and configuration documentation is almost non existent. This page might slightly compensate for that.

First you need to download the correct version of RPMs for installation of RHEL/CentOs 6.x. The one that works. The version that yum picks up from EPEL repository does not work. The package maintainer recklessly enabled NUMA memory and screwed the application, instead of creating a separate set of packages for NUMA-enabled Torque. NUMA is not needed for typical Intel boxes. So this was a typical package maintainer perversion, for which users paid dearly.

Sh*t happens, especially with complex open source packages, which do not have adequate manpower for development, testing or packaging, but this was a real SNAFU that affected many naive users with real and pretty large clusters:

Gavin Nelson 2016-04-06 13:04:57 EDT
Please remove the NUMA support from this package group, or create an alternate package group.

My cluster has been dead for almost 2 weeks and the scientists are getting cranky. This feature does not play well with the MAUI scheduler and, apparently, not at all with the built-in scheduler (http://www.clusterresources.com/pipermail/torqueusers/2013-September/016136.html). Requiring this feature means having to introduce a whole host of changes to the Torque environment as well as forcing recompile of OpenMPI (last I checked epel version of openmpi does not have Torque support) and MAUI, which then means recompiling all the analysis applications, etc... I've tried...I really have.
I even tried rebuilding the package group from the src rpm, but when I remove the enable-numa switch from the torque.spec file it still builds with numa support (not sure what I'm missing there). ;

You need to download pre-numa version (see Bug 1321154 – numa enabled torque don't work )

nucleo 2016-03-24 15:33:31 EDT
Description of problem:
After updating from torque-4.2.10-5.el6 to torque-4.2.10-9.el6 pbs_mom service don't stat.

Version-Release number of selected component (if applicable):
torque-mom-4.2.10-9.el6.x86_64

Actual results:
pbs_mom.9607;Svr;pbs_mom;LOG_ERROR::No such file or directory (2) in read_layout_file, Unable to 
   read the layout file in /var/lib
/torque/mom_priv/mom.layout
If I create empty file /var/lib/torque/mom_priv/mom.layout then pbs_mom service starts but never connects to to torque server, so node shown as down.
Expected results:

pbs_mom service should start and work correctly after update without creating any additional files such as mom.layout.

Additional info:

After downgrading to torque-4.2.10-5.el6 pbs_mom works fine without mom.layout file.

NUMA support was enabled in 4.2.10-6, so last working version is 4.2.10-5. It can be downloaded from https://kojipkgs.fedoraproject.org//packages/torque/4.2.10/5.el6/

Packages required

Access to EPEL should be configured before installation starts.

Packages/libraries hell is the most distinct feature of all Linux distributions. In this case you may be able get thru and not get burned. You need to install the following packages downloaded from the link https://kojipkgs.fedoraproject.org//packages/torque/4.2.10/5.el6/x86_64/ not from EPEL:

-rw-r--r-- 1 root root   82428 Jun 30  2015 torque-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root  332860 Jun 30  2015 torque-client-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root 3548332 Jun 30  2015 torque-debuginfo-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root  200576 Jun 30  2015 torque-devel-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root   34792 Jun 30  2015 torque-drmaa-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root   41852 Jun 30  2015 torque-drmaa-devel-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root  243432 Jun 30  2015 torque-gui-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root  128116 Jun 30  2015 torque-libs-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root  252312 Jun 30  2015 torque-mom-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root   19832 Jun 30  2015 torque-pam-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root   75084 Jun 30  2015 torque-scheduler-4.2.10-5.el6.x86_64.rpm
-rw-r--r-- 1 root root  314052 Jun 30  2015 torque-server-4.2.10-5.el6.x86_64.rpm

Packages that are prerequisite

hwloc-1.5-3.el6_5.x86_64
munge-0.5.10-1.el6.x86_64

RPMs that should be installed on the headnode

torque.x86_64 0:4.2.10-5.el6 
--> Dependency: munge for package: torque-4.2.10-5.el6.x86_64
torque-client.x86_64 0:4.2.10-5.el6 will be installed
torque-debuginfo.x86_64 0:4.2.10-5.el6 will be installed
Package torque-devel.x86_64 0:4.2.10-5.el6 will be installed
Package torque-drmaa.x86_64 0:4.2.10-5.el6 will be installed
Package torque-drmaa-devel.x86_64 0:4.2.10-5.el6 will be installed
Package torque-libs.x86_64 0:4.2.10-5.el6 will be installed
Package torque-mom.x86_64 0:4.2.10-5.el6 will be installed
Package torque-pam.x86_64 0:4.2.10-5.el6 will be installed
Package torque-scheduler.x86_64 0:4.2.10-5.el6 will be installed
Package torque-server.x86_64 0:4.2.10-5.el6 will be installed

If you use more then one node installation of the client is not necessary, but it is useful for testing, as it is easier to make the client work on the same server as the headnode. You can de-install it later.

After you downloaded all the necessary packages into some directory you can install them using the command

yum localinstall *.rpm

Here is how it looks like:

[root@centos x86_64]# yum localinstall *.rpm
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Local Package Process
Examining torque-4.2.10-5.el6.x86_64.rpm: torque-4.2.10-5.el6.x86_64
Marking torque-4.2.10-5.el6.x86_64.rpm to be installed
Loading mirror speeds from cached hostfile
 * base: mirrors.centos.webair.com
 * epel: epel.mirror.constant.com
 * extras: mirror.cs.vt.edu
 * updates: mirror.cc.columbia.edu
Examining torque-client-4.2.10-5.el6.x86_64.rpm: torque-client-4.2.10-5.el6.x86_64
Marking torque-client-4.2.10-5.el6.x86_64.rpm to be installed
Examining torque-debuginfo-4.2.10-5.el6.x86_64.rpm: torque-debuginfo-4.2.10-5.el6.x86_64
Marking torque-debuginfo-4.2.10-5.el6.x86_64.rpm to be installed
Examining torque-devel-4.2.10-5.el6.x86_64.rpm: torque-devel-4.2.10-5.el6.x86_64
Marking torque-devel-4.2.10-5.el6.x86_64.rpm to be installed
Examining torque-drmaa-4.2.10-5.el6.x86_64.rpm: torque-drmaa-4.2.10-5.el6.x86_64
Marking torque-drmaa-4.2.10-5.el6.x86_64.rpm to be installed
Examining torque-drmaa-devel-4.2.10-5.el6.x86_64.rpm: torque-drmaa-devel-4.2.10-5.el6.x86_64
Marking torque-drmaa-devel-4.2.10-5.el6.x86_64.rpm to be installed
Examining torque-libs-4.2.10-5.el6.x86_64.rpm: torque-libs-4.2.10-5.el6.x86_64
Marking torque-libs-4.2.10-5.el6.x86_64.rpm to be installed
Examining torque-mom-4.2.10-5.el6.x86_64.rpm: torque-mom-4.2.10-5.el6.x86_64
Marking torque-mom-4.2.10-5.el6.x86_64.rpm to be installed
Examining torque-pam-4.2.10-5.el6.x86_64.rpm: torque-pam-4.2.10-5.el6.x86_64
Marking torque-pam-4.2.10-5.el6.x86_64.rpm to be installed
Examining torque-scheduler-4.2.10-5.el6.x86_64.rpm: torque-scheduler-4.2.10-5.el6.x86_64
Marking torque-scheduler-4.2.10-5.el6.x86_64.rpm to be installed
Examining torque-server-4.2.10-5.el6.x86_64.rpm: torque-server-4.2.10-5.el6.x86_64
Marking torque-server-4.2.10-5.el6.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package torque.x86_64 0:4.2.10-5.el6 will be installed
--> Processing Dependency: munge for package: torque-4.2.10-5.el6.x86_64
---> Package torque-client.x86_64 0:4.2.10-5.el6 will be installed
---> Package torque-debuginfo.x86_64 0:4.2.10-5.el6 will be installed
---> Package torque-devel.x86_64 0:4.2.10-5.el6 will be installed
---> Package torque-drmaa.x86_64 0:4.2.10-5.el6 will be installed
---> Package torque-drmaa-devel.x86_64 0:4.2.10-5.el6 will be installed
---> Package torque-libs.x86_64 0:4.2.10-5.el6 will be installed
---> Package torque-mom.x86_64 0:4.2.10-5.el6 will be installed
---> Package torque-pam.x86_64 0:4.2.10-5.el6 will be installed
---> Package torque-scheduler.x86_64 0:4.2.10-5.el6 will be installed
---> Package torque-server.x86_64 0:4.2.10-5.el6 will be installed
--> Running transaction check
---> Package munge.x86_64 0:0.5.10-1.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================================
 Package                    Arch           Version                Repository                                       Size
========================================================================================================================
Installing:
 torque                     x86_64         4.2.10-5.el6           /torque-4.2.10-5.el6.x86_64                     178 k
 torque-client              x86_64         4.2.10-5.el6           /torque-client-4.2.10-5.el6.x86_64              680 k
 torque-debuginfo           x86_64         4.2.10-5.el6           /torque-debuginfo-4.2.10-5.el6.x86_64            16 M
 torque-devel               x86_64         4.2.10-5.el6           /torque-devel-4.2.10-5.el6.x86_64               421 k
 torque-drmaa               x86_64         4.2.10-5.el6           /torque-drmaa-4.2.10-5.el6.x86_64                51 k
 torque-drmaa-devel         x86_64         4.2.10-5.el6           /torque-drmaa-devel-4.2.10-5.el6.x86_64          40 k
 torque-libs                x86_64         4.2.10-5.el6           /torque-libs-4.2.10-5.el6.x86_64                280 k
 torque-mom                 x86_64         4.2.10-5.el6           /torque-mom-4.2.10-5.el6.x86_64                 563 k
 torque-pam                 x86_64         4.2.10-5.el6           /torque-pam-4.2.10-5.el6.x86_64                 8.3 k
 torque-scheduler           x86_64         4.2.10-5.el6           /torque-scheduler-4.2.10-5.el6.x86_64           115 k
 torque-server              x86_64         4.2.10-5.el6           /torque-server-4.2.10-5.el6.x86_64              701 k
Installing for dependencies:
 munge                      x86_64         0.5.10-1.el6           epel                                            111 k

Transaction Summary
========================================================================================================================
Install      12 Package(s)

Total size: 19 M
Total download size: 111 k
Installed size: 19 M
Is this ok [y/N]: y
Downloading Packages:
munge-0.5.10-1.el6.x86_64.rpm                                                                    | 111 kB     00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : munge-0.5.10-1.el6.x86_64                                                                           1/12
  Installing : torque-libs-4.2.10-5.el6.x86_64                                                                     2/12
  Installing : torque-4.2.10-5.el6.x86_64                                                                          3/12
  Installing : torque-devel-4.2.10-5.el6.x86_64                                                                    4/12
  Installing : torque-drmaa-4.2.10-5.el6.x86_64                                                                    5/12
  Installing : torque-drmaa-devel-4.2.10-5.el6.x86_64                                                              6/12
  Installing : torque-server-4.2.10-5.el6.x86_64                                                                   7/12
  Installing : torque-mom-4.2.10-5.el6.x86_64                                                                      8/12
  Installing : torque-client-4.2.10-5.el6.x86_64                                                                   9/12
  Installing : torque-scheduler-4.2.10-5.el6.x86_64                                                               10/12
  Installing : torque-pam-4.2.10-5.el6.x86_64                                                                     11/12
  Installing : torque-debuginfo-4.2.10-5.el6.x86_64                                                               12/12
  Verifying  : torque-4.2.10-5.el6.x86_64                                                                          1/12
  Verifying  : torque-drmaa-devel-4.2.10-5.el6.x86_64                                                              2/12
  Verifying  : torque-libs-4.2.10-5.el6.x86_64                                                                     3/12
  Verifying  : torque-debuginfo-4.2.10-5.el6.x86_64                                                                4/12
  Verifying  : torque-server-4.2.10-5.el6.x86_64                                                                   5/12
  Verifying  : torque-devel-4.2.10-5.el6.x86_64                                                                    6/12
  Verifying  : torque-mom-4.2.10-5.el6.x86_64                                                                      7/12
  Verifying  : torque-pam-4.2.10-5.el6.x86_64                                                                      8/12
  Verifying  : torque-drmaa-4.2.10-5.el6.x86_64                                                                    9/12
  Verifying  : torque-client-4.2.10-5.el6.x86_64                                                                  10/12
  Verifying  : torque-scheduler-4.2.10-5.el6.x86_64                                                               11/12
  Verifying  : munge-0.5.10-1.el6.x86_64                                                                          12/12

Installed:
  torque.x86_64 0:4.2.10-5.el6            torque-client.x86_64 0:4.2.10-5.el6  torque-debuginfo.x86_64 0:4.2.10-5.el6
  torque-devel.x86_64 0:4.2.10-5.el6      torque-drmaa.x86_64 0:4.2.10-5.el6   torque-drmaa-devel.x86_64 0:4.2.10-5.el6
  torque-libs.x86_64 0:4.2.10-5.el6       torque-mom.x86_64 0:4.2.10-5.el6     torque-pam.x86_64 0:4.2.10-5.el6
  torque-scheduler.x86_64 0:4.2.10-5.el6  torque-server.x86_64 0:4.2.10-5.el6

Dependency Installed:
  munge.x86_64 0:0.5.10-1.el6

RPMs that should be installed on the computational nodes:

torque.x86_64
torque-libs.x86_64
torque-mom
trqauthd and pbs_mom daemons should run as root in the client node.

Required changes in your servers configuration

The following requirements should be met before you start

Passwordless ssh from the headnode to the clients should be enabled
/etc/hosts for servers that constitute the cluster should be identical on all hosts

Make sure that /etc/hosts on all of the boxes in the cluster contains the hostnames of every PC in the cluster. Ensure that hostname of the server and nodes are identical in /etc/hosts.

Never use localhost as the name of you headnode and execution node. Use hostname defined in /etc/host for the main interface.
Firewall should be temporarily shut down. Installation should be done with firewall daemon disabled; you will have enough problems without it ;-)

Be sure to open TCP for all machines using TORQUE or disable the firewall. The pbs_server (server) and pbs_mom (client) by default use TCP and UDP ports 15001-15004. pbs_mom (client) also uses UDP ports 1023 and below if privileged ports are configured (the default).
NFS
is desirable

Unlike SGE, one does not need to use NFS with PBS, but doing so simplifies the installation of packages on the nodes. Usually cluster have a shared filesystem for all nodes, so this requirement is automatically met.

Configuration

The instructions below use the env. variable $PBS_HOME ? This is the base directory for configuration directories. Defaults to /var/lib/torque

Like SGE, PBS rely of certain environment variables to operate. But RPM does not provide such and environment setting file for /etc/profile.d. In case of Fedora RPMs PBS_HOME and other critical environment variables are hardwired directly in each init script. For example, as I mentioned before PBS_HOME is set to /var/lib/torque via instruction inside each init script:

export PBS_HOME=/var/lib/torque

You need to set this variable befor the configuration.

After that three configuration files need to be created or updated.

$PBS_HOME/server_name Each node needs to know what machine is running the server. This is conveyed through the $PBS_HOME/server_name file, which, for our configuration, should contain the result of execution of the command hostname --long on the headnode. For example: master
```
vi $PBS_HOME/server_name
```
$PBS_HOME/server_priv/nodes The pbs_server daemon must know which nodes are available for executing jobs. This information is kept in a file called $PBS_HOME/server_priv/nodes, and the file reside on the headnode only. You can set various properties for each node listed in the nodes file, but for this simple configuration, only the number of processors is included. $PBS_HOME/server_priv/nodes
The following lines can serve as an example:
```
master np=8
node01 np=8
```
```
vi $PBS_HOME/server_priv/nodes
```
$PBS_HOME/mom_priv/config Each pbs_mom daemon needs some basic information to participate in the batch system. This configuration information is contained in $PBS_HOME/mom_priv/config on every node.
```
vi $PBS_HOME/mom_priv/config
```
The following lines can serve as an example:
```
# Configuration for pbs_mom.
$pbsserver master
$logevent  0x0ff
```
The $pbsserver directive tells each Mom where the headnode is. The default suitable for minimal configuration when server and client share the same server is localhost.

In our case the server is called master.

The $logevent directive specifies what information should be logged during operation. A value of 0x0ff causes all messages except debug messages to be logged, while 0x1ff causes all messages, including debug messages, to be logged.

After that you also need to create the initial configuration for the server. The server configuration is maintained in a file named serverdb, located in $PBS_HOME/server_priv. The serverdb file contains all parameters pertaining to the operation of Torque plus all of the queues that are in the configuration. If you have a previous configuration that you want to save you need to save this file.

You can initialize serverdb in two different ways:

The preferable way is to use pbs_server -t create (see -t option)
```
/usr/sbin/pbs_server -D -t create
```
Warning: this will remove any existing serverdb file located at /var/lib/torque/server_priv/serverdb

You need to Ctrl^C the pbs_server after it started: it will only take a couple of seconds to create this file.
As root, execute ./torque.setup script, which in addition to setting up the /var/lib/torque/server_priv/serverdb also creates the initial queue, called batch.The drawback is that in this particular version will wipe out you $PBS_HOME/server_priv/nodes file and you need to restore it. You must view the torque.setup and extract the command to create the queue for further use.

The script needs to be executed as a root user. You might need to include the path of torque binaries and libs into corresponding variables (root user doesn't use the path variable of normal users).

The script is located in /usr/share/doc/torque-4.2.10/

[1] root@centos: # cd /usr/share/doc/torque-4.2.10/
17/07/03 00:34 /usr/share/doc/torque-4.2.10 ============================centos
[0]root@centos: # ll
total 220
drwxr-xr-x    2 root root   4096 Jul  3 00:17 ./
drwxr-xr-x. 845 root root  36864 Jul  3 00:17 ../
-rw-r--r--    1 root root 143903 Mar 19  2015 CHANGELOG
-rw-r--r--    1 root root   4123 Mar 19  2015 PBS_License_2.3.txt
-rw-r--r--    1 root root   4123 Mar 19  2015 PBS_License.txt
-rw-r--r--    1 root root   2066 Jun 30  2015 README.Fedora
-rw-r--r--    1 root root   1541 Mar 19  2015 README.torque
-rw-r--r--    1 root root   3351 Mar 19  2015 Release_Notes
-rw-r--r--    1 root root   1884 Mar 19  2015 torque.setup

sudo su -

cd /usr/share/doc/torque-4.2.10/torque.setup
./torque.setup root localhost

NOTE: As a side effect this will wipe out you $PBS_HOME/server_priv/nodes file

Starting Up PBS

NOTE: all daemons should be started as root.

For some reason this version of EPEL Torque is built with munge support. As if trqauthd is not enough. This package should be installed on all nodes. The munge package should already be installed and configured before you start Torque. Configuration consists of distribution of the key from the headnode to all computational nodes. First you need to create a munge key on the headnode using the command:

/usr/sbin/create-munge-key

Start munge service on the headnode
```
service munge start
```
```
 
```
Start trqauthd on the headnode by running init script created by RPMs. This should be done on the headnode and all nodes.
```
service trqauthd start 
```
```
service trqauthd start
```
Start MPM daemon on all nodes (or of the headnode if this the only one that is configured). It’s better if the MOM daemons on computational nodes (and the headnode if you use it for computations too) are started first to be ready to communicate with the server daemon on the headnode once it’s launched.
```
service pbs_mom start
```
Start pbs_server and pbs_sched deamons.
```
service pbs_server start
```
ATTENTION: If you did not use ./torque.setup script before that point (see above), the first time you run pbs_server, you need to start it with the -t create flag to initialize the server configuration. In this case do not use init script. Use the command line invocation (you need to set up environment first):
```
pbs_server -t create
PBS_Server master: Create mode and server database exists,
do you wish to continue y/(n)?y
```

Configure the queue batch

Create a new queue which we name batch :

qmgr -c "create queue batch queue_type=execution"
qmgr -c "set queue batch enabled=true"
qmgr -c "set queue batch started=true"
qmgr -c "set server scheduling=True"

or (taken from torque.setup script):

qmgr -c "s s scheduling=true"
qmgr -c "c q batch queue_type=execution"
qmgr -c "s q batch started=true"
qmgr -c "s q batch enabled=true"
qmgr -c "s q batch resources_default.nodes=1"
qmgr -c "s q batch resources_default.walltime=3600"
qmgr -c "s s default_queue=batch"
qmgr -c "c n master" # Add one batch worker to your pbs_server. If this is a single server that will be master

Check ,if node is visible and is listed as free

# pbsnodes
centos68
     state = free
     np = 2
     ntype = cluster
     status = rectime=1499059096,varattr=,jobs=,state=free,netload=878975846,gres=,loadave=0.00,ncpus=2,physmem=1878356kb,availmem=5699504kb,totmem=6072656kb,idletime=5350,nusers=0,nsessions=0,uname=Linux centos68 2.6.32-642.el6.x86_64 #1 SMP Tue May 10 17:27:01 UTC 2016 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003

Register daemons via chkconfig

If pbsnodes command works OK, use chkconfig to start the services at boot time.

/sbin/chkconfig munge on
/sbin/chkconfig trqauthd on
/sbin/chkconfig pbs_mom on
/sbin/chkconfig pbs_server on
/sbin/chkconfig pbs_sched on

Verifying cluster status

To check the status of the cluster, issue the following:

$ pbsnodes -a

First test job submission

Switch to a regular user. Tests should be run as a regular user, not as root.

A trivial test is to simply run sleep:

$ echo "sleep 30" | qsub

[0]bezroun@centos68: $ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
3.centos68                 STDIN            bezroun                0 R batch

the job should be visible in queue for 30 sec and then disappear. To check the queue use qstat command

The second test should produce some output

Note: STDOUT and STDERR for a queued job will be logged by default in the form text files corresponding to the respective outputs pid.o and pid.e and will be written to the path from which the qsub command was issued.

For example you try the script test.sh containing the following lines :

#!/bin/bash
#PBS -l walltime=00:1:00
#PBS -l nice=19
#PBS -q test
date
sleep 10
date

Now submit it via qsub command

qsub test.sh

This should run during 10 seconds. Check if the job is inside the queue using qstat. Torque should produce also a test.sh.e# and test.sh.o# file as output:

$ ls test*
test.sh  test.sh.e4  test.sh.o4

Output should look like:

$ cat test.sh.o4
Mon Jul  3 02:11:16 EDT 2017
Mon Jul  3 02:11:26 EDT 2017

As a user not as root run the following

qsub <<EOF
hostname
echo "Hi I am a batch job running in torque"
sleep 10
EOF

Monitor the state of that job with qstat.

Checking job status

qstat is used to check work status.

Append the -n switch to see which nodes are doing which jobs.

Top Visited <p>Your browser does not support iframes.</p>					Switchboard
					Latest
					Past week
					Past month

NEWS CONTENTS

20180621 : How to install pbs on compute node and configure the server and compute node - Users-Site Administrators - PBS Professional Op ( Jun 21, 2018 , community.pbspro.org )
20170701 : Bug 1321154 – numa enabled torque don't work ( Bug 1321154 – numa enabled torque don't work, Jul 01, 2017 )
20170616 : Tutorial - Submitting a job using qsub by Sreedhar Manchu ( Tutorial - Submitting a job using qsub , Jun 16, 2017 )
20170616 : PBS (Portable Batch System) - Center for High Performance Computing ( PBS (Portable Batch System) - Center for High Performance Computing , )

Old News ;-)

[Jun 21, 2018] How to install pbs on compute node and configure the server and compute node - Users-Site Administrators - PBS Professional Op

Jun 21, 2018 | community.pbspro.org

How to install pbs on compute node and configure the server and compute node? Users/Site Administrators You have selected 0 posts.

select all

cancel selecting Jun 2016 1 / 9 Jul 2016 Apr 2017 Joey Jun '16 Hi guys,
I am new to HPC and PBS or torque. I am able to install PBS pro from source code on my head node . But not sure how to install the compute node and cconfigure it. I didn't see any documentation in the github either. Can anyone give me some help? Thanks

created Jun '16

last reply Apr '17

buchmann Jun '16 Install is pretty similar on the compute nodes - however, you do not need the "server" parts.
There are OK docs on the Altair "pro" site, see answer to previous question "documentation-is-missing/81".
In short, you the Altair docs for v13, and/or the INSTALL file procedure. (Or install from pre-build binaries).
Actual method will depend on your system type etc.

I prefer to install using pre-compiled RPMs (CentOS72 systems), which presently means that I will compile these from tarball+spec-file (slightly modified spec-file).

Hope this helps.
/Bjarne subhasisb Jun '16 @Joey thanks for joining the pbspro forum.

You can find the documentation about pbspro here: https://pbspro.atlassian.net/wiki/display/PBSPro/User+Documentation 730

Kindly do not hesitate to post questions about any specific issues you are facing.

Thanks,
Subhasis Joey Jun '16 1 Thanks for your reply.

I rebuild the CentOS72 rpm with the src from Centos7.zip
installed pbspro-server-14.1.0-13.1.x86_64.rpm on mye headnode
installed pbspro-execution-14.1.0-13.1.x86_64.rpm on my compute node.
On the head node
create /var/spool/pbs/server_priv/nodes with following:

computenode1 np=1

/etc/pbs.conf:
PBS_SERVER=headnode
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=0
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

on the compute node

/var/spool/pbs/mom_priv/config as following

$logevent 0x1ff
$clienthost headnode
$restrict_user_maxsysid 999

/etc/pbs.conf
PBS_SERVER=headnode
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

after that I start the pbs on headnode and compute node without error:
#/etc/init.d/pbs start
But when I try to run pbsnodes -a, it tells me:
pbsnodes: Server has no node list
If I run a script it will just Queue there.

Both server firewalld are turned off and pingable.

Can anyone give me some help? Thanks

subhasisb Jul '16 Hi @Joey ,

Unlike torque, pbspro uses a real relational database underneath to store information about nodes, queues, jobs etc. Thus creating a nodes file is not supported under pbspro.

To add a node to pbs cluster, use the qmgr command as follows:

qmgr -c "create node hostname"

HTH
regards,
Subhasis

Joey Jul '16 Thanks for your reply. I thought PBS and torque are the same except one is open source and one is commerical.

subhasisb Jul '16 Hi @Joey

They might feel similar since Torque was based on the OpenPBS codebase. OpenPBS was a version of PBS released as opensource many years back.

Post that, Altair engineering has put in a huge amount of effort towards PBS Professional and added tons of features and improvements in terms of scalability, robustness and ease of use over decades which resulted in it becoming the number one work load manager in the HPC world. Altair has now open-sourced PBS Professional.

So, pbspro is actually very different from torque in terms of capability and performance, and is actually a completely different product.

Let us know if you need further information in switching to pbspro.

Thanks and Regards,
Subhasis

10 months later sxy Apr '17 Hi Subhasis,

To add a node to pbs cluster, use the qmgr command as follows:

qmgr -c "create node hostname"

if a site has a few hundreds of compute nodes, the above method is very tedious.
would there be any easy/quick ways to register computer nodes with pbs server like the nodes file in torque?

Thanks,

Sue

mkaro Apr '17 This is one way to accomplish it

while read line; do [ -n "$line" ] && qmgr -c "create node $line"; done <nodefile

where nodefile contains the list of nodes, one per line.

[Jul 01, 2017] Bug 1321154 – numa enabled torque don't work

The response from the maintainer David Brown is really pathetic. BTW the broken version `torque-4.2.10-9.el6` is still distributed from EPEL as of June 2017. He recklessly screwed the package instead of created the second package with NUMA unabled (for large clusters it is idiotic to use Fedora supplied RPMs, although strange things happen, so the audience of FEDORA packages is mostly limited to those who run Torque on one or two nodes). After this SNAFU he tells us "Okay, I got some time to test things out."

nucleo 2016-03-24 15:33:31 EDT
Description of problem:
After updating from torque-4.2.10-5.el6 to torque-4.2.10-9.el6 pbs_mom service don't stat.
Version-Release number of selected component (if applicable):
torque-mom-4.2.10-9.el6.x86_64
Actual results:
pbs_mom.9607;Svr;pbs_mom;LOG_ERROR::No such file or directory (2) in read_layout_file, Unable to 
   read the layout file in /var/lib/torque/mom_priv/mom.layout
If I create empty file /var/lib/torque/mom_priv/mom.layout then pbs_mom service starts but never connects to to torque server, so node shown as down.
Expected results:

pbs_mom service should start and work correctly after update without creating any additional files such as mom.layout.

Additional info:

After downgrading to torque-4.2.10-5.el6 pbs_mom works fine without mom.layout file.
Gavin Nelson 2016-04-06 13:04:57 EDT

Please remove the NUMA support from this package group, or create an alternate package group.

My cluster has been dead for almost 2 weeks and the scientists are getting cranky. This feature does not play well with the MAUI scheduler and, apparently, not at all with the built-in scheduler (http://www.clusterresources.com/pipermail/torqueusers/2013-September/016136.html). Requiring this feature means having to introduce a whole host of changes to the Torque environment as well as forcing recompile of OpenMPI (last I checked epel version of openmpi does not have Torque support) and MAUI, which then means recompiling all the analysis applications, etc... I've tried...I really have.
I even tried rebuilding the package group from the src rpm, but when I remove the enable-numa switch from the torque.spec file it still builds with numa support (not sure what I'm missing there). ;

nucleo 2016-04-06 13:47:00 EDT
NUMA support enabled in 4.2.10-6, so last working version is 4.2.10-5.
It can be downloaded here 
https://kojipkgs.fedoraproject.org//packages/torque/4.2.10/5.el6/

Older packages For other EPEL and Fedora releases can be found here
https://kojipkgs.fedoraproject.org//packages/torque/4.2.10/
David Brown 2016-04-08 22:32:03 EDT
Okay, I got some time to test things out.
Just to reference for everyone involved, I think I mentioned this on another bug and on the torque users mailing list. I use chef to do some testing to build a virtual cluster and setup torque https://github.com/dmlb2000/torque-cookbook. Check out the templates directory, there are several files that need to be rendered correctly to make things work. For the numa support I had to change the server's nodes file and each mom got the mom.layout file.
I've tested multiple CPUs with multiple nodes (2x2) and am able to run MPI jobs just fine. However, the RHEL/CentOS version of openmpi is built without torque support. This means that you have to setup your hostsfile and specify the `-np` option to mpirun in order to use OpenMPI in a run and make it work.
#PBS -l nodes=2:ppn=2
mpirun -hostfile hostfile -np 4 ./mpi_hello
As, MAUI is not in EPEL I can't really setup and support a configuration of that and I consider it out of scope of support from EPEL's point of view. As I don't have a version of MAUI to target I can't ensure interoperability between the two pieces of software.
If you are having issues building and running torque with MAUI or MOAB you should ask the user mailing list as well to get help.
As to the status of the original bug I could include a basic mom.layout file. The one from the chef cookbook for example. However, this would have to be changed for most installations as that just flattens the cores on the node.

[Jun 16, 2017] Tutorial - Submitting a job using qsub by Sreedhar Manchu

Notable quotes:

"... (the path to your home directory) ..."

"... (which language you are using) ..."

"... (the name that you logged in with) ..."

"... (standard path to excecutables) ..."

"... (location of the users mail file) ..."

"... (command shell, i.e bash,sh,zsh,csh, ect.) ..."

"... (the name of the host upon which the qsub command is running) ..."

"... (the hostname of the pbs_server which qsub submits the job to) ..."

"... (the name of the original queue to which the job was submitted) ..."

"... (the absolute path of the current working directory of the qsub command) ..."

"... (each member of a job array is assigned a unique identifier) ..."

"... (set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job) ..."

"... (the job identifier assigned to the job by the batch system) ..."

"... (the job name supplied by the user) ..."

"... (the name of the file contain the list of nodes assigned to the job) ..."

"... (the name of the queue from which the job was executed from) ..."

"... (the walltime requested by the user or default walltime allotted by the scheduler) ..."

Last modified by Yanli Zhang on Jul 10, 2012

Access

Managing Data

Available Software

Compiling & Debugging

Running Jobs

Queues

Getting Started & FAQs

qsub Tutorial

Synopsis

What is qsub

What does qsub do?

Overview

Environment variables in qsub

Arguments to control behavior

Declare the date/time a job becomes eligible for execution

Defining the working directory path to be used for the job

Manipulate the output files

Mail job status at the start and end of a job

Submit a job to a specific queue

Submitting a job that is dependent on the output of another

Submitting multiple jobs in a loop that depend on output of another job

Opening an interactive shell to the compute node

Passing an environment variable to your job

Passing your environment to your job

Submitting an array job: Managing groups of jobs

Example

Create input data

Submission Script

Submit & Monitor

qstat

pbstop

Comma delimited lists

A more general for loop - Arrays with step size

A List of Input Files/Pulling data from the ith line of a file

Delete

Delete all jobs in array

Delete a single job in array

Synopsis qsub Synopsis ?

qsub [-a date_time] [-A account_string] [-b secs][-c checkpoint_options] n No checkpointing is to be performed.s Checkpointing is to be performed only when the server executing the job isshutdown . c Checkpointing is to be performed at the default minimum timefor the server executing the job. c=minutes Checkpointing is to be performed at an interval of minutes,which is the integer number of minutes of CPU time used by the job. This value must be greater than zero.[-C directive_prefix] [-d path] [-D path] [-e path] [-f] [-h][-I ] [-j join ] [-k keep ] [-l resource_list ] [-m mail_options] [-N name] [-o path] [-p priority] [-P user[:group]][-q destination] [-r c] [-S path_list] [-t array_request] [-u user_list] [- vvariable_list] [-V ] [-W additional_attributes][-X] [-z] [script]

For detailed information, see this page .
What is qsub?
qsub is the command used for job submission to the cluster. It takes several command line arguments and can also use special directives found in the submission scripts or command file. Several of the most widely used arguments are described in detail below.

Useful Information
For more information on qsub do More information on qsub ?

$ man qsub

What does qsub do?
Overview
All of our clusters have a batch server referred to as the cluster management server running on the headnode. This batch server monitors the status of the cluster and controls/monitors the various queues and job lists. Tied into the batch server, a scheduler makes decisions about how a job should be run and its placement in the queue. qsub interfaces into the the batch server and lets it know that there is another job that has requested resources on the cluster. Once a job has been received by the batch server, the scheduler decides the placement and notifies the batch server which in turn notifies qsub (Torque/PBS) whether the job can be run or not. The current status (whether the job was successfully scheduled or not) is then returned to the user. You may use a command file or STDIN as input for qsub.
Environment variables in qsub
The qsub command will pass certain environment variables in the Variable_List attribute of the job. These variables will be available to the job. The value for the following variables will be taken from the environment of the qsub command:

HOME (the path to your home directory)

LANG (which language you are using)

LOGNAME (the name that you logged in with)

PATH (standard path to excecutables)

MAIL (location of the users mail file)

SHELL (command shell, i.e bash,sh,zsh,csh, ect.)

TZ (time zone)

These values will be assigned to a new name which is the current name prefixed with the string "PBS_O_". For example, the job will have access to an environment variable named PBS_O_HOME which have the value of the variable HOME in the qsub command environment.

In addition to these standard environment variables, there are additional environment variables available to the job.

PBS_O_HOST (the name of the host upon which the qsub command is running)

PBS_SERVER (the hostname of the pbs_server which qsub submits the job to)

PBS_O_QUEUE (the name of the original queue to which the job was submitted)

PBS_O_WORKDIR (the absolute path of the current working directory of the qsub command)

PBS_ARRAYID (each member of a job array is assigned a unique identifier)

PBS_ENVIRONMENT (set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job)

PBS_JOBID (the job identifier assigned to the job by the batch system)

PBS_JOBNAME (the job name supplied by the user)

PBS_NODEFILE (the name of the file contain the list of nodes assigned to the job)

PBS_QUEUE (the name of the queue from which the job was executed from)

PBS_WALLTIME (the walltime requested by the user or default walltime allotted by the scheduler)

Arguments to control behavior
As stated before there are several arguments that you can use to get your jobs to behave a specific way. This is not an exhaustive list, but some of the most widely used and many that you will will probably need to accomplish specific tasks.
Declare the date/time a job becomes eligible for execution
To set the date/time which a job becomes eligible to run, use the -a argument. The date/time format is [[[[CC]YY]MM]DD]hhmm[.SS]. If -a is not specified qsub assumes that the job should be run immediately.

Example

To test -a get the current date from the command line and add a couple of minutes to it. It was 10:45 when I checked. Add hhmm to -a and submit a command from STDIN.
Example: Set the date/time which a job becomes eligible to run ?

$ echo "sleep 30" | qsub -a 1047

Handy Hint
This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?

#PBS -a 1047

Defining the working directory path to be used for the job
To define the working directory path to be used for the job -d option can be used. If it is not specified, the default working directory is the home directory.

Example
Example: Define the working directory path to be used for the job ?

$ pwd /home/manchu $ catdflag.pbs echo "Working directory is $PWD" $ qsub dflag.pbs 5596682.hpc0. local $ cat dflag.pbs.o5596682 Working directory is /home/manchu$ mv dflag.pbs random_pbs/ $ qsub -d /home/manchu/random_pbs//home/manchu/random_pbs/dflag.pbs 5596703.hpc0. local$ cat random_ps/dflag.pbs.o5596703 Working directory is /home/manchu/random_pbs$ qsub /home/manchu/random_pbs/dflag.pbs5596704.hpc0. local $ cat dflag.pbs.o5596704Working directory is /home/manchu

Handy Hint
This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?

#PBS -d /home/manchu/random_pbs

Manipulate the output files
As a default all jobs will print all stdout (standard output) messages to a file with the name in the format <job_name>.o<job_id> and all stderr (standard error) messages will be sent to a file named <job_name>.e<job_id>. These files will be copied to your working directory as soon as the job starts. To rename the file or specify a different location for the standard output and error files, use the -o for standard output and -e for the standard error file. You can also combine the output using -j.

Example
Create a simple submission file: ?

$ cat sleep .pbs #!/bin/shfor i in {1..60} ; doecho $i sleep 1 done

Create a simple submission file: ?

$ qsub -o sleep .log sleep .pbs

Handy Hint
This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?

#PBS -o sleep.log

Submit your job with the standard error file renamed: ?

$ qsub -e sleep .log sleep .pbs

Handy Hint
This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?

#PBS -e sleep.log

Combine them using the name sleep.log: ?

$ qsub -o sleep .log -j oe .pbs

Handy Hint
This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?

#PBS -o sleep.log #PBS -j oe

Warning
The order of two letters next to flag -j is important. It should always start with the letter that's been already defined before, in this case 'o'. Place the joined output in another location other than the working directory: ?

$ qsub -o $HOME/tutorials/logs/sleep.log -j oe sleep .pbs

Mail job status at the start and end of a job
The mailing options are set using the -m and -M arguments. The -m argument sets the conditions under which the batch server will send a mail message about the job and -M will define the users that emails will be sent to (multiple users can be specified in a list seperated by commas). The conditions for the -m argument include:

a: mail is sent when the job is aborted.

b: mail is sent when the job begins.

e: main is sent when the job ends.

Example
Using the sleep.pbs script created earlier, submit a job that emails you for all conditions: ? $ qsub -m abe -M [email protected] sleep .pbs
Handy Hint
This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?

#PBS -m abe #PBs -M [email protected]

Submit a job to a specific queue
You can select a queue based on walltime needed for your job. Use the 'qstat -q' command to see the maximum job times for each queue.

Example
Submit a job to the bigmem queue: ?

$ qsub -q bigmem sleep .pbs

Handy Hint
This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?

#PBS -q bigmem

Submitting a job that is dependent on the output of another
Often you will have jobs that will be dependent on another for output in order to run. To add a dependency, we will need to use the -W (additional attributes) with the depend option. We will be using the afterok rule, but there are several other rules that may be useful. (man qsub)

Example

To illustrate the ability to hold execution of a specific job until another has completed, we will write two submission scripts. The first will create a list of random numbers. The second will sort those numbers. Since the second script will depend on the list that is created we will need to hold execution until the first has finished.
random.pbs ?

$ cat random.pbs #!/bin/sh cd$HOME sleep 120 for iin {1..100}; do echo$RANDOM >> rand.list done

sort.pbs ?

$ cat sort .pbs #!/bin/shcd $HOME sort -n rand.list > sorted.listsleep 30

Once the file are created, lets see what happens when they are submitted at the same time:
Submit at the same time ?

$ qsub random.pbs ; qsub sort .pbs 5594670.hpc0.local 5594671.hpc0. local $ ls random.pbs sorted.list sort .pbs sort .pbs.e5594671 sort .pbs.o5594671 $ cat sort .pbs.e5594671 sort: open failed: rand.list: No such fileor directory

Since they both ran at the same time, the sort script failed because the file rand.list had not been created yet. Now submit them with the dependencies added.
Submit them with the dependencies added ?

$ qsub random.pbs 5594674.hpc0. local $ qsub -W depend=afterok:5594674.hpc0.local sort .pbs5594675.hpc0. local $ qstat -u $USER hpc0.local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----5594674.hpc0.loc manchu ser2 random.pbs 18029 1 1 -- 48:00 R 00:005594675.hpc0.loc manchu ser2 sort .pbs 1 1 -- 48:00 H --

We now see that the sort.pbs job is in a hold state. And once the dependent job completes the sort job runs and we see:
Job status with the dependencies added ?

$ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----5594675.hpc0.loc manchu ser2sort .pbs 18165 1 1 -- 48:00 R --

Useful Information

afterany:jobid[:jobid...] implies that job may be scheduled for execution after jobs jobid have terminated, with or without errors.

afterok:jobid[:jobid...] implies that job may be scheduled for execution only after jobs jobid have terminated with no errors.

afternotok:jobid[:jobid...] implies that job may be scheduled for execution only after jobs jobid have terminated with errors.

Submitting multiple jobs in a loop that depend on output of another job
This example show how to submit multiple jobs in a loop where each job depends on output of job submitted before it.

Example

Let's say we need to write numbers from 0 to 999999 in order onto a file output.txt. We can do 10 separate runs to achieve this, where each run has a separate pbs script writing 100,000 numbers to output file. Let's see what happens if we submit all 10 jobs at the same time.

The script below creates required pbs scripts for all the runs.
Create PBS Scripts for all the runs ?

$ cat creation.sh #!/bin/bash for i in {0..9} do cat > pbs.script.$i << EOF #!/bin/bash #PBS -l nodes=1:ppn=1,walltime=600cd \$PBS_O_WORKDIR for ((i=$((i*100000)); i<$(((i+1)*100000)); i++)) { echo "\$i" >> output.txt } exit 0; EOF done

Change permission to make it an executable ?

$ chmod u+x creation.sh

Run the Script ?

$ ./creation.sh

List of Created PBS Scripts ?

$ ls -l pbs.script.* -rw-r--r-- 1 manchu wheel 134 Oct 27 16:32 pbs.script.0-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.1-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.2 -rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.3-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.4-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.5-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.6 -rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.7-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.8-rw-r--r-- 1 manchu wheel 140 Oct 27 16:32 pbs.script.9

PBS Script ?

$ cat pbs.script.0 #!/bin/bash #PBS -l nodes=1:ppn=1,walltime=600 cd $PBS_O_WORKDIR for ((i=0; i<100000; i++)) { echo "$i" >> output.txt }exit 0;

Submit Multiple Jobs at a Time ?

$ for i in {0..9}; do qsub pbs.script.$i ; done 5633531.hpc0.local 5633532.hpc0. local 5633533.hpc0.local 5633534.hpc0. local 5633535.hpc0.local 5633536.hpc0. local 5633537.hpc0.local 5633538.hpc0. local 5633539.hpc0.local 5633540.hpc0. local $

output.txt ?

$ tail output.txt 699990 699991699992 699993 699994 699995 699996 699997 699998 699999 -bash -3.1$ grep -n 999999 $_ 210510:999999 $

This clearly shows the nubmers are in no order like we wanted. This is because all the runs wrote to the same file at the same time, which is not what we wanted.

Let's submit jobs using qsub dependency feature. This can be achieved with a simple script shown below.
Simple Script to Submit Multiple Dependent Jobs ?

$ cat dependency.pbs #!/bin/bash job=`qsub pbs.script.0` for i in {1..9}do job_next=`qsub -W depend=afterok:$job pbs.script.$i`job=$job_next done

Let's make it an executable ?

$ chmod u+x dependency.pbs

Submit dependent jobs by running the script ?

$ ./dependency.pbs $ qstat -u manchu hpc0. local : Req 'd Req' d ElapJob ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----5633541.hpc0.loc manchu ser2 pbs.script.0 28646 1 1 -- 00:10 R --5633542.hpc0.loc manchu ser2 pbs.script.1 -- 1 1 -- 00:10 H -- 5633543.hpc0.loc manchu ser2 pbs.script.2 -- 1 1 -- 00:10 H --5633544.hpc0.loc manchu ser2 pbs.script.3 -- 1 1 -- 00:10 H --5633545.hpc0.loc manchu ser2 pbs.script.4 -- 1 1 -- 00:10 H --5633546.hpc0.loc manchu ser2 pbs.script.5 -- 1 1 -- 00:10 H -- 5633547.hpc0.loc manchu ser2 pbs.script.6 -- 1 1 -- 00:10 H -- 5633548.hpc0.loc manchu ser2 pbs.script.7 -- 1 1 -- 00:10 H --5633549.hpc0.loc manchu ser2 pbs.script.8 -- 1 1 -- 00:10 H --5633550.hpc0.loc manchu ser2 pbs.script.9 -- 1 1 -- 00:10 H --$

Output after first run ?

$ tail output.txt 99990 9999199992 99993 99994 99995 9999699997 99998 99999 $

Output after final run ?

$ tail output.txt 999990 999991999992 999993 999994 999995 999996 999997 999998 999999 $grep -n 100000 output.txt 100001:100000 $ grep -n 999999 output.txt 1000000:999999$

This shows that numbers are written in order to output.txt. Which in turn shows that jobs ran one after successful completion of another.

Opening an interactive shell to the compute node
To open an interactive shell to a compute node, use the -I argument. This is often used in conjunction with the -X (X11 Forwarding) and the -V (pass all of the users environment)

Example
Open an interactive shell to a compute node ?

$ qsub -I

Passing an environment variable to your job
You can pass user defined environment variables to a job by using the -v argument.

Example

To test this we will use a simple script that prints out an environment variable.
Passing an environment variable ?

$ cat variable.pbs #!/bin/sh if[ "x" == "x$MYVAR" ] ;then echo "Variable is not set" else echo "Variable says: $MYVAR" fi

Next use qsub without the -v and check your standard out file
qsub without -v ?

$ qsub variable.pbs 5596675.hpc0. local $cat variable.pbs.o5596675 Variable is not set

Then use the -v to set the variable
qsub with -v ?

$ qsub - v MYVAR= "hello" variable.pbs5596676.hpc0. local $ cat variable.pbs.o5596676Variable says: hello

Handy Hint
This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?

#PBS -v MYVAR="hello"

Useful Information
Multiple user defined environment variables can be passed to a job at a time. Passing Multiple Variables ?

$ cat variable.pbs #!/bin/sh echo"$VAR1 $VAR2 $VAR3" > output.txt $ $ qsub - v VAR1= "hello" ,VAR2="Sreedhar" ,VAR3= "How are you?" variable.pbs5627200.hpc0. local $ cat output.txthello Sreedhar How are you? $

Passing your environment to your job
You may declare that all of your environment variables are passed to the job by using the -V argument in qsub.

Example

Use qsub to perform an interactive login to one of the nodes:
Passing your environment: qsub with -V ?

$ qsub -I -V

Handy Hint
This option can be added to pbs script with a PBS directive such as Equivalent PBS Directive ?

#PBS -V

Once the shell is opened, use the env command to see that your environment was passed to the job correctly. You should still have access to all your modules that you loaded previously.

Submitting an array job: Managing groups of jobs .hostname would have PBS_ARRAYID set to 0. This will allow you to create job arrays where each job in the array will perform slightly different actions based on the value of this variable, such as performing the same tasks on different input files. One other difference in the environment between jobs in the same array is the value of the PBS_JOBNAME variable.

Example

First we need to create data to be read. Note that in a real application, this could be data, configuration setting or anything that your program needs to run.

Create Input Data

To create input data, run this simple one-liner:
Creating input data ?

$ for i in {0..4}; do echo "Input data file for an array $i" > input.$i ;done $ ls input.* input.0 input.1 input.2 input.3 input.4$ cat input.0 Input data file for an array 0

Submission Script
Submission Script: array.pbs ?

$ cat array.pbs #!/bin/sh #PBS -l nodes=1:ppn=1,walltime=5:00#PBS -N arraytest cd ${PBS_O_WORKDIR}# Take me to the directory where I launched qsub # This part of the script handles the data. In a real world situation you will probably# be using an existing application. cat input.${PBS_ARRAYID} > output.${PBS_ARRAYID}echo "Job Name is ${PBS_JOBNAME}" >> output.${PBS_ARRAYID} sleep 30 exit 0;

Submit & Monitor

Instead of running five qsub commands, we can simply enter:
Submitting and Monitoring Array of Jobs ?

$ qsub -t 0-4 array.pbs 5534017[].hpc0. local

qstat
qstat ?

$ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----5534017[].hpc0.l sm4082 ser2 arraytest 1 1 -- 00:05 R --$ qstat -t -u $USER hpc0.local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----5534017[0].hpc0. sm4082 ser2 arraytest-0 12017 1 1 -- 00:05 R --5534017[1].hpc0. sm4082 ser2 arraytest-1 12050 1 1 -- 00:05 R -- 5534017[2].hpc0. sm4082 ser2 arraytest-2 12084 1 1 -- 00:05 R --5534017[3].hpc0. sm4082 ser2 arraytest-3 12117 1 1 -- 00:05 R --5534017[4].hpc0. sm4082 ser2 arraytest-4 12150 1 1 -- 00:05 R --$ ls output.* output.0 output.1 output.2 output.3 output.4$ cat output.0Input data file for an array 0 Job Name is arraytest-0

pbstop

pbstop by default doesn't show all the jobs in the array. Instead, it shows a single job in just one line in the job information. Pressing 'A' shows all the jobs in the array. Same can be achieved by giving the command line option '-A'. This option along with '-u <NetID>' shows all of your jobs including array as well as normal jobs.
pbstop ?

$ pbstop -A -u $USER

Note
Typing 'A' expands/collapses array job representation.

Comma delimited lists

The -t option of qsub also accepts comma delimited lists of job IDs so you are free to choose how to index the members of your job array. For example:
Comma delimited lists ?

$ rm output.* $ qsub -t 2,5,7-9 array.pbs5534018[].hpc0. local $ qstat -u $USER hpc0.local : Req 'd Req'd Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----5534018[].hpc0.l sm4082 ser2 arraytest 1 1 -- 00:05 Q --$ qstat -t -u $USER hpc0. local :Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----5534018[2].hpc0. sm4082 ser2 arraytest-2 12319 1 1 -- 00:05 R --5534018[5].hpc0. sm4082 ser2 arraytest-5 12353 1 1 -- 00:05 R --5534018[7].hpc0. sm4082 ser2 arraytest-7 12386 1 1 -- 00:05 R --5534018[8].hpc0. sm4082 ser2 arraytest-8 12419 1 1 -- 00:05 R --5534018[9].hpc0. sm4082 ser2 arraytest-9 12452 1 1 -- 00:05 R -- $ls output.* output.2 output.5 output.7 output.8 output.9$ cat output.2 Input data filefor an array 2 Job Name is arraytest-2

A more general for loop - Arrays with step size

By default, PBS doesn't allow array jobs with step size. qsub -t 0-10 <pbs.script> increments PBS_ARRAYID in 1. To submit jobs in steps of a certain size, let's say step size of 3 starting at 0 and ending at 10, one has to do
?

qsub -t 0,3,6,9 <pbs.script>

To make it easy for users we have put a wrapper which takes starting point, ending point and step size as arguments for -t flag. This avoids default necessity that PBS_ARRAYID increment be 1. The above request can be accomplished with (which happens behind the scenes with the help of wrapper)
?

qsub -t 0-10:3 <pbs.script>

Here, 0 is the starting point, 10 is the ending point and 3 is the step size. It is not necessary that starting point must be 0. It can be any number. Incidentally, in a situation in which the upper-bound is not equal to the lower-bound plus an integer-multiple of the increment, for example
?

qsub -t 0-10:3 <pbs.script>

wrapper automatically changes the upper bound as shown in the example below.
Arrays with step size ?

[sm4082@login-0-0 ~]$ qsub -t 0-10:3 array.pbs 6390152[].hpc0.local [sm4082@login-0-0 ~]$ qstat -u $USER hpc0.local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----6390152[].hpc0.l sm4082 ser2 arraytest -- 1 1 -- 00:05 Q --[sm4082@login-0-0 ~]$ qstat -t -u $USER hpc0. local: Req 'd Req' d ElapJob ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----6390152[0].hpc0. sm4082 ser2 arraytest-0 25585 1 1 -- 00:05 R --6390152[3].hpc0. sm4082 ser2 arraytest-3 28227 1 1 -- 00:05 R --6390152[6].hpc0. sm4082 ser2 arraytest-6 8515 1 1 -- 00:05 R 00:006390152[9].hpc0. sm4082 ser2 arraytest-9 505 1 1 -- 00:05 R -- [sm4082@login-0-0 ~]$ ls output.* output.0 output.3 output.6 output.9[sm4082@login-0-0 ~]$ cat output.9Input data file for an array 9 Job Name is arraytest-9 [sm4082@login-0-0 ~]$

Note
By default, PBS doesn't support arrays with step size. On our clusters, it's been achieved with a wrapper. This option might not be there on clusters at other organizations/schools that use PBS/Torque.

Note
If you're trying to submit jobs through ssh to login nodes from your pbs scripts with statement such as ?

ssh login-0-0 "cd ${PBS_O_WORKDIR};`which qsub` -t 0-10:3 <pbs.script>"

arrays with step size wouldn't work unless you either add
?

shopt -s expand_aliases

to your pbs script that's in bash or add this to your .bashrc in your home directory. Adding this makes alias for qsub come into effect there by making wrapper act on command line options to qsub (For that matter this brings any alias to effect for commands executed via SSH).

If you have
?

#PBS -t 0-10:3

in your pbs script you don't need to add this either to your pbs script or to your .bashrc in your home directory.

A List of Input Files/Pulling data from the ith line of a file

Suppose we have a list of 1000 input files, rather than input files explicitly indexed by suffix, in a file file_list.text one per line:
A List of Input Files/Pulling data from the ith line of a file ?

[sm4082@login-0-2 ~]$ cat array.list #!/bin/bash#PBS -S /bin/bash #PBS -l nodes=1:ppn=1,walltime=1:00:00 INPUT_FILE=` awk "NR==$PBS_ARRAYID" file_list.text` # # ...or use sed:# sed -n -e "${PBS_ARRAYID}p" file_list.text # # ...or use head/tail # $(cat file_list.text | head -n $PBS_ARRAYID | tail -n 1)./executable < $INPUT_FILE

In this example, the '-n' option suppresses all output except that which is explicitly printed (on the line equal to PBS_ARRAYID).
?

qsub -t 1-1000 array.list

Let's say you have a list of 1000 numbers in a file, one number per line. For example, the numbers could be random number seeds for a simulation. For each task in an array job, you want to get the ith line from the file, where i equals PBS_ARRAYID, and use that value as the seed. This is accomplished by using the Unix head and tail commands or awk or sed just like above.
A List of Input Files/Pulling data from the ith line of a file ?

[sm4082@login-0-2 ~]$ cat array.seed #!/bin/bash#PBS -S /bin/bash #PBS -l nodes=1:ppn=1,walltime=1:00:00 SEEDFILE=~/data/seeds SEED=$( cat $SEEDFILE |head -n $PBS_ARRAYID | tail -n 1)~/programs/executable $SEED > ~/results/output.$PBS_ARRAYID

?
qsub -t 1-1000 array.seed 
You can use this trick for all sorts of things. For example, if your jobs all use the same program, but with very different command-line options, you can list all the options in the file, one set per line, and the exercise is basically the same as the above, and you only have two files to handle (or 3, if you have a perl script generate the file of command-lines).

Delete

Delete all jobs in array

We can delete all the jobs in array with a single command.
Deleting array of jobs ?

$ qsub -t 2-5 array.pbs 5534020[].hpc0. local $ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----5534020[].hpc0.l sm4082 ser2 arraytest 1 1 -- 00:05 R --$ qdel 5534020[] $ qstat -u $USER$

Delete a single job in array

Delete a single job in array, e.g. number 4,5 and 7
Deleting a single job in array ?

$ qsub -t 0-8 array.pbs 5534021[].hpc0. local $ qstat -u $USER hpc0. local : Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time----------- -- ---- ---------- ---- ---- -- ----- --- - ---5534021[].hpc0.l sm4082 ser2 arraytest 1 1 -- 00:05 Q --$ qstat -t -u $USER hpc0. local :Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----5534021[0].hpc0. sm4082 ser2 arraytest-0 26618 1 1 -- 00:05 R --5534021[1].hpc0. sm4082 ser2 arraytest-1 14271 1 1 -- 00:05 R --5534021[2].hpc0. sm4082 ser2 arraytest-2 14304 1 1 -- 00:05 R --5534021[3].hpc0. sm4082 ser2 arraytest-3 14721 1 1 -- 00:05 R --5534021[4].hpc0. sm4082 ser2 arraytest-4 14754 1 1 -- 00:05 R -- 5534021[5].hpc0. sm4082 ser2 arraytest-5 14787 1 1 -- 00:05 R --5534021[6].hpc0. sm4082 ser2 arraytest-6 10711 1 1 -- 00:05 R --5534021[7].hpc0. sm4082 ser2 arraytest-7 10744 1 1 -- 00:05 R --5534021[8].hpc0. sm4082 ser2 arraytest-8 9711 1 1 -- 00:05 R --$ qdel 5534021[4] $ qdel 5534021[5] $ qdel 5534021[7]$ qstat -t -u $USER hpc0. local :Req 'd Req' d Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----5534021[0].hpc0. sm4082 ser2 arraytest-0 26618 1 1 -- 00:05 R --5534021[1].hpc0. sm4082 ser2 arraytest-1 14271 1 1 -- 00:05 R --5534021[2].hpc0. sm4082 ser2 arraytest-2 14304 1 1 -- 00:05 R --5534021[3].hpc0. sm4082 ser2 arraytest-3 14721 1 1 -- 00:05 R --5534021[6].hpc0. sm4082 ser2 arraytest-6 10711 1 1 -- 00:05 R -- 5534021[8].hpc0. sm4082 ser2 arraytest-8 9711 1 1 -- 00:05 R --$ qstat -t -u $USER$

PBS (Portable Batch System) - Center for High Performance Computing

The Portable Batch System (PBS) was written as a joint project between the Numerical Aerospace Simulation (NAS) Systems Division of NASA AMES Research Center and National Energy Research Supercomputer Center (NERSC) of Lawrence Livermore National Laboratory. It is a batch software processing system that provides a simple consistent mean of job submission to a flexible and configurable job scheduler. It is currently being employed across all of our HPC systems.

See the PBS Homepage for more details.

Although development (compiles, short runs, etc.) is allowed, there is a kernel limit of 15 CPU minutes on all interactive processes. Processes that exceed 15 minutes and are not submitted through the PBS system are subject to immediate termination without prior notification.

Batch Environment Variables

These are the environment variables set by PBS for a job run in October 2000 by user jhc.PBS_O_HOME=/home/chpc/jhc PBS_O_LANG=en_US PBS_O_LOGNAME=jhc PBS_O_PATH=/uufs/sp/sys/bin:/uufs/sp/host/bin:/usr/local/bin:/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin: /usr/ibmcxx/bin:/usr/local/chpc/bin:/uufs/inscc.utah.edu/sys/bin:/usr/ucb:/usr/totalview/bin:/usr/local/chpc/ncar g/bin:/home/chpc/jhc/bin:. PBS_O_MAIL=/usr/spool/mail/jhc PBS_O_SHELL=/bin/ksh PBS_O_TZ=MST7MDT PBS_O_HOST=a68-e.chpc.utah.edu PBS_O_WORKDIR=/home/chpc/jhc/intro_mpi/sp PBS_O_QUEUE=sp PBS_JOBNAME=sppbsjob PBS_JOBID=5336.a68-e.chpc.utah.edu PBS_QUEUE=sp SHELL=/bin/csh PBS_JOBCOOKIE=2720BE7965F1673E424DD59E5808621C PBS_NODENUM=0 PBS_TASKNUM=1 PBS_MOMPORT=15003 PBS_NODEFILE=/uufs/sp/host/var/pbs/aux/5336.a68-e.chpc.utah.edu MP_PROCS=4 MP_HOSTFILE=/uufs/sp/host/var/pbs/aux/5336.a68-e.chpc.utah.edu PBS_JOB_KEY=1476944412 MP_MPI_NETWORK=0 PBS_ENVIRONMENT=PBS_BATCH ENVIRONMENT=BATCH

PBS Batch Script Options

-a date_time. Declares the time after which the job is eligible for execution. The date_time element is in the form: [[[ CC YY]MM]DD]hhmm .S.

-e path. Defines the path to be used for the standard error stream of the batch job. The path is of the form: path_name.

-h. Specifies that a user hold will be applied to the job at submission time.

-I. Declares that the job is to be run "interactively". The job will be queued and scheduled as PBS batch job, but when executed the standard input, output, and error streams of the job will be connected through qsub to the terminal session in which qsub is running.

-j join. Declares if the standard error stream of the job will be merged with the standard ouput stream. The join argument is one of the following:

oe- Directs the two streams as standard output.

eo- Directs the two streams as standard error.

n- Any two streams will be separate_(Default)_.

-l resource_list. Defines the resources that are required by the job and establishes a limit on the amount of resources that can be consumed. Users will want to specify the walltime resource, and if they wish to run a parallel job, the ncpus resource.

-m mail_options. Conditions under which the server will send a mail message about the job. The options are:

n: No mail ever sent

a (default): When the job aborts

b: When the job begins

e: When the job ends

-M user_list. Declares the list of e-mail addresses to whom mail is sent. If unspecified it defaults to userid@host from where the job was submitted. You will most likely want to set this option.

-N name. Declares a name for the job.

-o path. Defines the path to be used for the standard output. path_name.

-q destination. The destination is the queue.

-S path_list. Declares the shell that interprets the job script. If not specified it will use the user's login shell.

-v variable_list. Expands the list of environment variables which are exported to the job. The variable list is a comma-separated list of strings of the form variable or variable=value.

-V. Declares that all environment variables in the qsub command's environment are to be exported to the batch job.

PBS User Commands

For any of the commands listed below you may do a "man command" for syntax and detailed information.

Frequently used PBS user commands:

qsub. Submits a job to the PBS queuing system. Please see qsub Options below.

qdel. Deletes a PBS job from the queue.

qstat. Shows status of PBS batch jobs.

xpbs. X interface for PBS users.

Less Frequently-Used PBS User Commands:

qalter. Modifies the attributes of a job.

qhold. Requests that the PBS server place a hold on a job.

qmove. Removes a job from the queue in which it resides and places the job in another queue.

qmsg. Sends a message to a PBS batch job. To send a message to a job is to write a message string into one or more of the job's output files.

qorder. Exchanges the order of two PBS batch jobs within a queue.

qrerun. Reruns a PBS batch job.

qrls. Releases a hold on a PBS batch job.

qselect. Lists the job identifier of those jobs which meet certain selection criteria.

qsig. Requests that a signal be sent to the session leader of a batch job. Last Modified: October 06, 2008 @ 21:07:48

Society

Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers : Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism : The Iron Law of Oligarchy : Libertarian Philosophy

Quotes

War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda : SE quotes : Language Design and Programming Quotes : Random IT-related quotes : Somerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose Bierce : Bernard Shaw : Mark Twain Quotes

Bulletin:

Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 : Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law

History:

Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds : Larry Wall : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOS : Programming Languages History : PL/1 : Simula 67 : C : History of GCC development : Scripting Languages : Perl history : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-Month : How to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D

Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to to buy a cup of coffee for authors of this site

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: February, 27, 2021

OpenPBS, PBSpro and Torque

Packages that are prerequisite

RPMs that should be installed on the headnode

Here is how it looks like:

RPMs that should be installed on the computational nodes:

Passwordless ssh from the headnode to the clients should be enabled

/etc/hosts for servers that constitute the cluster should be identical on all hosts

Firewall should be temporarily shut down. Installation should be done with firewall daemon disabled; you will have enough problems without it ;-)

NFS

Old News ;-)

[Jun 21, 2018] How to install pbs on compute node and configure the server and compute node - Users-Site Administrators - PBS Professional Op

Jun 21, 2018 | community.pbspro.org

[Jul 01, 2017] Bug 1321154 – numa enabled torque don't work

[Jun 16, 2017] Tutorial - Submitting a job using qsub by Sreedhar Manchu

Notable quotes:

"... (the path to your home directory) ..."

"... (which language you are using) ..."

"... (the name that you logged in with) ..."

"... (standard path to excecutables) ..."

"... (location of the users mail file) ..."

"... (command shell, i.e bash,sh,zsh,csh, ect.) ..."

"... (the name of the host upon which the qsub command is running) ..."

"... (the hostname of the pbs_server which qsub submits the job to) ..."

"... (the name of the original queue to which the job was submitted) ..."

"... (the absolute path of the current working directory of the qsub command) ..."

"... (each member of a job array is assigned a unique identifier) ..."

"... (set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job) ..."

"... (the job identifier assigned to the job by the batch system) ..."

"... (the job name supplied by the user) ..."

"... (the name of the file contain the list of nodes assigned to the job) ..."

"... (the name of the queue from which the job was executed from) ..."

"... (the walltime requested by the user or default walltime allotted by the scheduler) ..."

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Example

Create Input Data

Submission Script

Submit & Monitor

qstat

pbstop

Comma delimited lists

A more general for loop - Arrays with step size

A List of Input Files/Pulling data from the ith line of a file

Delete

Delete all jobs in array

Delete a single job in array

Batch Environment Variables

PBS Batch Script Options

PBS User Commands

Google matched content

Softpanorama Recommended

Torque

PBSpro

`/etc/hosts` for servers that constitute the cluster should be identical on all hosts