
Grid Engine As a High Quality Unix/Linux Batch System

Related pages: News · Cluster job schedulers · Recommended Links · Implementations · Son of Grid Engine · Grid Scheduler · Documentation · Starting and Killing SGE Daemons · SGE hosts · SGE hostgroups · SGE Queues · SGE Jobs · SGE Parallel Environment · Submitting parallel OpenMPI jobs · Submitting binaries in SGE · SGE Submit Scripts · Message Passing Interface · Monitoring and Controlling Jobs · Monitoring Queues · Troubleshooting · Job or Queue Reported in Error State E · Queue instance in AU state · Commands: qalter, qstat, qsub, qrsh, qmod, qacct, qdel, qhost, qhold, qconf · Managing User Access · Resource Quota Sets · SGE Consumable Resources · Restricting number of slots per server · Gridengine diag tool · Backup of SGE configuration · Installation of SGE on a small set of multicore servers · Installation of the Master Host · Installation of the Execution Hosts · sge_execd - Sun Grid Engine job execution agent · SGE shepherd · Usage of NFS · SGE cheat sheet · History · Glossary · Tips · Humor · Etc

Introduction

Grid Engine, often called Sun Grid Engine (SGE), is a software classic. It is a batch job controller -- essentially the Unix batch command on steroids -- rather than a typical scheduler. At one point Sun open-sourced the code, so an open source version exists. It is the most powerful (albeit specialized) open source batch scheduler in existence. This is one of the most valuable contributions of Sun to the open source community, as it provides an industrial-strength batch scheduler for Unix/Linux.

Again, this is one of the few classic Unix software systems. SGE 6.2u7 as released by Sun has all the signs of a software classic. It inherited fairly good documentation from its Sun days (although the software vandals from Oracle destroyed a lot of valuable Sun documents). Any engineer or scientist can read the SGE User Manual and Installation Guide, install it (the installer sets up a single queue all.q that can be used immediately), and start using it for his/her needs in a day or two using the defaults, without any training. As long as the networking is reliable and jobs are submitted correctly, SGE runs them with nearly zero administration.

SGE is a very powerful and flexible batch system that probably should become a standard Linux subsystem, replacing or supplementing the very basic batch command. It is available in several Linux distributions such as Debian and Ubuntu as an installable software package from the main repository. For CentOS, RHEL and SUSE it is available from third-party repositories.
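
For example, on Debian or Ubuntu the whole stack can be pulled in with the package manager (a minimal sketch; these are the package names used by the Debian gridengine packaging and may differ between releases):

    # on the master host
    apt-get install gridengine-master gridengine-client
    # on each execution host
    apt-get install gridengine-exec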

SGE has many options that help to use all computational resources effectively -- a grid consisting of a head node and computational nodes, each with a certain number of cores (aka slots).
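
A quick way to see which nodes and slots SGE knows about is the qhost command. The output below is a sketch of what a typical installation prints; host names and numbers are purely illustrative:

    $ qhost
    HOSTNAME    ARCH        NCPU  LOAD   MEMTOT   MEMUSE   SWAPTO   SWAPUS
    ----------------------------------------------------------------------
    global      -           -     -      -        -        -        -
    node01      lx26-amd64  16    0.12   64.0G    2.1G     8.0G     0.0
    node02      lx26-amd64  16    8.03   64.0G    40.5G    8.0G     0.0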

But the flip side of power and flexibility is complexity. It is a complex system that requires study. You need to carefully study the man pages and manuals to get the most out of it. The SGE mailing list is also a great educational resource. Don't hesitate to ask questions. Then, when you become an expert, you can help others get up to speed with the product. Installation is easy, but it usually takes from six months to a year for an isolated person to master the basics (much less if you have at least one expert on the floor). But as with any complex and powerful system, even admins with 10 years of experience probably know only 60-70% of SGE.

Now that the pieces are falling into place after Oracle's acquisition of Sun Microsystems and subsequent abandonment of the product, we can see that open source can help "vendor-proof" important parts of Unix. Unix did not have a decent batch scheduler before Grid Engine; now it has one. Grid Engine is alive and well, with a blog, a mailing list, a git repository, and even a commercial version from Univa. Source code repositories can also be found at the Open Grid Scheduler (Grid Engine 2011.11 is compatible with Sun Grid Engine 6.2u7) and Son of Grid Engine projects. Open Grid Scheduler looks like abandonware (although its user group is active), while Son of Grid Engine is actively developed and currently represents the most viable open source SGE implementation.

As of version 8.1.8 it is the best debugged open source distribution. It might be especially attractive for those who have experience with building software, but it can be used by everybody on RHEL, for which precompiled binaries exist.

Installation is pretty raw, but I tried to compensate for that by creating several pages which together document the installation process on RHEL 6.5 or 6.6 pretty well:

I am still working on them, but even in their present form they are definitely clearer and more useful than the old Sun 6.2u5 installation documentation ;-).

Most SGE discussions use the term cluster, but SGE is not linked to cluster technology in any meaningful way. In reality it is designed to operate on a heterogeneous server farm.

We will use the term "server farm" here as an alternative and less ambitious term than "grid".

The default installation of Grid Engine assumes that the $SGE_ROOT directory (the root directory of the Grid Engine installation) is on a shared filesystem (for example, NFS) accessible by all hosts.
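
In practice this means exporting the directory from the master host (or a file server) and sourcing the generated settings file on every host. A minimal sketch, assuming the common default location /opt/sge and the default cell name:

    # on every host, after the shared filesystem is mounted:
    export SGE_ROOT=/opt/sge
    . $SGE_ROOT/default/common/settings.sh   # sets PATH, MANPATH, SGE_CELL, etc.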

Right now SGE exists in several competing versions (see SGE implementations):

Key concepts

The Grid Engine system has functions typical for any powerful batch system:

But as a powerful batch system it is oriented toward running multiple jobs optimally on the available resources, typically multiple computers (nodes) of a computational cluster. In its simplest form, a grid appears to users as a large system that provides a single point of access to multiple computers.

In other words, a grid is just a loose confederation of different computers, possibly running different OSes, connected by regular TCP/IP links. In this sense it is very similar to the concept of a server farm. Grid Engine does not care about the uniformity of a server farm and, in addition to scheduling, provides some central administration and monitoring capabilities to the server farm environment.

SGE enables users to distribute jobs across a grid and treat the grid as a single computational resource. It accepts jobs submitted by users and schedules them to be run on appropriate systems in the grid. Users can submit as many jobs at a time as they want without being concerned about where the jobs run.
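
For instance, a user can fire off a batch of jobs and check on them later; SGE decides where each one runs (a minimal sketch; the script name is hypothetical):

    for i in 1 2 3 4 5; do
        qsub -cwd -N run$i myscript.sh    # SGE picks a node for each job
    done
    qstat -u $USER                        # show your pending and running jobs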

The main purpose of a batch system like SGE is to optimally utilize the resources present in a server farm, i.e. to schedule jobs on available nodes in the most efficient way possible.

Every aspect of the batch system is accessible through the Perl API. There is almost no documentation, but a few sample scripts in gridengine/source/experimental/perlgui and on the Internet, such as those by Wolfgang Friebel from DESY (see ifh.de), can be used as guidance.

Grid Engine architecture is structured around two main concepts:

Queue

A queue is a container for a class of jobs that are allowed to run on one or more hosts concurrently. Logically a queue is a child of a parallel environment (see below), although it can have several such parents. It defines a set of hosts and the limits on resources on those hosts.

A queue can reside on a single host, or it can extend across multiple hosts. The latter are called server farm queues. Server farm queues enable users and administrators to work with a server farm of execution hosts by means of a single queue configuration. Each host that is attached to the head node can belong to one or more queues.

A queue determines certain job attributes. Association with a queue affects some of the things that can happen to a job. For example, if a queue is suspended, all jobs associated with that queue are also suspended.

Grid Engine always has one default queue called all.q, which is created during the initial installation and updated each time you add another execution host. You can have several additional queues, each defining the set of hosts on which its jobs run, each with its own computational requirements, for example, the number of CPUs (aka slots). The problem here is that without special measures queues are independent, and if they contain the same set of nodes oversubscription can easily occur.
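
Queues are created and inspected with qconf. A sketch of the most common operations (short.q is a hypothetical queue name):

    qconf -sql                          # list all defined queues
    qconf -sq all.q                     # show the configuration of all.q
    qconf -aq short.q                   # add a new queue (opens an editor)
    qconf -mattr queue slots 16 all.q   # change the slots attribute of all.q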

Each job should not exceed the maximum parameters defined in the queue (directly or indirectly via a parallel environment). The SGE scheduler can then optimize the job mix for the available resources by selecting the most suitable job from the input queue and sending it to the most appropriate node of the grid.

A queue defines a class of jobs that consume computer resources in a similar way. It also defines the list of computational nodes on which such jobs can be run.

Jobs typically are submitted to a queue.  
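
A job can be directed to a particular queue explicitly or left for the scheduler to place (a sketch; short.q is a hypothetical queue):

    qsub -q all.q -cwd -b y /bin/hostname   # run a binary in all.q
    qsub -q short.q myscript.sh             # target a specific queue
    qsub myscript.sh                        # let the scheduler pick the queue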

In the book Building N1™ Grid Solutions Preparing, Architecting, and Implementing Service-Centric Data Centers we can find an interesting although overblown statement:

The N1 part of the name was never intended to be a product name or a strategy name visible outside of Sun. The name leaked out and stuck. It is the abbreviation for the original project name “Network-1.” The Sun-1 workstation was Sun's first workstation. It was designed specifically to be connected to the network.

N1 Grid systems are the first systems intended to be built with the network at their core and be based on the principle that an IP-based network is effectively the system bus.

Parallel environment

Parallel environment (PE) is the central notion of SGE. It represents a set of settings that tell Grid Engine how to start, stop, and manage jobs run by the queues that use this PE.

It sets the maximum number of slots that can be assigned to all jobs within a given queue. It also sets some parameters for the parallel messaging framework (such as MPI) used by parallel jobs.

The parallel environment is a defining characteristic of each queue and needs to be specified correctly for the queue to work. It is specified in the pe_list attribute, which can contain a single PE or a list of PEs. For example:

pe_list               make mpi mpi_fill_up

Each parallel environment determines a class of queues that use it. Its most important attributes are:

  1. slots - the maximum number of job slots that the parallel environment is allowed to occupy at once
  2. allocation_rule -- see the man page. $pe_slots allocates all slots for the job on a single host. Other rules allow the job to be scheduled across multiple machines.
  3. control_slaves -- when set to "true", Grid Engine takes care of starting the slave MPI tasks. In this case MPI should be built with SGE support (for Open MPI, the --with-sge configure option).
  4. job_is_first_task  The job_is_first_task parameter can be set to TRUE or FALSE. A value of TRUE indicates that the Sun Grid Engine job script already contains one of the tasks of the parallel application (the number of slots reserved for the job is the number of slots requested with the -pe switch), while a value of FALSE indicates that the job script (and its child processes) is not part of the parallel program (the number of slots reserved for the job is the number of slots requested with the -pe switch + 1).

    If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE and/or SHARETREE_RESERVED_USAGE set to TRUE) and control_slaves is set to FALSE, the job_is_first_task parameter influences the accounting for the job: a value of TRUE means that accounting for CPU and requested memory gets multiplied by the number of slots requested with the -pe switch; if job_is_first_task is set to FALSE, the accounting information gets multiplied by the number of slots + 1.
     

  5. accounting_summary This parameter is only checked if control_slaves (see above) is set to TRUE and thus Sun Grid Engine is the creator of the slave tasks of a parallel application via sge_execd(8) and sge_shepherd(8). In this case, accounting information is available for every single slave task started by Sun Grid Engine.

    The accounting_summary parameter can be set to TRUE or FALSE. A value of TRUE indicates that only a single accounting record is written to the accounting(5) file, containing the accounting summary of the whole job including all slave tasks, while a value of FALSE indicates an individual accounting(5) record is written for every slave task, as well as for the master task.

    Note:
    When running tightly integrated jobs with SHARETREE_RESERVED_USAGE set, and with having accounting_summary enabled in the parallel environment, reserved usage will only be reported by the master task of the parallel job. No per parallel task usage records will be sent from execd to qmaster, which can significantly reduce load on qmaster when running large tightly integrated parallel jobs.
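
Putting these attributes together, a tightly integrated MPI parallel environment might look like this when displayed with qconf -sp (a sketch; the PE name and values are illustrative, not a recommendation):

    $ qconf -sp mpi
    pe_name            mpi
    slots              64
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $fill_up
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary TRUE

Such a PE is then attached to a queue via the queue's pe_list attribute, as shown above.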

Some important details are well explained in the blog post Configuring a New Parallel Environment.

Architecture

A grid generally consists of a head node and computational nodes. The head node typically runs sge_qmaster and is often called the master host. The master host can be, and often is, the NFS server for the computational nodes, but this is not necessary.

Daemons

Two daemons provide the functionality of the Grid Engine system: sge_qmaster on the master host and sge_execd on each execution host. They are started via init scripts.
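
On a typical installation the init scripts and a quick health check look like this (a sketch; the script names follow the usual SGE packaging and may differ on your system):

    # on the master host
    /etc/init.d/sgemaster start
    # on each execution host
    /etc/init.d/sgeexecd start
    # verify that the daemons are up
    ps -ef | grep sge_
    qhost        # exec hosts should report load and memory instead of '-'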

Documents and code location

Documentation for such a complex and powerful system is fragmentary and generally of low quality. Even some man pages contain questionable information. Many do not explain the available features well, or at all.

This is actually why this set of pages was created: to compensate for insufficient documentation for SGE. 

Although versions of SGE are generally compatible, the implementation of some features depends on the version used. See History for the list of major implementations.

Documentation for the last open source version produced by Sun (version 6.2u5) is floating around the Internet. See, for example:

There are docs for older versions too,

And some presentations

Some old Sun Blueprints about SGE can still be found too. But generally Oracle behaved horribly badly as the trustee of the Sun documentation portal. They proved to be simply vandals in this particular respect: discarding almost everything without mercy, destroying considerable value and an important part of the Sun heritage.

Moreover, those documents, organized into a historical website, might still have earned some money (and respect, which is sorely missing now, after this vandalism) for Oracle had they preserved the website. Instead, they discarded everything mercilessly.

Documentation for Oracle Grid Engine, which is now abandonware, may also be floating around.

For more information see SGE Documentation.



Old News ;-)

[May 08, 2017] Sample SGE scripts

May 08, 2017 | ctbp.ucsd.edu
  1. An example of simple APBS serial job.
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N serial_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    # requesting 12hrs wall clock time
    #$ -l h_rt=12:00:00
    
    /soft/linux/pkg/apbs/bin/apbs inputfile >& outputfile
    
    
  2. An example script for running the executable a.out in parallel on 8 CPUs. (Note: for your executable to run in parallel it must be compiled with a parallel library like MPICH, LAM/MPI, PVM, etc.) This script shows file staging, i.e., using the fast local filesystem /scratch on the compute node in order to eliminate speed bottlenecks.
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N parallel_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    #$ -pe mpi 8
    # requesting 10hrs wall clock time
    #$ -l h_rt=10:00:00
    #
    echo Running on host `hostname`
    echo Time is `date`
    echo Directory is `pwd`
    set orig_dir=`pwd`
    echo This job runs on the following processors:
    cat $TMPDIR/machines
    echo This job has allocated $NSLOTS processors
    
    # copy input and support files to a temporary directory on compute node
    set temp_dir=/scratch/`whoami`.$$
    mkdir $temp_dir
    cp input_file support_file $temp_dir
    cd $temp_dir
    
    /opt/mpich/intel/bin/mpirun -v -machinefile $TMPDIR/machines \
               -np $NSLOTS $HOME/a.out ./input_file >& output_file
    
    # copy files back and clean up
    cp * $orig_dir
    rm -rf $temp_dir
    
    
  3. An example of SGE script for Amber users (parallel run, 4 CPUs, with input file generated on the fly):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N amber_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    #$ -pe mpi 4
    # requesting 6hrs wall clock time
    #$ -l h_rt=6:00:00
    #
    setenv MPI_MAX_CLUSTER_SIZE 2
    
    # export all environment variables to SGE 
    #$ -V
    
    echo Running on host `hostname`
    echo Time is `date`
    echo Directory is `pwd`
    echo This job runs on the following processors:
    cat $TMPDIR/machines
    echo This job has allocated $NSLOTS processors
    
    set in=./mdin
    set out=./mdout
    set crd=./inpcrd.equil
    
    cat <<eof > $in
     short md, nve ensemble
     &cntrl
       ntx=7, irest=1,
       ntc=2, ntf=2, tol=0.0000001,
       nstlim=1000,
       ntpr=10, ntwr=10000,
       dt=0.001, vlimit=10.0,
       cut=9.,
       ntt=0, temp0=300.,
     &end
     &ewald
      a=62.23, b=62.23, c=62.23,
      nfft1=64,nfft2=64,nfft3=64,
      skinnb=2.,
     &end
    eof
    
    set sander=/soft/linux/pkg/amber8/exe.parallel/sander
    set mpirun=/opt/mpich/intel/bin/mpirun
    
    # needs prmtop and inpcrd.equil files
    
    $mpirun -v -machinefile $TMPDIR/machines -np $NSLOTS \
       $sander -O -i $in -c $crd -o $out < /dev/null
    
    /bin/rm -f $in restrt
    
    

    Please note that if you are running parallel amber8 you must include the following in your .cshrc :
    # Set P4_GLOBMEMSIZE environment variable used to reserve memory in bytes
    # for communication with shared memory on dual nodes
    # (optimum/minimum size may need experimentation)
    setenv P4_GLOBMEMSIZE 32000000
    
  4. An example of an SGE script for an APBS job (parallel run, 8 CPUs, running an example input file which is included in the APBS distribution: /soft/linux/src/apbs-0.3.1/examples/actin-dimer):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N apbs-PARALLEL
    #$ -e apbs-PARALLEL.errout
    #$ -o apbs-PARALLEL.errout
    #
    # requesting 8 processors
    #$ -pe mpi 8
    
    echo -n "Running on: "
    hostname
    
    setenv APBSBIN_PARALLEL /soft/linux/pkg/apbs/bin/apbs-icc-parallel
    setenv MPIRUN /opt/mpich/intel/bin/mpirun
    
    echo "Starting apbs-PARALLEL calculation ..."  
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np 8 \
        $APBSBIN_PARALLEL apbs-PARALLEL.in >& apbs-PARALLEL.out
    
    echo "Done."
    
    
  5. An example of SGE script for parallel CHARMM job (4 processors):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N charmm-test
    #$ -e charmm-test.errout
    #$ -o charmm-test.errout
    #
    # requesting 4 processors
    #$ -pe mpi 4
    # requesting 2hrs wall clock time
    #$ -l h_rt=2:00:00
    #
    
    echo -n "Running on: "
    hostname
    
    setenv CHARMM /soft/linux/pkg/c31a1/bin/charmm.parallel.092204
    setenv MPIRUN /soft/linux/pkg/mpich-1.2.6/intel/bin/mpirun
    
    echo "Starting CHARMM calculation (using $NSLOTS processors)"
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np $NSLOTS \
        $CHARMM < mbcodyn.inp > mbcodyn.out
    
    echo "Done."
    
    
  6. An example of SGE script for parallel NAMD job (8 processors):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N namd-job
    #$ -e namd-job.errout
    #$ -o namd-job.out
    #
    # requesting 8 processors
    #$ -pe mpi 8
    # requesting 12hrs wall clock time
    #$ -l h_rt=12:00:00
    #
    
    echo -n "Running on: "
    hostname
    
    /soft/linux/pkg/NAMD/namd2.sh namd_input_file > namd2.log
    
    echo "Done."
    
    
  7. An example of SGE script for parallel Gromacs job (4 processors):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N gromacs-job
    #$ -e gromacs-job.errout
    #$ -o gromacs-job.out
    #
    # requesting 4 processors
    #$ -pe mpich 4
    # requesting 8hrs wall clock time
    #$ -l h_rt=8:00:00
    #
    
    echo -n "Running on: "
    cat $TMPDIR/machines
    
    setenv MDRUN /soft/linux/pkg/gromacs/bin/mdrun-mpi
    setenv MPIRUN /soft/linux/pkg/mpich/intel/bin/mpirun
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np $NSLOTS \
     $MDRUN -v -nice 0 -np $NSLOTS -s topol.tpr -o traj.trr \
      -c confout.gro -e ener.edr -g md.log
    
    echo "Done."
    

[May 07, 2017] Monitoring and Controlling Jobs

biowiki.org

After submitting your job to Grid Engine you may track its status using the qstat command, the QMON GUI, or email.

Monitoring with qstat

The qstat command provides the status of all jobs and queues in the cluster. The most useful options are:

You can refer to the man pages for a complete description of all the options of the qstat command.
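
Some typical invocations (a sketch of commonly used options):

    qstat                  # your own pending and running jobs
    qstat -u '*'           # jobs of all users
    qstat -f               # full listing, including the state of every queue instance
    qstat -j <job_id>      # detailed information about one job, including error reasons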

Monitoring Jobs by Electronic Mail

Another way to monitor your jobs is to have Grid Engine notify you by email about the status of the job.

In your batch script or on the command line, use the -m option to request that an email be sent and the -M option to specify the email address where it should be sent. This will look like:

#$ -M myaddress@work
#$ -m beas

The -m option selects the events after which you want to receive email. In particular, you can choose to be notified at the beginning (b) or end (e) of the job, or when the job is aborted (a) or suspended (s) (see the sample script lines above).

And from the command line you can use the same options (for example):

qsub -M myaddress@work -m be job.sh

How do I control my jobs

Based on the status of the job displayed, you can control the job by the following actions:
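
The usual job-control commands look like this (a sketch; each takes a job id as shown by qstat):

    qdel <job_id>        # delete (kill) a job
    qhold <job_id>       # put a pending job on hold
    qrls <job_id>        # release a held job
    qmod -sj <job_id>    # suspend a running job
    qmod -usj <job_id>   # resume a suspended job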

Monitoring and controlling with QMON

You can also use the GUI QMON, which gives a convenient window dialog specifically designed for monitoring and controlling jobs; the buttons are self-explanatory.


For further information, see the SGE User's Guide ( PDF, HTML).


[May 07, 2017] Why Won't My Job Run Correctly? (aka How To Troubleshoot/Diagnose Problems)

May 07, 2017 | biowiki.org

Does your job show an "Eqw" or "qw" state when you run qstat, and just sit there refusing to run? Get more info on what's wrong with it using:

$ qstat -j <job number>

Does your job actually get dispatched and run (that is, qstat no longer shows it - because it was sent to an exec host, ran, and exited), but something else isn't working right? Get more info on what's wrong with it using:

$ qacct -j <job number> (especially see the lines "failed" and "exit_status")

If any of the above have an "access denied" message in them, it's probably a permissions problem. Your user account does not have the privileges to read from/write to where you told it to (this often happens with the -e and -o options to qsub). So, check to make sure you do. Try, for example, to SSH into the node on which the job is trying to run (or just any node) and make sure that you can actually read from/write to the desired directories from there. While you're at it, just run the job manually from that node and see if it runs - maybe there's some library it needs that the particular node is missing.

To avoid permissions problems, cd into the directory on the NFS where you want your job to run, and submit from there using qsub -cwd to make sure it runs in that same directory on all the nodes.
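
For example (a sketch; the directory path is hypothetical):

    cd /nfs/projects/myrun
    qsub -cwd -o run.out -e run.err myscript.sh   # stdout/stderr land in the same shared directory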

Not a permissions problem? Well, maybe the nodes or the queues are unreachable. Check with:

qstat -f

or, for even more detail:

qstat -F

If the "state" column in qstat -f has a big E , that host or queue is in an error state due to... well, something. Sometimes an error just occurs and marks the whole queue as "bad", which blocks all jobs from running in that queue, even though there is nothing otherwise wrong with it. Use qmod -c <queue list> to clear the error state for a queue.

Maybe that's not the problem, though. Maybe there is some network problem preventing the SGE master from communicating with the exec hosts, such as routing problems or a firewall misconfiguration. You can troubleshoot these things with qping , which will test whether the SGE processes on the master node and the exec nodes can communicate.

N.B.: remember, the execd process on the exec node is responsible for establishing a TCP/IP connection to the qmaster process on the master node , not the other way around. The execd processes basically "phone home". So you have to run qping from the exec nodes , not the master node!

Syntax example (I am running this on an exec node, and sheridan is the SGE master):

$ qping sheridan 536 qmaster 1

where 536 is the port that qmaster is listening on, and 1 simply means that I am trying to reach a daemon. Can't reach it? Make sure your firewall has a hole on that port, that the routing is correct, that you can ping using the good old ping command, that the qmaster process is actually up, and so on.

Of course, you could ping the exec nodes from the master node, too, e.g. I can see if I can reach exec node kosh like this:

$ qping kosh 537 execd 1

but why would you do such a crazy thing? execd is responsible for reaching qmaster , not the other way around.

If the above checks out, check the messages log in /var/log/sge_messages on the submit and/or master node (on our Babylon Cluster , they're both the node sheridan ):

$ tail /var/log/sge_messages

Personally, I like running:

$ tail -f /var/log/sge_messages

before I submit the job, and then submit a job in a different window. The -f option will update the tail of the file as it grows, so you can see the message log change "live" as your job executes and see what's happening as things take place.

(Note that the above is actually a symbolic link I put in to the messages log in the qmaster spool directory, i.e. /opt/sge/default/spool/qmaster/messages .)

One thing that commonly goes wrong is permissions. Make sure that the user that submitted the job using qsub actually has the permissions to write error, output, and other files to the paths you specified.

For even more precise troubleshooting... maybe the problem is unique to only some node(s) or queue(s)? To pin it down, try to run the job only on a specific node or queue:

$ qsub -l hostname=<node/host name> <other job params>

$ qsub -l qname=<queue name> <other job params>

Maybe you should also try to SSH into the problem nodes directly and run the job locally from there, as your own user, and see if you can get any more detail on why it fails.

If all else fails...

Sometimes, the SGE master host will become so FUBARed that we have to resort to brute, traumatizing force to fix it. The following solution is equivalent to fixing a wristwatch with a bulldozer, but it seems to do more good than harm (although I can't guarantee that it doesn't cause long-term harm in favor of a short-term solution).

Basically, you wipe the database that keeps track of SGE jobs on the master host, taking any problem "stuck" jobs with it. (At least that's what I think this does...)

I've found this useful when:

The solution:

ssh sheridan
su -
service sgemaster stop
cd /opt/sge/default/
mv spooldb spooldb.fubared
mkdir spooldb
cp spooldb.fubared/sge spooldb/
chown -R sgeadmin:sgeadmin spooldb
service sgemaster start

Wipe spooldb.fubared when you are confident that you won't need its contents again.

[Feb 08, 2017] SGE, Torque, PBS: What's the Best Choice for an NGS-Dedicated Cluster?

Feb 08, 2017 | www.biostars.org
abihouee wrote:

Sorry, it may be off topics...

We plan to install a scheduler on our cluster (DELL blade cluster over Infiniband storage on Linux CentOS 6.3). This cluster is dedicated to do NGS data analysis.

It seems to me that the most widely used is SGE, but since Oracle bought the stuff, there are several alternative developments (OpenGridEngine, SonGridEngine, Univa Grid Engine...).

Another possible scheduler is Torque/PBS.

I'm a little bit lost in this scheduler forest! Is there someone with any experience of this, or who knows of an existing benchmark?

Thanks a lot. Audrey


I worked with SGE for years at a genome center in Vancouver. It seemed to work quite well. Now I'm at a different genome center and we are using LSF but considering switching to SGE, which is ironic because we are trying to transition from Oracle DB to Postgres to get away from Oracle... SGE and LSF seemed to offer similar functionality and performance as far as I can tell. Both clusters have several thousand CPUs.

-- Malachi Griffith

openlava (source code) is an open-source fork of LSF that, while lacking some features, does work fairly well.

-- Malachi Griffith

Torque is fine, and very well tested; both of the SGE forks are widely used in this sort of environment, and they have qmake, which some people are very fond of. SLURM is another good possibility.

-- Jonathan Dursi

matted wrote:

I can only offer my personal experiences, with the caveat that we didn't do a ton of testing and so others may have differing opinions.

We use SGE, which installs relatively nicely on Ubuntu with the standard package manager (the gridengine-* packages). I'm not sure what the situation is on CentOS.

We previously used Torque/PBS, but the scheduler performance seemed poor and it bogged down with lots of jobs in the queue. When we switched to SGE, we didn't have any problems. This might be a configuration error on our part, though.

When I last tried out Condor (several years ago), installation was quite painful and I gave up. I believe it claims to work in a cross-platform environment, which might be interesting if for example you want to send jobs to Windows workstations.

LSF is another option, but I believe the licenses cost a lot.

My overall impression is that once you get a system running in your environment, they're mostly interchangeable (once you adapt your submission scripts a bit). The ease with which you can set them up does vary, however. If your situation calls for "advanced" usage (MPI integration, Kerberos authentication, strange network storage, job checkpointing, programmatic job submission with DRMAA, etc. etc.), you should check to see which packages seem to support your world the best.

-- matted

Recent versions of torque have improved a great deal for large numbers of jobs, but yes, that was a real problem.

I also agree that all are more or less fine once they're up and working, and the main way to decide which to use would be to either (a) just pick something future users are familiar with, or (b) pick some very specific things you want to be able to accomplish with the resource manager/scheduler and start finding out which best support those features/workflows.

-- Jonathan Dursi

Jeremy Leipzig wrote:

Unlike PBS, SGE has qrsh, a command that actually runs jobs in the foreground, allowing you to easily inform a script when a job is done. What will they think of next?

This is one area where I think the support you pay for going commercial might be worthwhile. At least you'll have someone to field your complaints.

-- Jeremy Leipzig

EDIT: Some versions of PBS also have qsub -W block=true, which works in a very similar way to SGE qrsh.

-- Sean Davis

you must have a newer version than me

>qsub -W block=true dothis.sh 
qsub: Undefined attribute  MSG=detected presence of an unknown attribute
>qsub --version
version: 2.4.11

-- Jeremy Leipzig

For Torque, and perhaps versions of PBS without -W block=true, you can use the following two switches. The behaviour is similar, but when called, any embedded options to qsub will be ignored. Also, stderr/stdout is sent to the shell.

qsub -I -x dothis.sh
-- matt.demaere

My answer should be updated to say that any DRMAA-compatible cluster engine is fine, though running jobs through DRMAA (e.g. Snakemake --drmaa ) instead of with a batch scheduler may anger your sysadmin, especially if they are not familiar with scientific computing standards.

Using qsub -I just to get an exit code is not OK.

-- Jeremy Leipzig

Torque definitely allows interactive jobs -

qsub -I

As for Condor, I've never seen it used within a cluster; it was designed back in the day for farming out jobs between diverse resources (e.g., workstations after hours) and would have a lot of overhead for working within a homogeneous cluster. Scheduling jobs between clusters, maybe?

-- Jonathan Dursi

Ashutosh Pandey wrote:

We use Rocks Cluster Distribution that comes with SGE.

http://en.wikipedia.org/wiki/Rocks_Cluster_Distribution

-- Ashutosh Pandey

+1 Rocks - If you're setting up a dedicated cluster, it will save you a lot of time and pain.

-- mike.thon

I'm not a huge Rocks fan personally, but one huge advantage, especially (but not only) if you have researchers who use XSEDE compute resources in the US, is that you can use the XSEDE campus bridging Rocks rolls, which bundle up a large number of relevant software packages as well as the cluster management stuff. That also means that you can directly use XSEDE's extensive training materials to help get the cluster's new users up to speed.

-- Jonathan Dursi

samsara wrote:

It has been more than a year that I have been using SGE for processing NGS data. I have not experienced any problems with it. I am happy with it. I have not used any other scheduler, except Slurm a few times.

-- samsara

richard.deborja wrote:

Used SGE at my old institute; currently using PBS, and I really wish we had SGE on the new cluster. Things I miss the most: qmake and the "-sync y" qsub option. These two were complete pipeline savers. I also appreciated the integration of MPI with SGE. Not sure how well it works with PBS, as we currently don't have it installed.

-- richard.deborja

joe.cornish826 wrote:

NIH's Biowulf system uses PBS, but most of my gripes about PBS are more about the typical user load. PBS always looks for the next smallest job, so your 30 node run that will take an hour can get stuck behind hundreds (and thousands) of single node jobs that take a few hours each. Other than that it seems to work well enough.

In my undergrad our cluster (UMBC Tara) uses SLURM, didn't have as many problems there but usage there was different, more nodes per user (82 nodes with ~100 users) and more MPI/etc based jobs. However, a grad student in my old lab did manage to crash the head nodes because we were rushing to rerun a ton of jobs two days before a conference. I think it was likely a result of the head node hardware and not SLURM. Made for a few good laughs.

-- joe.cornish826

"PBS always looks for the next smallest job" -- just so people know, that's not something inherent to PBS. That's a configurable choice the scheduler (probably maui in this case) makes, but you can easily configure the scheduler so that bigger jobs so that they don't get starved out by little jobs that get "backfilled" into temporarily open slots.

-- Jonathan Dursi

Part of it is because Biowulf looks for the next smallest job but also prioritizes by how much CPU time a user has been consuming. If I've run 5 jobs with 30x 24 core nodes each taking 2 hours of wall time, I've used roughly 3600 CPU hours. If someone is using a single core on each node (simply because of memory requirements), they're basically at a 1:1 ratio between wall and CPU time. It will take a while for their CPU hours to catch up to mine.

It is a pain, but unlike math/physics/etc there are fewer programs in bioinformatics that make use of message passing (and when they do, they don't always need low-latency ICs), so it makes more sense to have PBS work for the generic case. This behavior is mostly seen on the ethernet IC nodes, there's a much smaller (245 nodes) system set up with infiniband for jobs that really need it (e.g. MrBayes, structural stuff).

Still I wish they'd try and strike a better balance. I'm guilty of it but it stinks when the queue gets clogged with memory intensive python/perl/R scripts that probably wouldn't need so much memory if they were written in C/C++/etc.

[Mar 02, 2016] Son of Grid Engine version 8.1.9 is available

Mar 02, 2016 | liv.ac.uk

README

This is Son of Grid Engine version v8.1.9.

See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on recent changes. See <https://arc.liv.ac.uk/trac/SGE> for more information.

The .deb and .rpm packages and the source tarball are signed with PGP key B5AEEEA9.

* sge-8.1.9.tar.gz, sge-8.1.9.tar.gz.sig:  Source tarball and PGP signature

* RPMs for Red Hat-ish systems, installing into /opt/sge with GUI
  installer and Hadoop support:

  * gridengine-8.1.9-1.el5.src.rpm:  Source RPM for RHEL, Fedora

  * gridengine-*8.1.9-1.el6.x86_64.rpm:  RPMs for RHEL 6 (and
    CentOS, SL)

  See < https://copr.fedorainfracloud.org/coprs/loveshack/SGE/ > for
  hwloc 1.6 RPMs if you need them for building/installing RHEL5 RPMs.

* Debian packages, installing into /opt/sge, not providing the GUI
  installer or Hadoop support:

  * sge_8.1.9.dsc, sge_8.1.9.tar.gz:  Source packaging.  See
    <http://wiki.debian.org/BuildingAPackage> , and see
    < http://arc.liv.ac.uk/downloads/SGE/support/  > if you need (a more
    recent) hwloc.

  * sge-common_8.1.9_all.deb, sge-doc_8.1.9_all.deb,
    sge_8.1.9_amd64.deb, sge-dbg_8.1.9_amd64.deb: Binary packages
    built on Debian Jessie.

* debian-8.1.9.tar.gz:  Alternative Debian packaging, for installing
  into /usr.

* arco-8.1.6.tar.gz:  ARCo source (unchanged from previous version)

* dbwriter-8.1.6.tar.gz:  compiled dbwriter component of ARCo
  (unchanged from previous version)

More RPMs (unsigned, unfortunately) are available at < http://copr.fedoraproject.org/coprs/loveshack/SGE/ >.

[Sep 20, 2014] README for Son of Grid Engine version v8.1.7

arc.liv.ac.uk

This is Son of Grid Engine version v8.1.7.

See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on
recent changes. See <https://arc.liv.ac.uk/trac/SGE> for more
information.

The .deb and .rpm packages and the source tarball are signed with PGP
key B5AEEEA9. For some reason the el5 signatures won't verify on
RHEL5, but they can be verified by transferring the rpms to an RHEL6
system.

More (S)RPMS may be available at http://jur-linux.org/rpms/el-updates/,
thanks to Florian La Roche.

[Sep 20, 2014] Son of Grid Engine

Contents

  1. News
  2. Repositories/Source
  3. Building
  4. Bug reporting, patches, and mail lists
  5. History
  6. Copyright and Naming
  7. Related
  8. Other Resources
  9. Contact

The Son of Grid Engine is a community project to continue Sun's old gridengine free software project that used to live at http://gridengine.sunsource.net after Oracle shut down the site and stopped contributing code. (Univa now own the copyright — see below.) It will maintain copies of as much as possible/useful from the old site.

The idea is to encourage sharing, in the spirit of the original project, informed by long experience of free software projects and scientific computing support. Please contribute, and share code or ideas for improvement, especially any ideas for encouraging contribution.

This effort precedes Univa taking over gridengine maintenance and subsequently apparently making it entirely proprietary, rather than the originally-promised ‘open core’. What's here was originally based on Univa's free code and was intended to be fed into that.

See also the gridengine.org site, in particular the mail lists hosted there. The gridengine.org users list is probably the best one to use for general gridengine discussions and questions which aren't specific to this project.

Currently most information you find for the gridengine v6.2u5 release will apply to this effort, but the non-free documentation that used to be available from Oracle has been expurgated and no-one has the time/interest to replace it. See also Other Resources, particularly extra information locally, and the download area.

This wiki isn't currently generally editable, but will be when spam protection is in place; yes it needs reorganizing and expanding. If you're a known past contributor to gridengine and would like to help, please get in touch for access or to make any other contributions.

[Dec 30, 2013] gridengine-6.2u5-10.el6.4.x86_64.rpm CentOS 6 / RHEL 6 ...

Download gridengine-6.2u5-10.el6.4.x86_64.rpm for CentOS 6 / RHEL 6 from the EPEL repository. Changelog excerpt: 2012-03-15 - Orion Poplawski <orion@cora.nwra.com> 6.2u5-10.2 - Use sge_/SGE_ in man pages.

pkgs.org/centos-6-rhel-6/epel-x86_64/gridengine-...

Univa Grid Engine Truth

What is Grid Engine?

Grid Engine is a job scheduler that has been around for years and it's FREE!! If you are already using it under an open source license you certainly don't need to buy it. Grid Engine started out as a Sun Microsystems product known as Sun Grid Engine (SGE). After Oracle purchased Sun it became Oracle Grid Engine.

Why is another company trying to sell Grid Engine if it is Free?

A small company called Univa has essentially taken away some of the Grid Engine development staff from Oracle and is selling support bundled with what they feel is upgraded source code that is no longer open source. This company wants to sell you Grid Engine support instead of you going to Oracle and buying it for essentially the same price. You can even get free Grid Engine support here with the open source community, and here with the Oracle community.

And you can get the Oracle version here for free, which is being developed just like the small company's version but WITH the blessing of Oracle, who actually bought this product from Sun.

If you are looking at buying the Univa version of Grid Engine you might ask yourself what you are buying. Is there a free product that is the same? Yes, from Oracle and SourceForge. Is there another, more reputable version of the same product? Yes, from Oracle. Are there other schedulers out there that are more robust that you can buy? Yes: Platform Computing has an excellent product called LSF that can often be purchased for much less than Univa Grid Engine. PBSWorks offers a very good scheduler, as does RTDA. There is even a new company, called Scalable Logic, that is developing the free Grid Engine source code as well as the core and is actively supporting the free community with support and upgrades. They have even now come out with an upgraded free version of Grid Engine, as Univa has attempted to, but this version from Scalable Logic is free and totally open source. It has support for many operating systems, including even Windows.

Are there risks in going with this version of Grid Engine from Univa?

It's possible that Univa may tell you that you could be risking violation of software licensing agreements with Oracle or other parties by using certain versions of Grid Engine for free. They may try to use fear, uncertainty, and doubt (FUD) to scare you into buying from them, claiming that it will protect you from Oracle. It may, but before you buy you may want to check that with Oracle and the open source community and find out for yourself, because that may not be the real risk you face. What you face with this small company is potentially more operational than legal.

If you think about it, they are essentially trying to make money off of a free open source product. This is not the most lucrative idea in the software world and makes the prospect of making money as a company doing this very difficult if not impossible. You might ask yourself if you think they are going to make it. They have carved out a software product and a team from one of the largest software companies in the world, trying to make money on a free product that Oracle bought with the Sun acquisition. If they do not make it and fail as a company, where will you be with your paid software subscription and product through them? If they do make it and then happen to gain the attention of Oracle and its Lawyers, where will you be if Oracle decides to take legal action against them, or just decides to shut them down? Do you really think that a small company with possibly faulty management and financials would have the resources to remain, let alone still be concerned with your support contract? Would your company be protected or could that liability extend to you as well? These might all be questions you would want to pose to Oracle or at least another party besides Univa if you decided on purchasing Grid Engine.

Either way, Univa and its paid version of Grid Engine could be in a tough spot. No matter which way they go, they have a good chance of ending up insolvent or worse. If this happens, where would your support contract with them be? Or worse still, what position would you be in with Oracle at that point? Again, a very good question to ask Oracle. With all these risks it might be better to look again at the free version, which even Oracle is offering, as they themselves are showing commitment to Grid Engine and the enhancement of the free version.

[Oct 18, 2013] Oracle Grid Engine EOL

Effective October 22, 2013, Univa, a leader in Grid Engine technology, will assume product support for Oracle Grid Engine customers for the remaining term of their existing Oracle Grid Engine support contracts.

For continued access to Grid Engine product support from Univa, customers with an active support contract should visit support.univa.com, or contact Univa Support at support@univa.com or 800.370.5320.

For more details on this announcement or future sales and support inquiries for Grid Engine, please visit www.univa.com/oracle or contact oracle_customer@univa.com.

[Mar 03, 2013] Son of Grid Engine 8.1.3

Son of Grid Engine is a highly-scalable and versatile distributed resource manager for scheduling batch or interactive jobs on clusters or desktop farms. It is a community project to continue Sun's Grid Engine.

It is competitive against proprietary systems and provides better scheduling features and scalability than other free DRMs like Torque, SLURM, Condor, and Lava.

[Jun 12, 2012] Son of Grid Engine 8.1.0 available

SGE 8.1.0 is available from

http://arc.liv.ac.uk/downloads/SGE/releases/8.1.0

It corrects a few problems with the previous version, takes an overdue opportunity to adopt a more logical numbering now that tracking the Univa repo is irrelevant, and improves the RPM packaging.

The RPMs now include the GUI installer and the "herd" Hadoop integration built against a more recent Cloudera distribution. (The GUI installer was previously separate as the original IzPack packaging was non-distributable.)

Generally this distribution has hundreds of improvements not (freely?) available in others, including security fixes, maintained documentation, and easy building at least on recent GNU/Linux.

Univa Announces Grid Engine 8.1 to Further Evolve the Popular Software

Yahoo!

Univa, the Data Center Automation Company, announced today the release of Univa Grid Engine Version 8.1, the most widely deployed, distributed resource management software platform used by enterprises and research organizations across the globe. Univa Grid Engine is the industry-leading choice for workload management and integrating Big Data solutions while saving time and money through increased uptime and reduced total cost of ownership. Corporations in the industries of Oil and Energy, Life Sciences and Biology, and Semiconductors rely on Univa Grid Engine when they need mission-critical computing capacity to model and solve complex problems.

Key features include:

Jeppesen has implemented Univa Grid Engine to support their Crew & Fleet management products for distributing optimization jobs, RAVE compilations and Studio sessions. “Jeppesen has selected Univa Grid Engine as this was the most appealing alternative looking at both cost and Univa’s ability to make future enhancements to the product,” said Pete Catherall, Business Operations Manager, Jeppesen. “This is another example of that.”

[May 07, 2012] The Memories of a Product Manager: The True Story of the Grid Engine Dream

April 25, 2012

After Wolfgang left Sun - many fine people at Sun had to leave at that time - it was frustrating to see how our efforts to have two Sun Grid Engine products (one available by subscription and one available as free open source) failed because of a management veto. On one hand we were under pressure to be profitable as a unit; on the other hand, our customers appeared to have no reason to pay even one cent for a subscription or license.

Oracle still has IP control of Grid Engine. Both Univa and Oracle decided to make no more contributions to the open source version. While Oracle's open source policies are clear, Univa, a champion of open source for many years, has surprised the community. This has created an agitated thread on the Grid Engine discussion group.

[May 07, 2012] Sun-Oracle Grid Engine 6.2 installation on Windows Nirmal's Haven

Sun Grid Engine 6.2 Update 2 introduced support for Windows operating systems to run as worker nodes. Sun Grid Engine, or Oracle Grid Engine as it’s being relabeled now, is a distributed resource manager primarily used in HPC environments, but it sees more widespread use now with all the new features introduced as part of Update 5.

Here I’m going to detail a quick how-to for getting Grid Engine installed and running on Windows hosts. This is most applicable to Windows XP and Windows Server 2003; some of the additional prerequisites required on the Windows hosts are now standard in Windows Server 2008 and Windows 7.

[Dec 05, 2011] Son of Grid Engine 8.0.0d

Son of Grid Engine is a highly-scalable and versatile distributed resource manager for scheduling batch or interactive jobs on server farms or desktop farms. It is a community project to continue Sun's Grid Engine.

[Jun 25, 2011] server farm Tricks Grid Engine License Juggling -

Bio-IT World

...NIBR had already chosen Sun Grid Engine Enterprise Edition (SGEEE) to run on the server farm. The BioTeam was asked to deploy SGEEE and integrate several FLEXlm-licensed scientific applications. Acceptance tests for determining success were rigorous. The server farm had to withstand test cases developed by the researchers while automatically detecting and correcting license-related job errors without human intervention.

The core problem turned out to be the most straightforward to solve. To prevent the NIBR server farm from running jobs when no licenses were available, the Grid Engine scheduler needed to become license aware. This was accomplished via a combination of "load sensor" scripts and specially configured Grid Engine "resources."

· Load sensor scripts give Grid Engine operators the ability to collect additional system measurements to help make scheduling or resource allocation decisions.

· Resources are a Grid Engine concept used primarily by users who require a particular need to be met in order for a job to complete successfully. A user-requested resource could be dynamic ("run job only on a system with at least 2 GB of free memory") or static ("run job on the machine with laser printer attached").

The NIBR plan involved creating custom resource attributes within Grid Engine so that scientists could submit jobs with the requirement "only run this job if a license is available." If licenses were available, the jobs would be dispatched immediately; if not, the jobs would be held until licenses were available.

To this point, the project was easy. Much more difficult — and more interesting — were efforts to meet NIBR acceptance tests.

The first minor headache resulted from trying to accurately automate querying of the FLEXlm license servers. One FLEXlm license server was an older version that only revealed the number of currently in-use licenses. This meant that the total number of available licenses (equally important) needed to be hard-coded into Grid Engine. NIBR researchers felt strongly that this could create server farm management problems, so the license server was upgraded to a version that allowed the load sensor script to learn how many licenses were available.

The next problem was figuring out how to automatically detect jobs that still managed to fail with license-related errors. The root cause of these failures is the loose integration between the FLEXlm license servers and Grid Engine. Race conditions may occur when Grid Engine launches server farm jobs that do not immediately check out their licenses from the FLEXlm server. Delays can cause Grid Engine's internal license values to get slightly out of sync with the real values held by the license server.

Nasty race conditions between license servers and server farm resource management systems such as Grid Engine are mostly unavoidable at present. The solution everyone is hoping for is FLEXlm support of an API (application programming interface) for advance license reservation and checkout. Applications such as Grid Engine could then directly hook into the FLEXlm system rather than rely on external polling methods. Until this occurs, we are left with half-measures and workarounds.

[Jun 25, 2011] Grid Engine for Users BioTeam Blog

Mar 10, 2011

Back in the day …

Way back in 2009 I placed an aging copy of my Grid Engine Administration training materials online. Response has been fantastic and it’s still one of the more popular links on this blog.

Today

Well it’s probably past time I did something similar aimed at people actually interested in using Grid Engine rather than just administering it.

It’s not comprehensive or all that interesting but I am going to post a bunch of slides cherry picked from the collection of things I build custom training presentations from. Think of them as covering an intro-level view of Grid Engine use and usage.

Intro to Grid Engine Usage & Simple Workflows

There are two slide decks, both of which are mostly stripped of information that is unique to a particular client, customer or Grid Engine server farm.

The first presentation is a short and dry introduction aimed at a user audience – it explains what Grid Engine is, what it does and what the expectations of the users are. It then dives into commands and examples.

The second presentation is also aimed at a basic user audience but talks a bit more about workflows, pipelines and simple SGE features that make life a bit easier for people who need to do more than a few simple ‘qsub’ actions.

[Jun 23, 2011] ds-gridengine-167114

By abstracting end users from the specific machines processing the workload, machine failures can be taken in stride. When a machine fails, the workload it was processing can be requeued and rescheduled. While the machine remains down, new workload is scheduled around that machine, preventing end users from ever noticing the machine failure. In addition to the Oracle Grid Engine product's rich scheduling and workload management capabilities, it also has the ability to share resources among fixed services, such as between two Oracle Grid Engine server farms, resulting in even higher overall data center utilization. Included in this capability is the ability to reach out to a private or public cloud service provider to lease additional resources when needed. During peak workload periods, additional virtual machines can be leased from a cloud service provider to augment the on-site resources. When the workload subsides the leased cloud resources are released back to the cloud, minimizing the costs. Such cloud bursting capabilities allow an enterprise to handle regular and unpredictable peak workloads without resorting to purchasing excess additional

Son of Grid Engine

The Son of Grid Engine is a community project to continue Sun's old grid engine free software project that used to live at http://gridengine.sunsource.net, now that Oracle have shut down the site and are not contributing code. It will maintain copies of as much as possible/useful from the old site. Currently we do not have the old mail archives online, though articles from the old gridengine-users list from the last five years or so will be available soon, and Oracle have promised to donate artifacts from the old site, so we should be able to get complete copies of everything.

The idea is to encourage sharing, in the spirit of the original project, and informed by long experience of free software projects and scientific computing support. Please contribute, and share code or ideas for improvement, especially any ideas for encouraging contribution.

Currently any information you find for the grid engine v6.2u5 release will apply to this effort, including the v6.2u5 wiki documentation and pointers therefrom, such as the gridengine.info site and its wiki. There may eventually also be useful info at the Oracle Grid Engine Forum. You should note its terms of use before posting there; they include even relinquishing moral rights.

This wiki isn't currently generally editable, but will be when spam protection is in place. If you're a known past contributor to grid engine and would like to help, please get in touch for access.

Oracle Grid Engine Creators Move to Univa by Chris Preimesberger

2011-01-19 | eWeek.com

As a result, Univa will offer engineering support for current Oracle Grid Engine deployments and will release a new Univa version of the DRM by March.

Univa revealed Jan. 19 that the principal engineers from the Sun/Oracle Grid Engine team, including Grid Engine founder and original project owner Fritz Ferstl, have left Oracle and are joining the company.

As a result, Univa will now offer engineering support for current Oracle Grid Engine deployments and will release a new Univa version of Grid Engine before the end of the first quarter of 2011.

Oracle Grid Engine software is a distributed resource management (DRM) system that manages the distribution of users' workloads to the best available compute resources within the system. While compute resources in a typical data center have utilization rates that average only 10 percent to 25 percent, the Oracle Grid Engine can help a company increase utilization to 80, 90 or even 95 percent, Oracle said.

This significant improvement comes from the intelligent distribution of workload to the most appropriate available resources.

When users submit their work to Oracle Grid Engine as jobs, the software monitors the current state of all resources in the server farm and is able to assign these jobs to the best-suited resources. Oracle Grid Engine gives administrators both the flexibility to accurately model their computing environments as resources and to translate business rules into policies that govern the use of those resources, Oracle said.

"Combining the Grid Engine and Univa technology offerings was a once-in-a-lifetime opportunity that the new Univa EMEA team and I just couldn't miss," Ferstl said. "Now we'll be able to interact with and serve users worldwide investigating and understanding their data center optimization needs."

Lisle, Ill.-based Univa will concentrate on improving the Grid Engine for technical computing and HPC use cases in addition to promoting the continuity of the Grid Engine open-source community, Univa said.

Oracle Grid Engine Changes for a Bright Future at Oracle

Dec 23, 2010 | DanT's Grid Blog

For the past decade, Oracle Grid Engine has been helping thousands of customers marshal the enterprise technical computing processes at the heart of bringing their products to market. Many customers have achieved outstanding results with it via higher data center utilization and improved performance. The latest release of the product provides best-in-class capabilities for resource management including: Hadoop integration, topology-aware scheduling, and on-demand connectivity to the cloud.

Oracle Grid Engine has a rich history, from helping BMW Oracle Racing prepare for the America’s Cup to helping isolate and identify the genes associated with obesity; from analyzing and predicting the world's financial markets to producing the digital effects for the popular Harry Potter series of films. Since 2001, the Grid Engine open source project has made Oracle Grid Engine functionality available for free to open source users. The Grid Engine open source community has grown from a handful of users in 2001 into the strong, self-sustaining community that it is now.

Today, we are entering a new chapter in Oracle Grid Engine’s life. Oracle has been working with key members of the open source community to pass on the torch for maintaining the open source code base to the Open Grid Scheduler project hosted on SourceForge. This transition will allow the Oracle Grid Engine engineering team to focus their efforts more directly on enhancing the product. In a matter of days, we will take definitive steps in order to roll out this transition. To ensure on-going communication with the open source community, we will provide the following services:

Oracle is committed to enhancing Oracle Grid Engine as a commercial product and has an exciting road map planned. In addition to developing new features and functionality to continue to improve the customer experience, we also plan to release game-changing integrations with several other Oracle products, including Oracle Enterprise Manager and Oracle Coherence. Also, as Oracle's cloud strategy unfolds, we expect that the Oracle Grid Engine product's role in the overall strategy will continue to grow. To discuss our general plans for the product, we would like to invite you to join us for a live webcast on Oracle Grid Engine’s new road map. Click here to register.


SGE 6.2u3 beta release

[Nov 30, 2009] Sun Grid Engine for Dummies

Nov 30, 2009 | DanT's Grid Blog
Servers tend to be used for one of two purposes: running services or processing workloads. Services tend to be long-running and don't tend to move around much. Workloads, however, such as running calculations, are usually done in a more "on demand" fashion. When a user needs something, he tells the server, and the server does it. When it's done, it's done. For the most part it doesn't matter on which particular machine the calculations are run. All that matters is that the user can get the results. This kind of work is often called batch, offline, or non-interactive work. Sometimes batch work is called a job. Typical jobs include processing of accounting files, rendering images or movies, running simulations, processing input data, modeling chemical or mechanical interactions, and data mining. Many organizations have hundreds, thousands, or even tens of thousands of machines devoted to running jobs.

Now, the interesting thing about jobs is that (for the most part) if you can run one job on one machine, you can run 10 jobs on 10 machines or 100 jobs on 100 machines. In fact, with today's multi-core chips, it's often the case that you can run 4, 8, or even 16 jobs on a single machine. Obviously, the more jobs you can run in parallel, the faster you can get your work done. If one job takes 10 minutes on one machine, 100 jobs still only take 10 minutes when run on 100 machines. That's much better than 1000 minutes to run those 100 jobs on a single machine. But there's a problem. It's easy for one person to run one job on one machine. It's still pretty easy to run 10 jobs on 10 machines. Running 1600 jobs on 100 machines is a tremendous amount of work. Now imagine that you have 1000 machines and 100 users all trying to run 1600 jobs each. Chaos and unhappiness would ensue.

To solve the problem of organizing a large number of jobs on a set of machines, distributed resource managers (DRMs) were created. (A DRM is also sometimes called a workload manager. I will stick with the term DRM.) The role of a DRM is to take a list of jobs to be executed and distribute them across the available machines. The DRM makes life easier for the users because they don't have to track all their jobs themselves, and it makes life easier for the administrators because they don't have to manage users' use of the machines directly. It's also better for the organization in general because a DRM will usually do a much better job of keeping the machines busy than users would on their own, resulting in much higher utilization of the machines. Higher utilization effectively means more compute power from the same set of machines, which makes everyone happy.
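To make that concrete with Sun Grid Engine itself, the hundred-jobs scenario above collapses into a single submission; a minimal sketch, where run_calc.sh stands in for your job script:

$ qsub -t 1-100 -cwd run_calc.sh    # one array job with 100 tasks
$ qstat                             # watch SGE spread the tasks over free slots

Each task of the array job finds its own index in $SGE_TASK_ID, so one script can process 100 different inputs.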

Here's a bit more terminology, just to make sure we're all on the same page. A cluster is a group of machines cooperating to do some work. A DRM and the machines it manages compose a cluster. A cluster is also often called a grid. There has historically been some debate about what exactly a grid is, but for most purposes grid can be used interchangeably with cluster. Cloud computing is a hot topic that builds on concepts from grid/cluster computing. One of the defining characteristics of a cloud is the ability to "pay as you go." Sun Grid Engine offers an accounting module that can track and report on fine-grained usage of the system. Beyond that, Sun Grid Engine now offers deep integration to other technologies commonly being used in the cloud, such as Apache Hadoop.
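As a quick illustration of the accounting module just mentioned, qacct can summarize consumption from the accounting file (a sketch; exact columns vary by version, and "alice" is a hypothetical user):

$ qacct -o              # per-owner totals: wallclock, CPU, memory
$ qacct -o alice -d 7   # one user's usage over the last 7 days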

One of the best ways to show Sun Grid Engine's flexibility is to take a look at some unusual use cases. These are by no means exhaustive, but they should serve to give you an idea of what can be done with the Sun Grid Engine software.


Chi Hung Chan: SGE Grid Job Dependency

It is possible to describe SGE (Sun Grid Engine) job dependency (or that of any other grid engine) as a DAG (Directed Acyclic Graph). By taking advantage of the open-source Graphviz, it is very easy to document this dependency in the DOT language format. Below is a sample DOT file:
$ cat job-dep.dot
digraph jobs101 {
        job_1 -> job_11;
        job_1 -> job_12;
        job_1 -> job_13;
        job_11 -> job_111;
        job_12 -> job_111;
        job_2 -> job_13;
        job_2 -> job_21;
        job_3 -> job_21;
        job_3 -> job_31;
}

With this DOT file, one can generate the graphical representation:

$ dot -Tpng -o job-dep.png job-dep.dot

It is also possible to derive the corresponding SGE commands with the following Tcl script.

$ cat ./dot2sge.tcl
#! /usr/local/bin/tclsh


if { $argc != 1 } {
        puts stderr "Usage: $argv0 "
        exit 1
}
set dotfile [lindex $argv 0]
if { [file exists $dotfile] == 0 } {
        puts stderr "Error. $dotfile does not exist"
        exit 2
}


# assume simple directed graph a -> b

set fp [open $dotfile r]
set data [read $fp]
close $fp


set sge_jobs {}
foreach i [split [lindex $data 2] {;}] {
        if { [regexp {(\w+)\s*->\s*(\w+)} $i x parent child] != 0 } {
                lappend sge_jobs $parent
                lappend sge_jobs $child

                lappend sge_job_rel($parent) $child
        }
}


# submit unique jobs, and hold
set queue all.q
set sge_unique_jobs [lsort -unique $sge_jobs]
foreach i $sge_unique_jobs {
        puts "qsub -h -q $queue -N $i job-submit.sh"
}


# alter the job dependency, but unhold after all the hold relationships are
# established
foreach i $sge_unique_jobs {
        if { [info exists sge_job_rel($i)] } {
                # with dependency
                puts "qalter -hold_jid [join $sge_job_rel($i) {,}] $i"
        }
}
foreach i $sge_unique_jobs {
        puts "qalter -h U $i"
}

Run this Tcl script to generate the SGE submission commands and the qalter commands that register the job dependencies:

$ ./dot2sge.tcl job-dep.dot
qsub -h -q all.q -N job_1 job-submit.sh
qsub -h -q all.q -N job_11 job-submit.sh
qsub -h -q all.q -N job_111 job-submit.sh
qsub -h -q all.q -N job_12 job-submit.sh
qsub -h -q all.q -N job_13 job-submit.sh
qsub -h -q all.q -N job_2 job-submit.sh
qsub -h -q all.q -N job_21 job-submit.sh
qsub -h -q all.q -N job_3 job-submit.sh
qsub -h -q all.q -N job_31 job-submit.sh
qalter -hold_jid job_11,job_12,job_13 job_1
qalter -hold_jid job_111 job_11
qalter -hold_jid job_111 job_12
qalter -hold_jid job_13,job_21 job_2
qalter -hold_jid job_21,job_31 job_3
qalter -h U job_1
qalter -h U job_11
qalter -h U job_111
qalter -h U job_12
qalter -h U job_13
qalter -h U job_2
qalter -h U job_21
qalter -h U job_3
qalter -h U job_31

The transcript below shows the above proof-of-concept in action. So sit back....

#
# ----------below is a very simple script
#
$ cat job-submit.sh
#! /bin/sh
#$ -S /bin/sh

date
sleep 10


#
# ----------run all the qsub commands to submit the jobs, but put them on hold
#
$ qsub -h -q all.q -N job_1 job-submit.sh
Your job 333 ("job_1") has been submitted.
$ qsub -h -q all.q -N job_11 job-submit.sh
Your job 334 ("job_11") has been submitted.
$ qsub -h -q all.q -N job_111 job-submit.sh
Your job 335 ("job_111") has been submitted.
$ qsub -h -q all.q -N job_12 job-submit.sh
Your job 336 ("job_12") has been submitted.
$ qsub -h -q all.q -N job_13 job-submit.sh
Your job 337 ("job_13") has been submitted.
$ qsub -h -q all.q -N job_2 job-submit.sh
Your job 338 ("job_2") has been submitted.
$ qsub -h -q all.q -N job_21 job-submit.sh
Your job 339 ("job_21") has been submitted.
$ qsub -h -q all.q -N job_3 job-submit.sh
Your job 340 ("job_3") has been submitted.
$ qsub -h -q all.q -N job_31 job-submit.sh
Your job 341 ("job_31") has been submitted.


#
# ----------show the status, all jobs are on hold (hqw)
#
$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    335 0.00000 job_111    chihung      hqw   07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    337 0.00000 job_13     chihung      hqw   07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    339 0.00000 job_21     chihung      hqw   07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1
    341 0.00000 job_31     chihung      hqw   07/19/2007 21:04:34     1


#
# ----------register the job dependency
#
$ qalter -hold_jid job_11,job_12,job_13 job_1
modified job id hold list of job 333
   blocking jobs: 334,336,337
   exited jobs:   NONE
$ qalter -hold_jid job_111 job_11
modified job id hold list of job 334
   blocking jobs: 335
   exited jobs:   NONE
$ qalter -hold_jid job_111 job_12
modified job id hold list of job 336
   blocking jobs: 335
   exited jobs:   NONE
$ qalter -hold_jid job_13,job_21 job_2
modified job id hold list of job 338
   blocking jobs: 337,339
   exited jobs:   NONE
$ qalter -hold_jid job_21,job_31 job_3
modified job id hold list of job 340
   blocking jobs: 339,341
   exited jobs:   NONE


#
# ----------release all the holds and let SGE sort itself out
#
$ qalter -h U job_1
modified hold of job 333
$ qalter -h U job_11
modified hold of job 334
$ qalter -h U job_111
modified hold of job 335
$ qalter -h U job_12
modified hold of job 336
$ qalter -h U job_13
modified hold of job 337
$ qalter -h U job_2
modified hold of job 338
$ qalter -h U job_21
modified hold of job 339
$ qalter -h U job_3
modified hold of job 340
$ qalter -h U job_31
modified hold of job 341


#
# ----------query SGE stats
#
$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    335 0.00000 job_111    chihung      qw    07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    337 0.00000 job_13     chihung      qw    07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    339 0.00000 job_21     chihung      qw    07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1
    341 0.00000 job_31     chihung      qw    07/19/2007 21:04:34     1


#
# ----------some jobs started to run
#
$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    339 0.55500 job_21     chihung      r     07/19/2007 21:05:36     1
    341 0.55500 job_31     chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    335 0.55500 job_111    chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    337 0.55500 job_13     chihung      r     07/19/2007 21:05:36     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    339 0.55500 job_21     chihung      r     07/19/2007 21:05:36     1
    341 0.55500 job_31     chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    335 0.55500 job_111    chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    337 0.55500 job_13     chihung      r     07/19/2007 21:05:36     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      qw    07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      qw    07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      qw    07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      qw    07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    338 0.55500 job_2      chihung      r     07/19/2007 21:05:51     1
    340 0.55500 job_3      chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    334 0.55500 job_11     chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    336 0.55500 job_12     chihung      r     07/19/2007 21:05:51     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    338 0.55500 job_2      chihung      r     07/19/2007 21:05:51     1
    340 0.55500 job_3      chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    334 0.55500 job_11     chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    336 0.55500 job_12     chihung      r     07/19/2007 21:05:51     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      qw    07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    333 0.55500 job_1      chihung      r     07/19/2007 21:06:06     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    333 0.55500 job_1      chihung      r     07/19/2007 21:06:06     1


#
# ----------output of all jobs; you can see jobs job_1/2/3 finished last
#
$ grep 2007 job_*.o*
job_111.o335:Thu Jul 19 21:05:36 SGT 2007
job_11.o334:Thu Jul 19 21:05:51 SGT 2007
job_12.o336:Thu Jul 19 21:05:51 SGT 2007
job_13.o337:Thu Jul 19 21:05:36 SGT 2007
job_1.o333:Thu Jul 19 21:06:06 SGT 2007
job_21.o339:Thu Jul 19 21:05:36 SGT 2007
job_2.o338:Thu Jul 19 21:05:51 SGT 2007
job_31.o341:Thu Jul 19 21:05:37 SGT 2007
job_3.o340:Thu Jul 19 21:05:52 SGT 2007

Another successful proof-of-concept. :-)


[gridengine users] Consumable configuration best practices question for hundreds of resources for specific group of nodes
William Hay w.hay at ucl.ac.uk
Mon Mar 30 08:41:10 UTC 2015

On Sun, 29 Mar 2015 08:50:15 +0000
Yuri Burmachenko <yuribu at mellanox.com> wrote:


>
> Users will care about which cells they are using.

Could you confirm that my understanding below is correct:
The users of this system care which cells they need to use for reasons other than avoiding oversubscription of the cell.
Cell 25 is fundamentally different from cell 39 even when both are free.
The users want to be able to tell the scheduler which cells to use rather than being able to write a job script that can read a list of cells
to use from a file or similar.

If all the above is true then your 300 different complex_values are probably unavoidable but it won't be pretty.

>
> Our partial solution should allow the users to control/monitor/request/free these cells.
>
>
>
> I looked into the links https://arc.liv.ac.uk/trac/SGE/ticket/1426 and http://gridengine.eu/grid-engine-internals/102-univa-grid-engine-810-features-part-2-better-resource-management-with-the-rsmap-complex-2012-05-25 - I see that many consumable resources can be attached on host basis with RSMAP.
>
Not entirely, AIUI (and we're not Univa customers): RSMAP resources can be associated with queues or the global host as well. Also, you request the number of resources you want, but UGE assigns the specific resources (cells in your case) that your job will use. If I'm understanding you correctly, that won't work for you.

> We need to be able to attach these 300 consumable resources as shared between 4 nodes – is it possible? Maybe a separate queue for these 4 particular hosts with list of complex consumable resources?

That doesn't work because resources defined on a cluster queue exist for each queue instance.

Grid Engine doesn't have a simple way to associate a resource with a group of hosts other than the cluster as a whole. What you can do is define resource availability on the global pseudo host, then add a restriction by some means to prevent usage other than on the hosts in question:

* You could define your queue configuration so that all queues on all other nodes have 0 of the resource available, while the nodes with access say nothing about availability and therefore have access to the full resources defined on the global host.
* You could define the resources as having 0 availability on hosts other than the ones in question.
* You could probably also do the same with resource quotas.

The first of the above is probably simplest/least work assuming your existing queue configuration is simple.
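A minimal sketch of that first option, assuming a hypothetical consumable "cell25" and a hostgroup @cellhosts containing the four nodes (the lines shown as comments are what you would add in the editor each qconf command opens):

$ qconf -mc          # add:  cell25  cell25  INT  <=  YES  YES  0  0
$ qconf -me global   # add:  complex_values cell25=1
$ qconf -mq all.q    # add:  complex_values cell25=0,[@cellhosts=NONE]

The queue default of 0 blocks consumption everywhere, while the NONE override leaves the @cellhosts queue instances silent about the resource, so the value defined on the global host applies there.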


> All cells are different and users will need to know which one they need to request. At this stage they all should be distinct.

OK. If users request a lot of different cells for individual jobs this will probably lead to long delays before jobs start. Said users will almost certainly want to request
a dynamic reservation for their jobs (-R y).
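A job would then ask for the specific cells it needs by name, with a reservation; an illustrative submission (the complex names are hypothetical):

$ qsub -R y -l cell25=1,cell39=1 job-submit.sh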

[gridengine users] Undocumented Feature of load sensors
Fritz Ferstl fferstl at univa.com
Thu Apr 16 15:15:36 UTC 2015

It is certainly an intended feature, William. It always has been, since load sensors were introduced in the late 90s.

The thought behind it was that you might have central system management
services which maintain host level information. You can then put the
load sensor on the system management server instead of having 1000s of
hosts query it. But you can use it for other stuff as well, of course.

Cheers,

Fritz

William Hay wrote:
> It appears that you can have load sensors report values for individual hosts other than the one on which it runs. I've tested this by having a load sensor run on one host report different values for two different hosts and used qhost -F to verify that gridengine reports them.
>
> The possibility of doing this is implied by the format of load sensor reports but I've never seen it explicitly documented as possible or used elsewhere.
>
> Being able to use this would simplify certain aspects of the configuration of our production cluster so it would be useful to know if this is intended behavior
> and therefore something I can rely on or an implementation quirk.
>
> Opinions?
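For the curious, exploiting the behavior William describes only requires putting a different hostname in the first field of each report line; a minimal sketch (the hostnames and the complex "cells_free" are made up):

#! /bin/sh
# Load sensor running on one host but reporting values for two others.
while read -r input; do
    [ "$input" = "quit" ] && exit 0
    echo "begin"
    echo "nodeA:cells_free:3"   # first field names the host reported for
    echo "nodeB:cells_free:7"
    echo "end"
done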

Anyone have scripts for detecting users who bypass grid engine

Reuti reuti at staff.uni-marburg.de
Thu Apr 9 21:19:12 UTC 2015


On 09.04.2015 at 23:09, Feng Zhang wrote:

> I know that some people use ssh as rsh_command, which may have a similar problem?

Not when you have a tight integration of `ssh` in SGE:

https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html section "SSH TIGHT INTEGRATION"

Then `ssh` can't spawn any process which escapes from SGE.

-- Reuti


> On Thu, Apr 9, 2015 at 3:46 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>> On 09.04.2015 at 21:23, Chris Dagdigian wrote:
>> 
>>> 
>>> I'm one of the people who has been arguing for years that technological methods for stopping abuse of GE systems never work in the long term because motivated users always have more time and interest than overworked admins so it's kind of embarrassing to ask this but ...
>>> 
>>> Does anyone have a script that runs on a node and prints out all the userland processes that are not explicitly a child of an sge_shepherd daemon?
>>> 
>>> I'm basically looking for a lightweight way to scan a node just to see if there are users/tasks running that are outside the awareness of the SGE qmaster.  Back in the day when we talked about this it seemed that one easy method was just looking for user stuff that was not a child process of an SGE daemon process.
>>> 
>>> The funny thing is that it's not the HPC end users who do this. As the grid(s) get closer and closer to the enterprise I'm starting to see software developers and others trying to play games and then plead ignorance when asked "why did you SSH to a compute node and start a tomcat service out of your home directory?". heh.
>> 
>> Why allow `ssh` to a node at all? In my installations only the admins can do this. If users want to peek around on a node I have an interactive queue with an h_cpu limit of 60 seconds for this. So even logging in to a node is controlled by SGE.
>> 
>> -- Reuti
>> 
>> 
>>> 
>>> -chris
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
> -- 
> Best,
> 
> Feng
> 

Recommended Links


Oracle Grid Engine documentation

Oracle Grid Engine - Wikipedia, the free encyclopedia

Guide to Using the Grid Engine The Particle Beam Physics Laboratory at the UCLA Department of Physics and Astronomy

How To Use Sun Grid Engine Main Biowiki

SUN Grid Engine - UU/Department of Information Technology

BeocatDocs-SunGridEngine - CIS Support

Sun

Oracle

Univa

Grid Engine in 2012 & Beyond

What the heck is going on with Grid Engine in 2012 and beyond? If you’ve found this page and have managed to keep reading, you are probably interested in Grid Engine and what it may look like in the future. This post will attempt to summarize what is currently available.

History of this site

This website was thrown together very quickly in early 2011 when Oracle announced it was taking Oracle Grid Engine in a new “closed source” direction. Very soon after the announcement, the open source SGE codebase was forked by multiple groups. Oracle had also been hosting the popular gridengine.sunsource.net site where documentation, HowTo’s and a very active mailing list had become the default support channel for many SGE users and administrators.

This website was seen as a gathering point and central public portal for the various grid engine fork efforts. It was also a natural place to host a new “users@gridengine.org” mailing list in an attempt to recreate the atmosphere found in the old “users@gridengine.sunsource.net” listserv community.

The new mailing list was a success but efforts to build a “Steering Committee” that would drive some sort of coordinated effort stalled throughout most of 2011. Truth be told, we probably don’t need a central site or even a steering committee - the maintainers of the various forks all know each other and can easily trade patches, advice and collaborative efforts among themselves.

It’s best simply to recast the gridengine.org site as a convenient place for information broadly of interest to all Grid Engine users, administrators and maintainers – mailing lists, news and pointers to information, software & resources.

Available Grid Engine Options

Open Source

“Son of Grid Engine”

URL: https://arc.liv.ac.uk/trac/SGE
News & Announcements: http://arc.liv.ac.uk/repos/darcs/sge/NEWS
Description: Baseline code comes from the Univa public repo with additional enhancements and improvements added. The maintainer(s) have deep knowledge of SGE source and internals and are committed to the effort. Future releases may start to diverge from Univa as Univa pursues an “open core” development model. Maintainers have made efforts to make building binaries from source easier and the latest release offers RedHat Linux SRPMS and RPM files ready for download.
Support: Supported via the maintainers and the users mailing list.

“Open Grid Scheduler”

URL: http://gridscheduler.sourceforge.net/
Description: Baseline code comes from the last Oracle open source release with significant additional enhancements and improvements added. The maintainer(s) have deep knowledge of SGE source and internals and are committed to the effort. No pre-compiled “courtesy binaries” available at the SourceForge site (just source code and instructions on how to build Grid Engine locally). In November 2011 a new company ScalableLogic announced plans to offer commercial support options for users of Open Grid Scheduler.
Support: Supported via the maintainers and the users mailing list. Commercial support from ScalableLogic.

Commercial

“Univa Grid Engine”

URL: http://www.univa.com/products/grid-engine
Description: Commercial company selling Grid Engine, support and layered products that add features and functionality. Several original SGE developers are now employed by Univa. Evaluation versions and “48 cores for free” are available from the website.
Support: Univa supports their own products.

“Oracle Grid Engine”

URL: http://www.oracle.com/us/products/tools/oracle-grid-engine-075549.html
Description: Continuation of “Sun Grid Engine” after Oracle purchased Sun. This is the current commercial version of Oracle Grid Engine after Oracle discontinued the open source version of their product and went 100% closed-source.
Support: Oracle supports their own products, a web support forum for Oracle customers can be found at https://forums.oracle.com/forums/forum.jspa?forumID=859

Univa Grid Engine - Wikipedia, the free encyclopedia

Univa Grid Engine - Daniel's Blog about Grid Engine



Etc




Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.

The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.

Original materials' copyright belongs to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

You can use PayPal to make a contribution, supporting development of this site and speeding up access. In case softpanorama.org is down you can use the mirror at softpanorama.info.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: July 21, 2017