
Grid Engine As a High Quality Unix/Linux Batch System

Related pages: News · Cluster job schedulers · Recommended Links · Implementations · Son of Grid Engine · Grid Scheduler · Documentation · Starting and Killing SGE Daemons · SGE hosts · SGE hostgroups · SGE Queues · SGE Jobs · SGE Parallel Environment · Submitting parallel OpenMPI jobs · Submitting binaries in SGE · SGE Submit Scripts · Message Passing Interface · Monitoring and Controlling Jobs · Monitoring Queues · Troubleshooting · Job or Queue Reported in Error State E · Queue instance in AU state · Commands: qalter, qstat, qsub, qrsh, qmod, qacct, qdel, qhost, qhold, qconf · Managing User Access · Resource Quota Sets · SGE Consumable Resources · Restricting number of slots per server · Gridengine diag tool · Backup of SGE configuration · Installation of SGE on a small set of multicore servers · Installation of the Master Host · Installation of the Execution Hosts · sge_execd - Sun Grid Engine job execution agent · SGE shepherd · Usage of NFS · SGE cheat sheet · History · Glossary · Tips · Humor · Etc

Introduction

Grid Engine, often called Sun Grid Engine (SGE), is a software classic. It is a batch job controller -- essentially the Unix batch command on steroids -- rather than a typical scheduler. At one point Sun open-sourced the code, so an open source version exists. It is the most powerful (albeit specialized) open source batch scheduler in existence. This is one of the most valuable contributions of Sun to the open source community, as it provides an industrial-strength batch scheduler for Unix/Linux.

Again, this is one of the few classic Unix software systems. SGE 6.2u7 as released by Sun has all the signs of a software classic. It inherited fairly good documentation from its Sun days (although the software vandals from Oracle destroyed a lot of valuable Sun documents). Any engineer or scientist can read the SGE User Manual and Installation Guide, install it (the installer sets up a single queue all.q that can be used immediately), and start using it for his/her needs in a day or two using the defaults, without any training. As long as the networking is reliable and jobs are submitted correctly, SGE runs them with nearly zero administration.

SGE is a very powerful and flexible batch system that probably should become a standard Linux subsystem, replacing or supplementing the very basic batch command. It is available in several Linux distributions such as Debian and Ubuntu as an installable software package from the main repository. For CentOS, RHEL and SUSE it is available from third-party repositories.
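
For example, on Debian or Ubuntu the whole stack can be pulled in with the package manager (a minimal sketch; these are the package names used by the Debian gridengine packaging and may differ between releases):

    # on the master host
    apt-get install gridengine-master gridengine-client
    # on each execution host
    apt-get install gridengine-exec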

SGE has many options that help to use all computational resources effectively -- a grid consisting of a head node and computational nodes, each with a certain number of cores (aka slots).
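
A quick way to see which nodes and slots SGE knows about is the qhost command. The output below is a sketch of what a typical installation prints; host names and numbers are purely illustrative:

    $ qhost
    HOSTNAME    ARCH        NCPU  LOAD   MEMTOT   MEMUSE   SWAPTO   SWAPUS
    ----------------------------------------------------------------------
    global      -           -     -      -        -        -        -
    node01      lx26-amd64  16    0.12   64.0G    2.1G     8.0G     0.0
    node02      lx26-amd64  16    8.03   64.0G    40.5G    8.0G     0.0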

But the flip side of power and flexibility is complexity. It is a complex system that requires study. You need to carefully study the man pages and manuals to get the most out of it. The SGE mailing list is also a great educational resource. Don't hesitate to ask questions. Then, when you become an expert, you can help others get up to speed with the product. Installation is easy, but it usually takes from six months to a year for an isolated person to master the basics (much less if you have at least one expert on the floor). But as with any complex and powerful system, even admins with 10 years of experience probably know only 60-70% of SGE.

Now that the pieces are falling into place after Oracle's acquisition of Sun Microsystems and subsequent abandonment of the product, we can see that open source can help "vendor-proof" important parts of Unix. Unix did not have a decent batch scheduler before Grid Engine; now it has one. Grid Engine is alive and well, with a blog, a mailing list, a git repository, and even a commercial version from Univa. Source code repositories can also be found at the Open Grid Scheduler (Grid Engine 2011.11 is compatible with Sun Grid Engine 6.2u7) and Son of Grid Engine projects. Open Grid Scheduler looks like abandonware (although its user group is active), while Son of Grid Engine is actively developed and currently represents the most viable open source SGE implementation.

As of version 8.1.8 it is the best debugged open source distribution. It might be especially attractive for those who have experience with building software, but it can be used by everybody on RHEL, for which precompiled binaries exist.

Installation is pretty raw, but I tried to compensate for that by creating several pages which together document the installation process on RHEL 6.5 or 6.6 pretty well:

I am still working on them, but even in their present form they are definitely clearer and more useful than the old Sun 6.2u5 installation documentation ;-).

Most SGE discussions use the term cluster, but SGE is not linked to cluster technology in any meaningful way. In reality it is designed to operate on a heterogeneous server farm.

We will use the term "server farm" here as an alternative and less ambitious term than "grid".

The default installation of Grid Engine assumes that the $SGE_ROOT directory (the root directory of the Grid Engine installation) is on a shared filesystem (for example, NFS) accessible by all hosts.
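
In practice this means exporting the directory from the master host (or a file server) and sourcing the generated settings file on every host. A minimal sketch, assuming the common default location /opt/sge and the default cell name:

    # on every host, after the shared filesystem is mounted:
    export SGE_ROOT=/opt/sge
    . $SGE_ROOT/default/common/settings.sh   # sets PATH, MANPATH, SGE_CELL, etc.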

Right now SGE exists in several competing versions (see SGE implementations):

Key concepts

The Grid Engine system has functions typical for any powerful batch system:

But as a powerful batch system it is oriented toward running multiple jobs optimally on the available resources, typically multiple computers (nodes) of a computational cluster. In its simplest form, a grid appears to users as a large system that provides a single point of access to multiple computers.

In other words, a grid is just a loose confederation of different computers, possibly running different OSes, connected by regular TCP/IP links. In this sense it is very similar to the concept of a server farm. Grid Engine does not care about the uniformity of a server farm and, in addition to scheduling, provides some central administration and monitoring capabilities to the server farm environment.

SGE enables users to distribute jobs across a grid and treat the grid as a single computational resource. It accepts jobs submitted by users and schedules them to be run on appropriate systems in the grid. Users can submit as many jobs at a time as they want without being concerned about where the jobs run.
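
For instance, a user can fire off a batch of jobs and check on them later; SGE decides where each one runs (a minimal sketch; the script name is hypothetical):

    for i in 1 2 3 4 5; do
        qsub -cwd -N run$i myscript.sh    # SGE picks a node for each job
    done
    qstat -u $USER                        # show your pending and running jobs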

The main purpose of a batch system like SGE is to optimally utilize the resources present in a server farm, i.e. to schedule jobs on available nodes in the most efficient way possible.

Every aspect of the batch system is accessible through the Perl API. There is almost no documentation, but a few sample scripts in gridengine/source/experimental/perlgui and on the Internet, such as those by Wolfgang Friebel from DESY (see ifh.de), can be used as guidance.

Grid Engine architecture is structured around two main concepts:

Queue

A queue is a container for a class of jobs that are allowed to run on one or more hosts concurrently. Logically a queue is a child of a parallel environment (see below), although it can have several such parents. It defines a set of hosts and the limits on resources on those hosts.

A queue can reside on a single host, or it can extend across multiple hosts. The latter are called server farm queues. Server farm queues enable users and administrators to work with a server farm of execution hosts by means of a single queue configuration. Each host that is attached to the head node can belong to one or more queues.

A queue determines certain job attributes. Association with a queue affects some of the things that can happen to a job. For example, if a queue is suspended, all jobs associated with that queue are also suspended.

Grid Engine always has one default queue called all.q, which is created during the initial installation and updated each time you add another execution host. You can have several additional queues, each defining the set of hosts on which its jobs run, each with its own computational requirements, for example, the number of CPUs (aka slots). The problem here is that without special measures queues are independent, and if they contain the same set of nodes oversubscription can easily occur.
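
Queues are created and inspected with qconf. A sketch of the most common operations (short.q is a hypothetical queue name):

    qconf -sql                          # list all defined queues
    qconf -sq all.q                     # show the configuration of all.q
    qconf -aq short.q                   # add a new queue (opens an editor)
    qconf -mattr queue slots 16 all.q   # change the slots attribute of all.q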

Each job should not exceed the maximum parameters defined in the queue (directly or indirectly via a parallel environment). The SGE scheduler can then optimize the job mix for the available resources by selecting the most suitable job from the input queue and sending it to the most appropriate node of the grid.

A queue defines a class of jobs that consume computer resources in a similar way. It also defines the list of computational nodes on which such jobs can be run.

Jobs typically are submitted to a queue.  
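
A job can be directed to a particular queue explicitly or left for the scheduler to place (a sketch; short.q is a hypothetical queue):

    qsub -q all.q -cwd -b y /bin/hostname   # run a binary in all.q
    qsub -q short.q myscript.sh             # target a specific queue
    qsub myscript.sh                        # let the scheduler pick the queue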

In the book Building N1™ Grid Solutions Preparing, Architecting, and Implementing Service-Centric Data Centers we can find an interesting although overblown statement:

The N1 part of the name was never intended to be a product name or a strategy name visible outside of Sun. The name leaked out and stuck. It is the abbreviation for the original project name “Network-1.” The Sun-1 workstation was Sun's first workstation. It was designed specifically to be connected to the network.

N1 Grid systems are the first systems intended to be built with the network at their core and be based on the principle that an IP-based network is effectively the system bus.

Parallel environment

Parallel environment (PE) is the central notion of SGE. It represents a set of settings that tell Grid Engine how to start, stop, and manage jobs run by the queues that use this PE.

It sets the maximum number of slots that can be assigned to all jobs within a given queue. It also sets some parameters for the parallel messaging framework (such as MPI) used by parallel jobs.

The parallel environment is a defining characteristic of each queue and needs to be specified correctly for the queue to work. It is specified in the pe_list attribute, which can contain a single PE or a list of PEs. For example:

pe_list               make mpi mpi_fill_up

Each parallel environment determines a class of queues that use it. Its most important attributes are:

  1. slots - the maximum number of job slots that the parallel environment is allowed to occupy at once
  2. allocation_rule -- see the man page. $pe_slots allocates all slots for the job on a single host. Other rules allow the job to be scheduled across multiple machines.
  3. control_slaves -- when set to "true", Grid Engine takes care of starting the slave MPI tasks. In this case MPI should be built with SGE support (for Open MPI, the --with-sge configure option).
  4. job_is_first_task  The job_is_first_task parameter can be set to TRUE or FALSE. A value of TRUE indicates that the Sun Grid Engine job script already contains one of the tasks of the parallel application (the number of slots reserved for the job is the number of slots requested with the -pe switch), while a value of FALSE indicates that the job script (and its child processes) is not part of the parallel program (the number of slots reserved for the job is the number of slots requested with the -pe switch + 1).

    If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE and/or SHARETREE_RESERVED_USAGE set to TRUE) and control_slaves is set to FALSE, the job_is_first_task parameter influences the accounting for the job: a value of TRUE means that accounting for CPU and requested memory gets multiplied by the number of slots requested with the -pe switch; if job_is_first_task is set to FALSE, the accounting information gets multiplied by the number of slots + 1.
     

  5. accounting_summary This parameter is only checked if control_slaves (see above) is set to TRUE and thus Sun Grid Engine is the creator of the slave tasks of a parallel application via sge_execd(8) and sge_shepherd(8). In this case, accounting information is available for every single slave task started by Sun Grid Engine.

    The accounting_summary parameter can be set to TRUE or FALSE. A value of TRUE indicates that only a single accounting record is written to the accounting(5) file, containing the accounting summary of the whole job including all slave tasks, while a value of FALSE indicates an individual accounting(5) record is written for every slave task, as well as for the master task.

    Note:
    When running tightly integrated jobs with SHARETREE_RESERVED_USAGE set, and with having accounting_summary enabled in the parallel environment, reserved usage will only be reported by the master task of the parallel job. No per parallel task usage records will be sent from execd to qmaster, which can significantly reduce load on qmaster when running large tightly integrated parallel jobs.
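
Putting these attributes together, a tightly integrated MPI parallel environment might look like this when displayed with qconf -sp (a sketch; the PE name and values are illustrative, not a recommendation):

    $ qconf -sp mpi
    pe_name            mpi
    slots              64
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $fill_up
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary TRUE

Such a PE is then attached to a queue via the queue's pe_list attribute, as shown above.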

Some important details are well explained in the blog post Configuring a New Parallel Environment.

Architecture

A grid generally consists of a head node and computational nodes. The head node typically runs sge_qmaster and is often called the master host. The master host can be, and often is, the NFS server for the computational nodes, but this is not necessary.

Daemons

Two daemons provide the functionality of the Grid Engine system: sge_qmaster on the master host and sge_execd on each execution host. They are started via init scripts.
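
On a typical installation the init scripts and a quick health check look like this (a sketch; the script names follow the usual SGE packaging and may differ on your system):

    # on the master host
    /etc/init.d/sgemaster start
    # on each execution host
    /etc/init.d/sgeexecd start
    # verify that the daemons are up
    ps -ef | grep sge_
    qhost        # exec hosts should report load and memory instead of '-'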

Documents and code location

Documentation for such a complex and powerful system is fragmentary and generally of low quality. Even some man pages contain questionable information. Many do not explain the available features well, or at all.

This is actually why this set of pages was created: to compensate for insufficient documentation for SGE. 

Although versions of SGE are generally compatible, the implementation of some features depends on the version used. See History for the list of major implementations.

Documentation for the last open source version produced by Sun (version 6.2u5) is floating around the Internet. See, for example:

There are docs for older versions too,

And some presentations

Some old Sun Blueprints about SGE can still be found too. But generally Oracle behaved horribly badly as the trustee of the Sun documentation portal. They proved to be simply vandals in this particular respect: discarding almost everything without mercy, destroying considerable value and an important part of the Sun heritage.

Moreover, those documents, organized into a historical website, might still have earned some money (and respect, which is sorely missing now, after this vandalism) for Oracle had they preserved the website. Instead, they discarded everything mercilessly.

Documentation for Oracle Grid Engine, which is now abandonware, may also be floating around.

For more information see SGE Documentation.



Old News ;-)

[May 08, 2017] Sample SGE scripts

May 08, 2017 | ctbp.ucsd.edu
  1. An example of simple APBS serial job.
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N serial_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    # requesting 12hrs wall clock time
    #$ -l h_rt=12:00:00
    
    /soft/linux/pkg/apbs/bin/apbs inputfile >& outputfile
    
    
  2. An example script for running the executable a.out in parallel on 8 CPUs. (Note: for your executable to run in parallel it must be compiled with a parallel library like MPICH, LAM/MPI, PVM, etc.) This script shows file staging, i.e., using the fast local filesystem /scratch on the compute node in order to eliminate speed bottlenecks.
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N parallel_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    #$ -pe mpi 8
    # requesting 10hrs wall clock time
    #$ -l h_rt=10:00:00
    #
    echo Running on host `hostname`
    echo Time is `date`
    echo Directory is `pwd`
    set orig_dir=`pwd`
    echo This job runs on the following processors:
    cat $TMPDIR/machines
    echo This job has allocated $NSLOTS processors
    
    # copy input and support files to a temporary directory on compute node
    set temp_dir=/scratch/`whoami`.$$
    mkdir $temp_dir
    cp input_file support_file $temp_dir
    cd $temp_dir
    
    /opt/mpich/intel/bin/mpirun -v -machinefile $TMPDIR/machines \
               -np $NSLOTS $HOME/a.out ./input_file >& output_file
    
    # copy files back and clean up
    cp * $orig_dir
    rm -rf $temp_dir
    
    
  3. An example of SGE script for Amber users (parallel run, 4 CPUs, with input file generated on the fly):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N amber_test_job
    #$ -m e
    #$ -e sge.err
    #$ -o sge.out
    #$ -pe mpi 4
    # requesting 6hrs wall clock time
    #$ -l h_rt=6:00:00
    #
    setenv MPI_MAX_CLUSTER_SIZE 2
    
    # export all environment variables to SGE 
    #$ -V
    
    echo Running on host `hostname`
    echo Time is `date`
    echo Directory is `pwd`
    echo This job runs on the following processors:
    cat $TMPDIR/machines
    echo This job has allocated $NSLOTS processors
    
    set in=./mdin
    set out=./mdout
    set crd=./inpcrd.equil
    
    cat <<eof > $in
     short md, nve ensemble
     &cntrl
       ntx=7, irest=1,
       ntc=2, ntf=2, tol=0.0000001,
       nstlim=1000,
       ntpr=10, ntwr=10000,
       dt=0.001, vlimit=10.0,
       cut=9.,
       ntt=0, temp0=300.,
     &end
     &ewald
      a=62.23, b=62.23, c=62.23,
      nfft1=64,nfft2=64,nfft3=64,
      skinnb=2.,
     &end
    eof
    
    set sander=/soft/linux/pkg/amber8/exe.parallel/sander
    set mpirun=/opt/mpich/intel/bin/mpirun
    
    # needs prmtop and inpcrd.equil files
    
    $mpirun -v -machinefile $TMPDIR/machines -np $NSLOTS \
       $sander -O -i $in -c $crd -o $out < /dev/null
    
    /bin/rm -f $in restrt
    
    

    Please note that if you are running parallel amber8 you must include the following in your .cshrc :
    # Set P4_GLOBMEMSIZE environment variable used to reserve memory in bytes
    # for communication with shared memory on dual nodes
    # (optimum/minimum size may need experimentation)
    setenv P4_GLOBMEMSIZE 32000000
    
  4. An example of an SGE script for an APBS job (parallel run, 8 CPUs, running an example input file which is included in the APBS distribution: /soft/linux/src/apbs-0.3.1/examples/actin-dimer):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N apbs-PARALLEL
    #$ -e apbs-PARALLEL.errout
    #$ -o apbs-PARALLEL.errout
    #
    # requesting 8 processors
    #$ -pe mpi 8
    
    echo -n "Running on: "
    hostname
    
    setenv APBSBIN_PARALLEL /soft/linux/pkg/apbs/bin/apbs-icc-parallel
    setenv MPIRUN /opt/mpich/intel/bin/mpirun
    
    echo "Starting apbs-PARALLEL calculation ..."  
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np 8 \
        $APBSBIN_PARALLEL apbs-PARALLEL.in >& apbs-PARALLEL.out
    
    echo "Done."
    
    
  5. An example of SGE script for parallel CHARMM job (4 processors):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N charmm-test
    #$ -e charmm-test.errout
    #$ -o charmm-test.errout
    #
    # requesting 4 processors
    #$ -pe mpi 4
    # requesting 2hrs wall clock time
    #$ -l h_rt=2:00:00
    #
    
    echo -n "Running on: "
    hostname
    
    setenv CHARMM /soft/linux/pkg/c31a1/bin/charmm.parallel.092204
    setenv MPIRUN /soft/linux/pkg/mpich-1.2.6/intel/bin/mpirun
    
    echo "Starting CHARMM calculation (using $NSLOTS processors)"
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np $NSLOTS \
        $CHARMM < mbcodyn.inp > mbcodyn.out
    
    echo "Done."
    
    
  6. An example of SGE script for parallel NAMD job (8 processors):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N namd-job
    #$ -e namd-job.errout
    #$ -o namd-job.out
    #
    # requesting 8 processors
    #$ -pe mpi 8
    # requesting 12hrs wall clock time
    #$ -l h_rt=12:00:00
    #
    
    echo -n "Running on: "
    hostname
    
    /soft/linux/pkg/NAMD/namd2.sh namd_input_file > namd2.log
    
    echo "Done."
    
    
  7. An example of SGE script for parallel Gromacs job (4 processors):
    #!/bin/csh -f
    #$ -cwd
    #
    #$ -N gromacs-job
    #$ -e gromacs-job.errout
    #$ -o gromacs-job.out
    #
    # requesting 4 processors
    #$ -pe mpich 4
    # requesting 8hrs wall clock time
    #$ -l h_rt=8:00:00
    #
    
    echo -n "Running on: "
    cat $TMPDIR/machines
    
    setenv MDRUN /soft/linux/pkg/gromacs/bin/mdrun-mpi
    setenv MPIRUN /soft/linux/pkg/mpich/intel/bin/mpirun
    
    $MPIRUN -v -machinefile $TMPDIR/machines -np $NSLOTS \
     $MDRUN -v -nice 0 -np $NSLOTS -s topol.tpr -o traj.trr \
      -c confout.gro -e ener.edr -g md.log
    
    echo "Done."
    

[May 07, 2017] Monitoring and Controlling Jobs

biowiki.org

After submitting your job to Grid Engine you may track its status using the qstat command, the QMON GUI, or email.

Monitoring with qstat

The qstat command provides the status of all jobs and queues in the cluster. The most useful options are:

You can refer to the man pages for a complete description of all the options of the qstat command.
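
Some typical invocations (a sketch of commonly used options):

    qstat                  # your own pending and running jobs
    qstat -u '*'           # jobs of all users
    qstat -f               # full listing, including the state of every queue instance
    qstat -j <job_id>      # detailed information about one job, including error reasons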

Monitoring Jobs by Electronic Mail

Another way to monitor your jobs is to have Grid Engine notify you by email about the status of the job.

In your batch script or on the command line, use the -m option to request that an email be sent and the -M option to specify the email address where it should be sent. This will look like:

#$ -M myaddress@work
#$ -m beas

The -m option selects the events after which you want to receive email. In particular, you can choose to be notified at the beginning (b) or end (e) of the job, or when the job is aborted (a) or suspended (s) (see the sample script lines above).

And from the command line you can use the same options (for example):

qsub -M myaddress@work -m be job.sh

How do I control my jobs

Based on the status of the job displayed, you can control the job by the following actions:
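
The usual job-control commands look like this (a sketch; each takes a job id as shown by qstat):

    qdel <job_id>        # delete (kill) a job
    qhold <job_id>       # put a pending job on hold
    qrls <job_id>        # release a held job
    qmod -sj <job_id>    # suspend a running job
    qmod -usj <job_id>   # resume a suspended job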

Monitoring and controlling with QMON

You can also use the GUI QMON, which gives a convenient window dialog specifically designed for monitoring and controlling jobs; the buttons are self-explanatory.


For further information, see the SGE User's Guide ( PDF, HTML).


[May 07, 2017] Why Won't My Job Run Correctly? (aka How To Troubleshoot/Diagnose Problems)

May 07, 2017 | biowiki.org

Does your job show an "Eqw" or "qw" state when you run qstat, and just sit there refusing to run? Get more info on what's wrong with it using:

$ qstat -j <job number>

Does your job actually get dispatched and run (that is, qstat no longer shows it - because it was sent to an exec host, ran, and exited), but something else isn't working right? Get more info on what's wrong with it using:

$ qacct -j <job number> (especially see the lines "failed" and "exit_status")

If any of the above have an "access denied" message in them, it's probably a permissions problem. Your user account does not have the privileges to read from/write to where you told it to (this often happens with the -e and -o options to qsub). So, check to make sure you do. Try, for example, to SSH into the node on which the job is trying to run (or just any node) and make sure that you can actually read from/write to the desired directories from there. While you're at it, just run the job manually from that node and see if it runs - maybe there's some library it needs that the particular node is missing.

To avoid permissions problems, cd into the directory on the NFS where you want your job to run, and submit from there using qsub -cwd to make sure it runs in that same directory on all the nodes.
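
For example (a sketch; the directory path is hypothetical):

    cd /nfs/projects/myrun
    qsub -cwd -o run.out -e run.err myscript.sh   # stdout/stderr land in the same shared directory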

Not a permissions problem? Well, maybe the nodes or the queues are unreachable. Check with:

qstat -f

or, for even more detail:

qstat -F

If the "state" column in qstat -f has a big E , that host or queue is in an error state due to... well, something. Sometimes an error just occurs and marks the whole queue as "bad", which blocks all jobs from running in that queue, even though there is nothing otherwise wrong with it. Use qmod -c <queue list> to clear the error state for a queue.

Maybe that's not the problem, though. Maybe there is some network problem preventing the SGE master from communicating with the exec hosts, such as routing problems or a firewall misconfiguration. You can troubleshoot these things with qping , which will test whether the SGE processes on the master node and the exec nodes can communicate.

N.B.: remember, the execd process on the exec node is responsible for establishing a TCP/IP connection to the qmaster process on the master node , not the other way around. The execd processes basically "phone home". So you have to run qping from the exec nodes , not the master node!

Syntax example (I am running this on an exec node, and sheridan is the SGE master):

$ qping sheridan 536 qmaster 1

where 536 is the port that qmaster is listening on, and 1 simply means that I am trying to reach a daemon. Can't reach it? Make sure your firewall has a hole on that port, that the routing is correct, that you can ping using the good old ping command, that the qmaster process is actually up, and so on.

Of course, you could ping the exec nodes from the master node, too, e.g. I can see if I can reach exec node kosh like this:

$ qping kosh 537 execd 1

but why would you do such a crazy thing? execd is responsible for reaching qmaster , not the other way around.

If the above checks out, check the messages log in /var/log/sge_messages on the submit and/or master node (on our Babylon Cluster , they're both the node sheridan ):

$ tail /var/log/sge_messages

Personally, I like running:

$ tail -f /var/log/sge_messages

before I submit the job, and then submit a job in a different window. The -f option will update the tail of the file as it grows, so you can see the message log change "live" as your job executes and see what's happening as things take place.

(Note that the above is actually a symbolic link I put in to the messages log in the qmaster spool directory, i.e. /opt/sge/default/spool/qmaster/messages .)

One thing that commonly goes wrong is permissions. Make sure that the user that submitted the job using qsub actually has the permissions to write error, output, and other files to the paths you specified.

For even more precise troubleshooting... maybe the problem is unique to only some node(s) or queue(s)? To pin it down, try to run the job only on a specific node or queue:

$ qsub -l hostname=<node/host name> <other job params>

$ qsub -l qname=<queue name> <other job params>

Maybe you should also try to SSH into the problem nodes directly and run the job locally from there, as your own user, and see if you can get any more detail on why it fails.

If all else fails...

Sometimes, the SGE master host will become so FUBARed that we have to resort to brute, traumatizing force to fix it. The following solution is equivalent to fixing a wristwatch with a bulldozer, but it seems to do more good than harm (although I can't guarantee that it doesn't cause long-term harm in favor of a short-term solution).

Basically, you wipe the database that keeps track of SGE jobs on the master host, taking any problem "stuck" jobs with it. (At least that's what I think this does...)

I've found this useful when:

The solution:

ssh sheridan
su -
service sgemaster stop
cd /opt/sge/default/
mv spooldb spooldb.fubared
mkdir spooldb
cp spooldb.fubared/sge spooldb/
chown -R sgeadmin:sgeadmin spooldb
service sgemaster start

Wipe spooldb.fubared when you are confident that you won't need its contents again.

[Feb 08, 2017] SGE, Torque, PBS: What's the Best Choice for an NGS-Dedicated Cluster?

Feb 08, 2017 | www.biostars.org
abihouee wrote:

Sorry, it may be off topics...

We plan to install a scheduler on our cluster (DELL blade cluster over Infiniband storage on Linux CentOS 6.3). This cluster is dedicated to do NGS data analysis.

It seems to me that the most widely used is SGE, but since Oracle bought the stuff, there are several alternative developments (OpenGridEngine, SonGridEngine, Univa Grid Engine...).

Another possible scheduler is Torque/PBS.

I'm a little bit lost in this scheduler forest! Is there someone with any experience of this, or who knows of an existing benchmark?

Thanks a lot. Audrey


I worked with SGE for years at a genome center in Vancouver. It seemed to work quite well. Now I'm at a different genome center and we are using LSF but considering switching to SGE, which is ironic because we are trying to transition from Oracle DB to Postgres to get away from Oracle... SGE and LSF seemed to offer similar functionality and performance as far as I can tell. Both clusters have several thousand CPUs.

-- Malachi Griffith

openlava (source code) is an open-source fork of LSF that, while lacking some features, does work fairly well.

-- Malachi Griffith

Torque is fine, and very well tested; both of the SGE forks are widely used in this sort of environment, and they have qmake, which some people are very fond of. SLURM is another good possibility.

-- Jonathan Dursi

matted wrote:

I can only offer my personal experiences, with the caveat that we didn't do a ton of testing and so others may have differing opinions.

We use SGE, which installs relatively nicely on Ubuntu with the standard package manager (the gridengine-* packages). I'm not sure what the situation is on CentOS.

We previously used Torque/PBS, but the scheduler performance seemed poor and it bogged down with lots of jobs in the queue. When we switched to SGE, we didn't have any problems. This might be a configuration error on our part, though.

When I last tried out Condor (several years ago), installation was quite painful and I gave up. I believe it claims to work in a cross-platform environment, which might be interesting if for example you want to send jobs to Windows workstations.

LSF is another option, but I believe the licenses cost a lot.

My overall impression is that once you get a system running in your environment, they're mostly interchangeable (once you adapt your submission scripts a bit). The ease with which you can set them up does vary, however. If your situation calls for "advanced" usage (MPI integration, Kerberos authentication, strange network storage, job checkpointing, programmatic job submission with DRMAA, etc. etc.), you should check to see which packages seem to support your world the best.

-- matted

Recent versions of torque have improved a great deal for large numbers of jobs, but yes, that was a real problem.

I also agree that all are more or less fine once they're up and working, and the main way to decide which to use would be to either (a) just pick something future users are familiar with, or (b) pick some very specific things you want to be able to accomplish with the resource manager/scheduler and start finding out which best support those features/workflows.

-- Jonathan Dursi

Jeremy Leipzig wrote:

Unlike PBS, SGE has qrsh, a command that actually runs jobs in the foreground, allowing you to easily inform a script when a job is done. What will they think of next?

This is one area where I think the support you pay for going commercial might be worthwhile. At least you'll have someone to field your complaints.

-- Jeremy Leipzig

EDIT: Some versions of PBS also have qsub -W block=true, which works in a very similar way to SGE qrsh.

-- Sean Davis

you must have a newer version than me

>qsub -W block=true dothis.sh 
qsub: Undefined attribute  MSG=detected presence of an unknown attribute
>qsub --version
version: 2.4.11

-- Jeremy Leipzig

For Torque, and perhaps versions of PBS without -W block=true, you can use the following two switches. The behaviour is similar, but when called, any embedded options to qsub will be ignored. Also, stderr/stdout is sent to the shell.

qsub -I -x dothis.sh
-- matt.demaere

My answer should be updated to say that any DRMAA-compatible cluster engine is fine, though running jobs through DRMAA (e.g. Snakemake --drmaa ) instead of with a batch scheduler may anger your sysadmin, especially if they are not familiar with scientific computing standards.

Using qsub -I just to get an exit code is not OK.

-- Jeremy Leipzig

Torque definitely allows interactive jobs -

qsub -I

As for Condor, I've never seen it used within a cluster; it was designed back in the day for farming out jobs between diverse resources (e.g., workstations after hours) and would have a lot of overhead for working within a homogeneous cluster. Scheduling jobs between clusters, maybe?

-- Jonathan Dursi

Ashutosh Pandey wrote:

We use Rocks Cluster Distribution that comes with SGE.

http://en.wikipedia.org/wiki/Rocks_Cluster_Distribution

-- Ashutosh Pandey

+1 Rocks - If you're setting up a dedicated cluster, it will save you a lot of time and pain.

-- mike.thon

I'm not a huge Rocks fan personally, but one huge advantage, especially (but not only) if you have researchers who use XSEDE compute resources in the US, is that you can use the XSEDE campus bridging Rocks rolls, which bundle up a large number of relevant software packages as well as the cluster management stuff. That also means that you can directly use XSEDE's extensive training materials to help get the cluster's new users up to speed.

-- Jonathan Dursi

samsara wrote:

It has been more than a year that I have been using SGE for processing NGS data. I have not experienced any problems with it. I am happy with it. I have not used any other scheduler, except Slurm a few times.

-- samsara

richard.deborja wrote:

Used SGE at my old institute; currently using PBS, and I really wish we had SGE on the new cluster. Things I miss the most: qmake and the "-sync y" qsub option. These two were complete pipeline savers. I also appreciated the integration of MPI with SGE. Not sure how well it works with PBS, as we currently don't have it installed.

-- richard.deborja

joe.cornish826 wrote:

NIH's Biowulf system uses PBS, but most of my gripes about PBS are more about the typical user load. PBS always looks for the next smallest job, so your 30 node run that will take an hour can get stuck behind hundreds (and thousands) of single node jobs that take a few hours each. Other than that it seems to work well enough.

In my undergrad our cluster (UMBC Tara) uses SLURM, didn't have as many problems there but usage there was different, more nodes per user (82 nodes with ~100 users) and more MPI/etc based jobs. However, a grad student in my old lab did manage to crash the head nodes because we were rushing to rerun a ton of jobs two days before a conference. I think it was likely a result of the head node hardware and not SLURM. Made for a few good laughs.

-- joe.cornish826

"PBS always looks for the next smallest job" -- just so people know, that's not something inherent to PBS. That's a configurable choice the scheduler (probably maui in this case) makes, but you can easily configure the scheduler so that bigger jobs so that they don't get starved out by little jobs that get "backfilled" into temporarily open slots.

-- Jonathan Dursi

Part of it is because Biowulf looks for the next smallest job but also prioritizes by how much CPU time a user has been consuming. If I've run 5 jobs with 30x 24 core nodes each taking 2 hours of wall time, I've used roughly 3600 CPU hours. If someone is using a single core on each node (simply because of memory requirements), they're basically at a 1:1 ratio between wall and CPU time. It will take a while for their CPU hours to catch up to mine.

It is a pain, but unlike math/physics/etc there are fewer programs in bioinformatics that make use of message passing (and when they do, they don't always need low-latency ICs), so it makes more sense to have PBS work for the generic case. This behavior is mostly seen on the ethernet IC nodes, there's a much smaller (245 nodes) system set up with infiniband for jobs that really need it (e.g. MrBayes, structural stuff).

Still I wish they'd try and strike a better balance. I'm guilty of it but it stinks when the queue gets clogged with memory intensive python/perl/R scripts that probably wouldn't need so much memory if they were written in C/C++/etc.

[Mar 02, 2016] Son of Grid Engine version 8.1.9 is available

Mar 02, 2016 | liv.ac.uk

README

This is Son of Grid Engine version v8.1.9.

See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on recent changes. See <https://arc.liv.ac.uk/trac/SGE> for more information.

The .deb and .rpm packages and the source tarball are signed with PGP key B5AEEEA9.

* sge-8.1.9.tar.gz, sge-8.1.9.tar.gz.sig:  Source tarball and PGP signature

* RPMs for Red Hat-ish systems, installing into /opt/sge with GUI
  installer and Hadoop support:

  * gridengine-8.1.9-1.el5.src.rpm:  Source RPM for RHEL, Fedora

  * gridengine-*8.1.9-1.el6.x86_64.rpm:  RPMs for RHEL 6 (and
    CentOS, SL)

  See < https://copr.fedorainfracloud.org/coprs/loveshack/SGE/ > for
  hwloc 1.6 RPMs if you need them for building/installing RHEL5 RPMs.

* Debian packages, installing into /opt/sge, not providing the GUI
  installer or Hadoop support:

  * sge_8.1.9.dsc, sge_8.1.9.tar.gz:  Source packaging.  See
    <http://wiki.debian.org/BuildingAPackage> , and see
    < http://arc.liv.ac.uk/downloads/SGE/support/  > if you need (a more
    recent) hwloc.

  * sge-common_8.1.9_all.deb, sge-doc_8.1.9_all.deb,
    sge_8.1.9_amd64.deb, sge-dbg_8.1.9_amd64.deb: Binary packages
    built on Debian Jessie.

* debian-8.1.9.tar.gz:  Alternative Debian packaging, for installing
  into /usr.

* arco-8.1.6.tar.gz:  ARCo source (unchanged from previous version)

* dbwriter-8.1.6.tar.gz:  compiled dbwriter component of ARCo
  (unchanged from previous version)

More RPMs (unsigned, unfortunately) are available at < http://copr.fedoraproject.org/coprs/loveshack/SGE/ >.

[Sep 20, 2014] README for Son of Grid Engine version v8.1.7

arc.liv.ac.uk

This is Son of Grid Engine version v8.1.7.

See <http://arc.liv.ac.uk/repos/darcs/sge-release/NEWS> for information on
recent changes. See <https://arc.liv.ac.uk/trac/SGE> for more
information.

The .deb and .rpm packages and the source tarball are signed with PGP
key B5AEEEA9. For some reason the el5 signatures won't verify on
RHEL5, but they can be verified by transferring the rpms to an RHEL6
system.

More (S)RPMS may be available at http://jur-linux.org/rpms/el-updates/,
thanks to Florian La Roche.

[Sep 20, 2014] Son of Grid Engine

Contents

  1. News
  2. Repositories/Source
  3. Building
  4. Bug reporting, patches, and mail lists
  5. History
  6. Copyright and Naming
  7. Related
  8. Other Resources
  9. Contact

The Son of Grid Engine is a community project to continue Sun's old gridengine free software project that used to live at http://gridengine.sunsource.net after Oracle shut down the site and stopped contributing code. (Univa now own the copyright — see below.) It will maintain copies of as much as possible/useful from the old site.

The idea is to encourage sharing, in the spirit of the original project, informed by long experience of free software projects and scientific computing support. Please contribute, and share code or ideas for improvement, especially any ideas for encouraging contribution.

This effort precedes Univa taking over gridengine maintenance and subsequently apparently making it entirely proprietary, rather than the originally-promised ‘open core’. What's here was originally based on Univa's free code and was intended to be fed into that.

See also the gridengine.org site, in particular the mail lists hosted there. The gridengine.org users list is probably the best one to use for general gridengine discussions and questions which aren't specific to this project.

Currently most information you find for the gridengine v6.2u5 release will apply to this effort, but the non-free documentation that used to be available from Oracle has been expurgated and no-one has the time/interest to replace it. See also Other Resources, particularly extra information locally, and the download area.

This wiki isn't currently generally editable, but will be when spam protection is in place; yes it needs reorganizing and expanding. If you're a known past contributor to gridengine and would like to help, please get in touch for access or to make any other contributions.

[Dec 30, 2013] gridengine-6.2u5-10.el6.4.x86_64.rpm CentOS 6 / RHEL 6 ...

Download gridengine-6.2u5-10.el6.4.x86_64.rpm for CentOS 6 / RHEL 6 from the EPEL repository. Changelog excerpt: 2012-03-15 - Orion Poplawski <orion@cora.nwra.com> 6.2u5-10.2 - Use sge_/SGE_ in man pages.

pkgs.org/centos-6-rhel-6/epel-x86_64/gridengine-...

Univa Grid Engine Truth

What is Grid Engine?

Grid Engine is a job scheduler that has been around for years and it's FREE!! If you are already using it under an open source license you certainly don't need to buy it. Grid Engine started out as a Sun Microsystems product known as Sun Grid Engine (SGE). After Oracle purchased Sun it became Oracle Grid Engine.

Why is another company trying to sell Grid Engine if it is Free?

A small company called Univa has essentially taken away some of the Grid Engine development staff from Oracle and is selling support bundled with what they feel is upgraded source code that is no longer open source. This company wants to sell you Grid Engine support instead of you going to Oracle and buying it for essentially the same price. You can even get free Grid Engine support here with the open source community, and here with the Oracle community.

And you can get the Oracle version here for free, which is being developed just like the small company's version but WITH the blessing of Oracle, who actually bought this product from Sun.

If you are looking at buying the Univa version of Grid Engine you might ask yourself what you are buying. Is there a free product that is the same? Yes, from Oracle and SourceForge. Is there another, more reputable version of the same product? Yes, from Oracle. Are there other schedulers out there that are more robust that you can buy? Yes: Platform Computing has an excellent product called LSF that can often be purchased for much less than Univa Grid Engine. PBSWorks offers a very good scheduler, as does RTDA. There is even a new company, called Scalable Logic, that is developing the free Grid Engine source code as well as the core and is actively supporting the free community with support and upgrades. They have even now come out with an upgraded free version of Grid Engine, as Univa has attempted to, but this version from Scalable Logic is free and totally open source. It has support for many operating systems, including even Windows.

Are there risks in going with this version of Grid Engine from Univa?

It's possible that Univa may tell you that you could be risking violation of software licensing agreements with Oracle or other parties by using certain versions of Grid Engine for free. They may try to use fear, uncertainty, and doubt (FUD) to scare you into buying from them, claiming that it will protect you from Oracle. It may, but before you buy you may want to check that with Oracle and the open source community and find out for yourself, because that may not be the real risk you face. What you face with this small company is potentially more operational than legal.

If you think about it, they are essentially trying to make money off of a free open source product. This is not the most lucrative idea in the software world and makes the prospect of making money as a company doing this very difficult if not impossible. You might ask yourself if you think they are going to make it. They have carved out a software product and a team from one of the largest software companies in the world, trying to make money on a free product that Oracle bought with the Sun acquisition. If they do not make it and fail as a company, where will you be with your paid software subscription and product through them? If they do make it and then happen to gain the attention of Oracle and its Lawyers, where will you be if Oracle decides to take legal action against them, or just decides to shut them down? Do you really think that a small company with possibly faulty management and financials would have the resources to remain, let alone still be concerned with your support contract? Would your company be protected or could that liability extend to you as well? These might all be questions you would want to pose to Oracle or at least another party besides Univa if you decided on purchasing Grid Engine.

Either way, Univa and its paid version of Grid Engine could be in a tough spot. No matter which way they go, they have a good chance of ending up insolvent or worse. If this happens, where would your support contract with them be? Or worse still, what position would you be in with Oracle at that point? Again, a very good question to ask Oracle. With all these risks it might be better to look again at the free version, which even Oracle is offering, as they themselves are showing commitment to Grid Engine and the enhancement of the free version.

[Oct 18, 2013] Oracle Grid Engine EOL

Effective October 22, 2013, Univa, a leader in Grid Engine technology, will assume product support for Oracle Grid Engine customers for the remaining term of their existing Oracle Grid Engine support contracts.

For continued access to Grid Engine product support from Univa, customers with an active support contract should visit support.univa.com, or contact Univa Support at support@univa.com or 800.370.5320.

For more details on this announcement or future sales and support inquiries for Grid Engine, please visit www.univa.com/oracle or contact oracle_customer@univa.com.

[Mar 03, 2013] Son of Grid Engine 8.1.3

Son of Grid Engine is a highly-scalable and versatile distributed resource manager for scheduling batch or interactive jobs on clusters or desktop farms. It is a community project to continue Sun's Grid Engine.

It is competitive against proprietary systems and provides better scheduling features and scalability than other free DRMs like Torque, SLURM, Condor, and Lava.

[Jun 12, 2012] Son of Grid Engine 8.1.0 available

SGE 8.1.0 is available from

http://arc.liv.ac.uk/downloads/SGE/releases/8.1.0

It corrects a few problems with the previous version, takes an overdue opportunity to adopt a more logical numbering now that tracking the Univa repo is irrelevant, and improves the RPM packaging.

The RPMs now include the GUI installer and the "herd" Hadoop integration built against a more recent Cloudera distribution. (The GUI installer was previously separate as the original IzPack packaging was non-distributable.)

Generally this distribution has hundreds of improvements not (freely?) available in others, including security fixes, maintained documentation, and easy building at least on recent GNU/Linux.

Univa Announces Grid Engine 8.1 to Further Evolve the Popular Software

Yahoo!

Univa, the Data Center Automation Company, announced today the release of Univa Grid Engine Version 8.1, the most widely deployed, distributed resource management software platform used by enterprises and research organizations across the globe. Univa Grid Engine is the industry-leading choice for workload management and integrating Big Data solutions while saving time and money through increased uptime and reduced total cost of ownership. Corporations in the industries of Oil and Energy, Life Sciences and Biology, and Semiconductors rely on Univa Grid Engine when they need mission-critical computing capacity to model and solve complex problems.

Key features include:

Jeppesen has implemented Univa Grid Engine to support their Crew & Fleet management products for distributing optimization jobs, RAVE compilations and Studio sessions. “Jeppesen has selected Univa Grid Engine as this was the most appealing alternative looking at both cost and Univa’s ability to make future enhancements to the product,” said Pete Catherall, Business Operations Manager, Jeppesen. “This is another example of that.”

[May 07, 2012] The Memories of a Product Manager: The True Story of the Grid Engine Dream

April 25, 2012

After Wolfgang left Sun - many fine people at Sun had to leave at that time - it was frustrating to see how our efforts to have two Sun Grid Engine products (one available by subscription and one available as free open source) failed because of a management veto. On one hand we were under pressure to be profitable as a unit; on the other hand, our customers appeared to have no reason to pay even one cent for a subscription or license.

Oracle still has IP control of Grid Engine. Both Univa and Oracle decided to make no more contributions to the open source version. While Oracle's open source policies are clear, Univa, a champion of open source for many years, has surprised the community. This has created an agitated thread on the Grid Engine discussion group.

[May 07, 2012] Sun-Oracle Grid Engine 6.2 installation on Windows Nirmal's Haven

Sun Grid Engine 6.2 Update 2 introduced support for Windows operating systems to run as worker nodes. Sun Grid Engine, or Oracle Grid Engine as it’s being relabeled now, is a distributed resource manager primarily used in HPC environments, but it sees more widespread use now with all the new features introduced as part of Update 5.

Here I’m going to detail a quick how-to for getting Grid Engine installed and running on Windows hosts. This is most applicable to Windows XP and Windows Server 2003; some of the additional prerequisites required on the Windows hosts are now standard in Windows Server 2008 and Windows 7.

[Dec 05, 2011] Son of Grid Engine 8.0.0d

Son of Grid Engine is a highly-scalable and versatile distributed resource manager for scheduling batch or interactive jobs on server farms or desktop farms. It is a community project to continue Sun's Grid Engine.

[Jun 25, 2011] server farm Tricks Grid Engine License Juggling -

Bio-IT World

...NIBR had already chosen Sun Grid Engine Enterprise Edition (SGEEE) to run on the server farm. The BioTeam was asked to deploy SGEEE and integrate several FLEXlm-licensed scientific applications. Acceptance tests for determining success were rigorous. The server farm had to withstand test cases developed by the researchers while automatically detecting and correcting license-related job errors without human intervention.

The core problem turned out to be the most straightforward to solve. To prevent the NIBR server farm from running jobs when no licenses were available, the Grid Engine scheduler needed to become license aware. This was accomplished via a combination of "load sensor" scripts and specially configured Grid Engine "resources."

· Load sensor scripts give Grid Engine operators the ability to collect additional system measurements to help make scheduling or resource allocation decisions.

· Resources are a Grid Engine concept used primarily by users who require a particular need to be met in order for a job to complete successfully. A user-requested resource could be dynamic ("run job only on a system with at least 2 GB of free memory") or static ("run job on the machine with laser printer attached").

The NIBR plan involved creating custom resource attributes within Grid Engine so that scientists could submit jobs with the requirement "only run this job if a license is available." If licenses were available, the jobs would be dispatched immediately; if not, the jobs would be held until licenses were available.

To this point, the project was easy. Much more difficult — and more interesting — were efforts to meet NIBR acceptance tests.

The first minor headache resulted from trying to accurately automate querying of the FLEXlm license servers. One FLEXlm license server was an older version that only revealed the number of currently in-use licenses. This meant that the total number of available licenses (equally important) needed to be hard-coded into Grid Engine. NIBR researchers felt strongly that this could create server farm management problems, so the license server was upgraded to a version that allowed the load sensor script to learn how many licenses were available.

The next problem was figuring out how to automatically detect jobs that still managed to fail with license-related errors. The root cause of these failures is the loose integration between the FLEXlm license servers and Grid Engine. Race conditions may occur when Grid Engine launches server farm jobs that do not immediately check out their licenses from the FLEXlm server. Delays can cause Grid Engine's internal license values to get slightly out of sync with the real values held by the license server.

Nasty race conditions between license servers and server farm resource management systems such as Grid Engine are mostly unavoidable at present. The solution everyone is hoping for is FLEXlm support of an API (application programming interface) for advance license reservation and checkout. Applications such as Grid Engine could then directly hook into the FLEXlm system rather than rely on external polling methods. Until this occurs, we are left with half-measures and workarounds.

[Jun 25, 2011] Grid Engine for Users BioTeam Blog

Mar 10, 2011

Back in the day …

Way back in 2009 I placed an aging copy of my Grid Engine Administration training materials online. Response has been fantastic and it’s still one of the more popular links on this blog.

Today

Well it’s probably past time I did something similar aimed at people actually interested in using Grid Engine rather than just administering it.

It’s not comprehensive or all that interesting but I am going to post a bunch of slides cherry picked from the collection of things I build custom training presentations from. Think of them as covering an intro-level view of Grid Engine use and usage.

Intro to Grid Engine Usage & Simple Workflows

There are two slide decks, both of which are mostly stripped of information that is unique to a particular client, customer or Grid Engine server farm.

The first presentation is a short and dry introduction aimed at a user audience – it explains what Grid Engine is, what it does and what the expectations of the users are. It then dives into commands and examples.

The second presentation is also aimed at a basic user audience but talks a bit more about workflows, pipelines and simple SGE features that make life a bit easier for people who need to do more than a few simple ‘qsub’ actions.

[Jun 23, 2011] ds-gridengine-167114

By abstracting end users from the specific machines processing the workload, machine failures can be taken in stride. When a machine fails, the workload it was processing can be requeued and rescheduled. While the machine remains down, new workload is scheduled around that machine, preventing end users from ever noticing the machine failure. In addition to the Oracle Grid Engine product's rich scheduling and workload management capabilities, it also has the ability to share resources among fixed services, such as between two Oracle Grid Engine server farms, resulting in even higher overall data center utilization. Included in this capability is the ability to reach out to a private or public cloud service provider to lease additional resources when needed. During peak workload periods, additional virtual machines can be leased from a cloud service provider to augment the on-site resources. When the workload subsides the leased cloud resources are released back to the cloud, minimizing the costs. Such cloud bursting capabilities allow an enterprise to handle regular and unpredictable peak workloads without resorting to purchasing excess additional

Son of Grid Engine

The Son of Grid Engine is a community project to continue Sun's old grid engine free software project that used to live at http://gridengine.sunsource.net, now that Oracle have shut down the site and are not contributing code. It will maintain copies of as much as possible/useful from the old site. Currently we do not have the old mail archives online, though articles from the old gridengine-users list from the last five years or so will be available soon, and Oracle have promised to donate artifacts from the old site, so we should be able to get complete copies of everything.

The idea is to encourage sharing, in the spirit of the original project, and informed by long experience of free software projects and scientific computing support. Please contribute, and share code or ideas for improvement, especially any ideas for encouraging contribution.

Currently any information you find for the grid engine v6.2u5 release will apply to this effort, including the v6.2u5 wiki documentation and pointers therefrom, such as the gridengine.info site and its wiki. There may eventually also be useful info at the Oracle Grid Engine Forum. You should note its terms of use before posting there; they include even relinquishing moral rights.

This wiki isn't currently generally editable, but will be when spam protection is in place. If you're a known past contributor to grid engine and would like to help, please get in touch for access.

Oracle Grid Engine Creators Move to Univa by Chris Preimesberger

2011-01-19 | eWeek.com

As a result, Univa will offer engineering support for current Oracle Grid Engine deployments and will release a new Univa version of the DRM by March.

Univa revealed Jan. 19 that the principal engineers from the Sun/Oracle Grid Engine team, including Grid Engine founder and original project owner Fritz Ferstl, have left Oracle and are joining the company.

As a result, Univa will now offer engineering support for current Oracle Grid Engine deployments and will release a new Univa version of Grid Engine before the end of the first quarter of 2011.

Oracle Grid Engine software is a distributed resource management (DRM) system that manages the distribution of users' workloads to the best available compute resources within the system. While compute resources in a typical data center have utilization rates that average only 10 percent to 25 percent, the Oracle Grid Engine can help a company increase utilization to 80, 90 or even 95 percent, Oracle said.

This significant improvement comes from the intelligent distribution of workload to the most appropriate available resources.

When users submit their work to Oracle Grid Engine as jobs, the software monitors the current state of all resources in the server farm and is able to assign these jobs to the best-suited resources. Oracle Grid Engine gives administrators both the flexibility to accurately model their computing environments as resources and to translate business rules into policies that govern the use of those resources, Oracle said.

"Combining the Grid Engine and Univa technology offerings was a once-in-a-lifetime opportunity that the new Univa EMEA team and I just couldn't miss," Ferstl said. "Now we'll be able to interact with and serve users worldwide investigating and understanding their data center optimization needs."

Lisle, Ill.-based Univa will concentrate on improving the Grid Engine for technical computing and HPC use cases in addition to promoting the continuity of the Grid Engine open-source community, Univa said.

Oracle Grid Engine Changes for a Bright Future at Oracle

Dec 23, 2010 | DanT's Grid Blog

For the past decade, Oracle Grid Engine has been helping thousands of customers marshal the enterprise technical computing processes at the heart of bringing their products to market. Many customers have achieved outstanding results with it via higher data center utilization and improved performance. The latest release of the product provides best-in-class capabilities for resource management including: Hadoop integration, topology-aware scheduling, and on-demand connectivity to the cloud.

Oracle Grid Engine has a rich history, from helping BMW Oracle Racing prepare for the America’s Cup to helping isolate and identify the genes associated with obesity; from analyzing and predicting the world's financial markets to producing the digital effects for the popular Harry Potter series of films. Since 2001, the Grid Engine open source project has made Oracle Grid Engine functionality available for free to open source users. The Grid Engine open source community has grown from a handful of users in 2001 into the strong, self-sustaining community that it is now.

Today, we are entering a new chapter in Oracle Grid Engine’s life. Oracle has been working with key members of the open source community to pass on the torch for maintaining the open source code base to the Open Grid Scheduler project hosted on SourceForge. This transition will allow the Oracle Grid Engine engineering team to focus their efforts more directly on enhancing the product. In a matter of days, we will take definitive steps in order to roll out this transition. To ensure on-going communication with the open source community, we will provide the following services:

Oracle is committed to enhancing Oracle Grid Engine as a commercial product and has an exciting road map planned. In addition to developing new features and functionality to continue to improve the customer experience, we also plan to release game-changing integrations with several other Oracle products, including Oracle Enterprise Manager and Oracle Coherence. Also, as Oracle's cloud strategy unfolds, we expect that the Oracle Grid Engine product's role in the overall strategy will continue to grow. To discuss our general plans for the product, we would like to invite you to join us for a live webcast on Oracle Grid Engine’s new road map. Click here to register.


SGE 6.2u3 beta release

[Nov 30, 2009] Sun Grid Engine for Dummies

Nov 30, 2009 | DanT's Grid Blog
Servers tend to be used for one of two purposes: running services or processing workloads. Services tend to be long-running and don't tend to move around much. Workloads, however, such as running calculations, are usually done in a more "on demand" fashion. When a user needs something, he tells the server, and the server does it. When it's done, it's done. For the most part it doesn't matter on which particular machine the calculations are run. All that matters is that the user can get the results. This kind of work is often called batch, offline, or non-interactive work. Sometimes batch work is called a job. Typical jobs include processing of accounting files, rendering images or movies, running simulations, processing input data, modeling chemical or mechanical interactions, and data mining. Many organizations have hundreds, thousands, or even tens of thousands of machines devoted to running jobs.

Now, the interesting thing about jobs is that (for the most part) if you can run one job on one machine, you can run 10 jobs on 10 machines or 100 jobs on 100 machines. In fact, with today's multi-core chips, it's often the case that you can run 4, 8, or even 16 jobs on a single machine. Obviously, the more jobs you can run in parallel, the faster you can get your work done. If one job takes 10 minutes on one machine, 100 jobs still only take 10 minutes when run on 100 machines. That's much better than 1000 minutes to run those 100 jobs on a single machine. But there's a problem. It's easy for one person to run one job on one machine. It's still pretty easy to run 10 jobs on 10 machines. Running 1600 jobs on 100 machines is a tremendous amount of work. Now imagine that you have 1000 machines and 100 users all trying to run 1600 jobs each. Chaos and unhappiness would ensue.

To solve the problem of organizing a large number of jobs on a set of machines, distributed resource managers (DRMs) were created. (A DRM is also sometimes called a workload manager. I will stick with the term DRM.) The role of a DRM is to take a list of jobs to be executed and distribute them across the available machines. The DRM makes life easier for the users because they don't have to track all their jobs themselves, and it makes life easier for the administrators because they don't have to manage users' use of the machines directly. It's also better for the organization in general because a DRM will usually do a much better job of keeping the machines busy than users would on their own, resulting in much higher utilization of the machines. Higher utilization effectively means more compute power from the same set of machines, which makes everyone happy.
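To make that concrete with Sun Grid Engine itself, the hundred-jobs scenario above collapses into a single submission; a minimal sketch, where run_calc.sh stands in for your job script:

$ qsub -t 1-100 -cwd run_calc.sh    # one array job with 100 tasks
$ qstat                             # watch SGE spread the tasks over free slots

Each task of the array job finds its own index in $SGE_TASK_ID, so one script can process 100 different inputs.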

Here's a bit more terminology, just to make sure we're all on the same page. A cluster is a group of machines cooperating to do some work. A DRM and the machines it manages compose a cluster. A cluster is also often called a grid. There has historically been some debate about what exactly a grid is, but for most purposes grid can be used interchangeably with cluster. Cloud computing is a hot topic that builds on concepts from grid/cluster computing. One of the defining characteristics of a cloud is the ability to "pay as you go." Sun Grid Engine offers an accounting module that can track and report on fine-grained usage of the system. Beyond that, Sun Grid Engine now offers deep integration to other technologies commonly being used in the cloud, such as Apache Hadoop.
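As a quick illustration of the accounting module just mentioned, qacct can summarize consumption from the accounting file (a sketch; exact columns vary by version, and "alice" is a hypothetical user):

$ qacct -o              # per-owner totals: wallclock, CPU, memory
$ qacct -o alice -d 7   # one user's usage over the last 7 days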

One of the best ways to show Sun Grid Engine's flexibility is to take a look at some unusual use cases. These are by no means exhaustive, but they should serve to give you an idea of what can be done with the Sun Grid Engine software.


Chi Hung Chan: SGE Grid Job Dependency

It is possible to describe SGE (Sun Grid Engine) job dependency (or that of any other grid engine) as a DAG (Directed Acyclic Graph). By taking advantage of the open-source Graphviz, it is very easy to document this dependency in the DOT language format. Below is a sample DOT file:
$ cat job-dep.dot
digraph jobs101 {
        job_1 -> job_11;
        job_1 -> job_12;
        job_1 -> job_13;
        job_11 -> job_111;
        job_12 -> job_111;
        job_2 -> job_13;
        job_2 -> job_21;
        job_3 -> job_21;
        job_3 -> job_31;
}

With this DOT file, one can generate the graphical representation:

$ dot -Tpng -o job-dep.png job-dep.dot

It is also possible to derive the corresponding SGE commands with the following Tcl script.

$ cat ./dot2sge.tcl
#! /usr/local/bin/tclsh


if { $argc != 1 } {
        puts stderr "Usage: $argv0 "
        exit 1
}
set dotfile [lindex $argv 0]
if { [file exists $dotfile] == 0 } {
        puts stderr "Error. $dotfile does not exist"
        exit 2
}


# assume simple directed graph a -> b

set fp [open $dotfile r]
set data [read $fp]
close $fp


set sge_jobs {}
foreach i [split [lindex $data 2] {;}] {
        if { [regexp {(\w+)\s*->\s*(\w+)} $i x parent child] != 0 } {
                lappend sge_jobs $parent
                lappend sge_jobs $child

                lappend sge_job_rel($parent) $child
        }
}


# submit unique jobs, and hold
set queue all.q
set sge_unique_jobs [lsort -unique $sge_jobs]
foreach i $sge_unique_jobs {
        puts "qsub -h -q $queue -N $i job-submit.sh"
}


# alter the job dependency, but unhold after all the hold relationships are
# established
foreach i $sge_unique_jobs {
        if { [info exists sge_job_rel($i)] } {
                # with dependency
                puts "qalter -hold_jid [join $sge_job_rel($i) {,}] $i"
        }
}
foreach i $sge_unique_jobs {
        puts "qalter -h U $i"
}

Run this Tcl script to generate the SGE submission commands and the qalter commands that register the job dependencies:

$ ./dot2sge.tcl job-dep.dot
qsub -h -q all.q -N job_1 job-submit.sh
qsub -h -q all.q -N job_11 job-submit.sh
qsub -h -q all.q -N job_111 job-submit.sh
qsub -h -q all.q -N job_12 job-submit.sh
qsub -h -q all.q -N job_13 job-submit.sh
qsub -h -q all.q -N job_2 job-submit.sh
qsub -h -q all.q -N job_21 job-submit.sh
qsub -h -q all.q -N job_3 job-submit.sh
qsub -h -q all.q -N job_31 job-submit.sh
qalter -hold_jid job_11,job_12,job_13 job_1
qalter -hold_jid job_111 job_11
qalter -hold_jid job_111 job_12
qalter -hold_jid job_13,job_21 job_2
qalter -hold_jid job_21,job_31 job_3
qalter -h U job_1
qalter -h U job_11
qalter -h U job_111
qalter -h U job_12
qalter -h U job_13
qalter -h U job_2
qalter -h U job_21
qalter -h U job_3
qalter -h U job_31

The transcript below shows the above proof-of-concept in action. So sit back....

#
# ----------below is a very simple script
#
$ cat job-submit.sh
#! /bin/sh
#$ -S /bin/sh

date
sleep 10


#
# ----------run all the qsub commands to submit the jobs, but put them on hold
#
$ qsub -h -q all.q -N job_1 job-submit.sh
Your job 333 ("job_1") has been submitted.
$ qsub -h -q all.q -N job_11 job-submit.sh
Your job 334 ("job_11") has been submitted.
$ qsub -h -q all.q -N job_111 job-submit.sh
Your job 335 ("job_111") has been submitted.
$ qsub -h -q all.q -N job_12 job-submit.sh
Your job 336 ("job_12") has been submitted.
$ qsub -h -q all.q -N job_13 job-submit.sh
Your job 337 ("job_13") has been submitted.
$ qsub -h -q all.q -N job_2 job-submit.sh
Your job 338 ("job_2") has been submitted.
$ qsub -h -q all.q -N job_21 job-submit.sh
Your job 339 ("job_21") has been submitted.
$ qsub -h -q all.q -N job_3 job-submit.sh
Your job 340 ("job_3") has been submitted.
$ qsub -h -q all.q -N job_31 job-submit.sh
Your job 341 ("job_31") has been submitted.


#
# ----------show the status, all jobs are on hold (hqw)
#
$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    335 0.00000 job_111    chihung      hqw   07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    337 0.00000 job_13     chihung      hqw   07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    339 0.00000 job_21     chihung      hqw   07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1
    341 0.00000 job_31     chihung      hqw   07/19/2007 21:04:34     1


#
# ----------register the job dependency
#
$ qalter -hold_jid job_11,job_12,job_13 job_1
modified job id hold list of job 333
   blocking jobs: 334,336,337
   exited jobs:   NONE
$ qalter -hold_jid job_111 job_11
modified job id hold list of job 334
   blocking jobs: 335
   exited jobs:   NONE
$ qalter -hold_jid job_111 job_12
modified job id hold list of job 336
   blocking jobs: 335
   exited jobs:   NONE
$ qalter -hold_jid job_13,job_21 job_2
modified job id hold list of job 338
   blocking jobs: 337,339
   exited jobs:   NONE
$ qalter -hold_jid job_21,job_31 job_3
modified job id hold list of job 340
   blocking jobs: 339,341
   exited jobs:   NONE


#
# ----------release all the holds and let SGE sort itself out
#
$ qalter -h U job_1
modified hold of job 333
$ qalter -h U job_11
modified hold of job 334
$ qalter -h U job_111
modified hold of job 335
$ qalter -h U job_12
modified hold of job 336
$ qalter -h U job_13
modified hold of job 337
$ qalter -h U job_2
modified hold of job 338
$ qalter -h U job_21
modified hold of job 339
$ qalter -h U job_3
modified hold of job 340
$ qalter -h U job_31
modified hold of job 341


#
# ----------query SGE stats
#
$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    335 0.00000 job_111    chihung      qw    07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    337 0.00000 job_13     chihung      qw    07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    339 0.00000 job_21     chihung      qw    07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1
    341 0.00000 job_31     chihung      qw    07/19/2007 21:04:34     1


#
# ----------some jobs started to run
#
$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    339 0.55500 job_21     chihung      r     07/19/2007 21:05:36     1
    341 0.55500 job_31     chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    335 0.55500 job_111    chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    337 0.55500 job_13     chihung      r     07/19/2007 21:05:36     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    339 0.55500 job_21     chihung      r     07/19/2007 21:05:36     1
    341 0.55500 job_31     chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    335 0.55500 job_111    chihung      r     07/19/2007 21:05:36     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    337 0.55500 job_13     chihung      r     07/19/2007 21:05:36     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      hqw   07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      hqw   07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      hqw   07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1
    334 0.00000 job_11     chihung      qw    07/19/2007 21:04:34     1
    336 0.00000 job_12     chihung      qw    07/19/2007 21:04:34     1
    338 0.00000 job_2      chihung      qw    07/19/2007 21:04:34     1
    340 0.00000 job_3      chihung      qw    07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    338 0.55500 job_2      chihung      r     07/19/2007 21:05:51     1
    340 0.55500 job_3      chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    334 0.55500 job_11     chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    336 0.55500 job_12     chihung      r     07/19/2007 21:05:51     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   2/4       0.01     sol-amd64
    338 0.55500 job_2      chihung      r     07/19/2007 21:05:51     1
    340 0.55500 job_3      chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   1/4       0.01     sol-amd64
    334 0.55500 job_11     chihung      r     07/19/2007 21:05:51     1
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    336 0.55500 job_12     chihung      r     07/19/2007 21:05:51     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      hqw   07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   0/4       0.01     sol-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    333 0.00000 job_1      chihung      qw    07/19/2007 21:04:34     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    333 0.55500 job_1      chihung      r     07/19/2007 21:06:06     1


$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@sgeexec0                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec1                 BIP   0/4       0.01     sol-amd64
----------------------------------------------------------------------------
all.q@sgeexec2                 BIP   1/4       0.01     sol-amd64
    333 0.55500 job_1      chihung      r     07/19/2007 21:06:06     1


#
# ----------output of all jobs; you can see jobs job_1/2/3 finished last
#
$ grep 2007 job_*.o*
job_111.o335:Thu Jul 19 21:05:36 SGT 2007
job_11.o334:Thu Jul 19 21:05:51 SGT 2007
job_12.o336:Thu Jul 19 21:05:51 SGT 2007
job_13.o337:Thu Jul 19 21:05:36 SGT 2007
job_1.o333:Thu Jul 19 21:06:06 SGT 2007
job_21.o339:Thu Jul 19 21:05:36 SGT 2007
job_2.o338:Thu Jul 19 21:05:51 SGT 2007
job_31.o341:Thu Jul 19 21:05:37 SGT 2007
job_3.o340:Thu Jul 19 21:05:52 SGT 2007

Another successful proof-of-concept. :-)


[gridengine users] Consumable configuration best practices question for hundreds of resources for specific group of nodes
William Hay w.hay at ucl.ac.uk
Mon Mar 30 08:41:10 UTC 2015

On Sun, 29 Mar 2015 08:50:15 +0000
Yuri Burmachenko <yuribu at mellanox.com> wrote:


>
> Users will care about which cells they are using.

Could you confirm that my understanding below is correct:
The users of this system care which cells they need to use for reasons other than avoiding oversubscription of the cell.
Cell 25 is fundamentally different from cell 39 even when both are free.
The users want to be able to tell the scheduler which cells to use rather than being able to write a job script that can read a list of cells
to use from a file or similar.

If all the above is true then your 300 different complex_values are probably unavoidable but it won't be pretty.

>
> Our partial solution should allow the users to control/monitor/request/free these cells.
>
>
>
> I looked into the links https://arc.liv.ac.uk/trac/SGE/ticket/1426 and http://gridengine.eu/grid-engine-internals/102-univa-grid-engine-810-features-part-2-better-resource-management-with-the-rsmap-complex-2012-05-25 - I see that many consumable resources can be attached on host basis with RSMAP.
>
Not entirely, AIUI (and we're not Univa customers): RSMAP resources can be associated with queues or the global host as well. Also, you request the number of resources you want, but UGE assigns the specific resources (cells in your case) that your job will use. If I'm understanding you correctly, that won't work for you.

> We need to be able to attach these 300 consumable resources as shared between 4 nodes – is it possible? Maybe a separate queue for these 4 particular hosts with list of complex consumable resources?

That doesn't work because resources defined on a cluster queue exist for each queue instance.

Grid Engine doesn't have a simple way to associate a resource with a group of hosts other than the cluster as a whole. What you can do is define resource availability on the global pseudo host, then add a restriction by some means to prevent usage other than on the hosts in question:

* You could define your queue configuration so that all queues on all other nodes have 0 of the resource available, while the nodes with access say nothing about availability and therefore have access to the full resources defined on the global host.
* You could define the resources as having 0 availability on hosts other than the ones in question.
* You could probably also do the same with resource quotas.

The first of the above is probably simplest/least work assuming your existing queue configuration is simple.
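A minimal sketch of that first option, assuming a hypothetical consumable "cell25" and a hostgroup @cellhosts containing the four nodes (the lines shown as comments are what you would add in the editor each qconf command opens):

$ qconf -mc          # add:  cell25  cell25  INT  <=  YES  YES  0  0
$ qconf -me global   # add:  complex_values cell25=1
$ qconf -mq all.q    # add:  complex_values cell25=0,[@cellhosts=NONE]

The queue default of 0 blocks consumption everywhere, while the NONE override leaves the @cellhosts queue instances silent about the resource, so the value defined on the global host applies there.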


> All cells are different and users will need to know which one they need to request. At this stage they all should be distinct.

OK. If users request a lot of different cells for individual jobs this will probably lead to long delays before jobs start. Said users will almost certainly want to request
a dynamic reservation for their jobs (-R y).
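A job would then ask for the specific cells it needs by name, with a reservation; an illustrative submission (the complex names are hypothetical):

$ qsub -R y -l cell25=1,cell39=1 job-submit.sh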

[gridengine users] Undocumented Feature of load sensors
Fritz Ferstl fferstl at univa.com
Thu Apr 16 15:15:36 UTC 2015

It is certainly an intended feature, William. It always has been, since load sensors were introduced in the late 90s.

The thought behind it was that you might have central system management
services which maintain host level information. You can then put the
load sensor on the system management server instead of having 1000s of
hosts query it. But you can use it for other stuff as well, of course.

Cheers,

Fritz

William Hay wrote:
> It appears that you can have load sensors report values for individual hosts other than the one on which it runs. I've tested this by having a load sensor run on one host report different values for two different hosts and used qhost -F to verify that gridengine reports them.
>
> The possibility of doing this is implied by the format of load sensor reports but I've never seen it explicitly documented as possible or used elsewhere.
>
> Being able to use this would simplify certain aspects of the configuration of our production cluster so it would be useful to know if this is intended behavior
> and therefore something I can rely on or an implementation quirk.
>
> Opinions?
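For the curious, exploiting the behavior William describes only requires putting a different hostname in the first field of each report line; a minimal sketch (the hostnames and the complex "cells_free" are made up):

#! /bin/sh
# Load sensor running on one host but reporting values for two others.
while read -r input; do
    [ "$input" = "quit" ] && exit 0
    echo "begin"
    echo "nodeA:cells_free:3"   # first field names the host reported for
    echo "nodeB:cells_free:7"
    echo "end"
done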

Anyone have scripts for detecting users who bypass grid engine

Reuti reuti at staff.uni-marburg.de
Thu Apr 9 21:19:12 UTC 2015


On 09.04.2015 at 23:09, Feng Zhang wrote:

> I know that some people use ssh as rsh_command, which may have a similar problem?

Not when you have a tight integration of `ssh` in SGE:

https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html section "SSH TIGHT INTEGRATION"

Then `ssh` can't spawn any process which escapes from SGE.

-- Reuti


> On Thu, Apr 9, 2015 at 3:46 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>> On 09.04.2015 at 21:23, Chris Dagdigian wrote:
>> 
>>> 
>>> I'm one of the people who has been arguing for years that technological methods for stopping abuse of GE systems never work in the long term because motivated users always have more time and interest than overworked admins so it's kind of embarrassing to ask this but ...
>>> 
>>> Does anyone have a script that runs on a node and prints out all the userland processes that are not explicitly a child of an sge_shepherd daemon?
>>> 
>>> I'm basically looking for a lightweight way to scan a node just to see if there are users/tasks running that are outside the awareness of the SGE qmaster.  Back in the day when we talked about this it seemed that one easy method was just looking for user stuff that was not a child process of an SGE daemon process.
>>> 
>>> The funny thing is that it's not the HPC end users who do this. As the grid(s) get closer and closer to the enterprise I'm starting to see software developers and others trying to play games and then plead ignorance when asked "why did you SSH to a compute node and start a tomcat service out of your home directory?". heh.
>> 
>> Why allow `ssh` to a node at all? In my installations only the admins can do this. If users want to peek around on a node I have an interactive queue with an h_cpu limit of 60 seconds for this. So even logging in to a node is controlled by SGE.
>> 
>> -- Reuti
>> 
>> 
>>> 
>>> -chris
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
> -- 
> Best,
> 
> Feng
> 

Recommended Links


Oracle Grid Engine documentation

Oracle Grid Engine - Wikipedia, the free encyclopedia

Guide to Using the Grid Engine The Particle Beam Physics Laboratory at the UCLA Department of Physics and Astronomy

How To Use Sun Grid Engine Main Biowiki

SUN Grid Engine - UU/Department of Information Technology

BeocatDocs-SunGridEngine - CIS Support

Sun

Oracle

Univa

Grid Engine in 2012 & Beyond

What the heck is going on with Grid Engine in 2012 and beyond? If you’ve found this page and have managed to keep reading, you are probably interested in Grid Engine and what it may look like in the future. This post will attempt to summarize what is currently available.

History of this site

This website was thrown together very quickly in early 2011 when Oracle announced it was taking Oracle Grid Engine in a new “closed source” direction. Very soon after the announcement, the open source SGE codebase was forked by multiple groups. Oracle had also been hosting the popular gridengine.sunsource.net site where documentation, HowTo’s and a very active mailing list had become the default support channel for many SGE users and administrators.

This website was seen as a gathering point and central public portal for the various grid engine fork efforts. It was also a natural place to host a new “users@gridengine.org” mailing list in an attempt to recreate the atmosphere found in the old “users@gridengine.sunsource.net” listserv community.

The new mailing list was a success but efforts to build a “Steering Committee” that would drive some sort of coordinated effort stalled throughout most of 2011. Truth be told, we probably don’t need a central site or even a steering committee - the maintainers of the various forks all know each other and can easily trade patches, advice and collaborative efforts among themselves.

It’s best simply to recast the gridengine.org site as a convenient place for information broadly of interest to all Grid Engine users, administrators and maintainers – mailing lists, news and pointers to information, software & resources.

Available Grid Engine Options

Open Source

“Son of Grid Engine”

URL: https://arc.liv.ac.uk/trac/SGE
News & Announcements: http://arc.liv.ac.uk/repos/darcs/sge/NEWS
Description: Baseline code comes from the Univa public repo with additional enhancements and improvements added. The maintainer(s) have deep knowledge of SGE source and internals and are committed to the effort. Future releases may start to diverge from Univa as Univa pursues an “open core” development model. Maintainers have made efforts to make building binaries from source easier and the latest release offers RedHat Linux SRPMS and RPM files ready for download.
Support: Supported via the maintainers and the users mailing list.

“Open Grid Scheduler”

URL: http://gridscheduler.sourceforge.net/
Description: Baseline code comes from the last Oracle open source release with significant additional enhancements and improvements added. The maintainer(s) have deep knowledge of SGE source and internals and are committed to the effort. No pre-compiled “courtesy binaries” available at the SourceForge site (just source code and instructions on how to build Grid Engine locally). In November 2011 a new company ScalableLogic announced plans to offer commercial support options for users of Open Grid Scheduler.
Support: Supported via the maintainers and the users mailing list. Commercial support from ScalableLogic.

Commercial

“Univa Grid Engine”

URL: http://www.univa.com/products/grid-engine
Description: Commercial company selling Grid Engine, support and layered products that add features and functionality. Several original SGE developers are now employed by Univa. Evaluation versions and “48 cores for free” are available from the website.
Support: Univa supports their own products.

“Oracle Grid Engine”

URL: http://www.oracle.com/us/products/tools/oracle-grid-engine-075549.html
Description: Continuation of “Sun Grid Engine” after Oracle purchased Sun. This is the current commercial version of Oracle Grid Engine after Oracle discontinued the open source version of their product and went 100% closed-source.
Support: Oracle supports their own products, a web support forum for Oracle customers can be found at https://forums.oracle.com/forums/forum.jspa?forumID=859

Univa Grid Engine - Wikipedia, the free encyclopedia

Univa Grid Engine - Daniel's Blog about Grid Engine



Etc




Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.

The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.

Original materials' copyright belongs to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

You can use PayPal to make a contribution, supporting development of this site and speeding up access. In case softpanorama.org is down you can use the mirror at softpanorama.info.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: July 21, 2017